📚 Marcus Local MCP Server
A Model Context Protocol (MCP) server that indexes documentation sites and local code repositories for semantic search by AI assistants.
🎯 What Is This?
This is a local MCP server that enables AI assistants (Cursor, Claude Desktop, ChatGPT) to semantically search through:
- Documentation websites - Crawled and indexed from any docs site
- Local code repositories - All text files from your projects
It uses OpenAI embeddings to create a vector database (ChromaDB) that AI assistants can query through the Model Context Protocol.
Think of it as giving your AI assistant instant access to searchable documentation and your entire codebase.
📋 How It Works
┌─────────────────┐
│ AI Assistant │ (Cursor, Claude, ChatGPT, etc.)
│ (via MCP) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ MCP Server │ (Python - stdio)
│ main.py │
└────────┬────────┘
│
▼
┌─────────────────┐ ┌──────────────┐
│ ChromaDB │◄─────┤ OpenAI │
│ (Vector Store) │ │ Embeddings │
└────────┬────────┘ └──────────────┘
│
▼
┌──────────────────────┐
│ Indexed Sources │
│ • Documentation │
│ - Moca Network │
│ - Your Docs │
│ • Repositories │
│ - Your Codebase │
│ - Local Projects │
└──────────────────────┘
The Flow:
- Index - Crawl docs OR read local repo files
- Chunk - Split content into 800-token chunks
- Embed - Create OpenAI embeddings (batched for speed)
- Store - Save in ChromaDB vector database
- Search - AI assistant queries via MCP protocol
- Retrieve - Return relevant chunks from docs/code
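Steps 2-4 in miniature: a hedged Python sketch of the chunk, embed, and store pipeline. The collection name, metadata fields, and the naive word-based splitter are illustrative assumptions, not the exact code in indexer_multi.py.

```python
# Illustrative chunk -> embed -> store pipeline (not the exact code in
# indexer_multi.py). Assumes OPENAI_API_KEY is set in the environment.
import chromadb
from openai import OpenAI

openai_client = OpenAI()
chroma = chromadb.PersistentClient(path="mcp-docs-server/data/chroma_db")
collection = chroma.get_or_create_collection("docs")  # collection name assumed

def chunk(text: str, size: int = 800) -> list[str]:
    # Word-based splitter standing in for real token-based chunking.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def index_page(url: str, text: str, source: str) -> None:
    chunks = chunk(text)
    # One batched embeddings request instead of one call per chunk.
    resp = openai_client.embeddings.create(
        model="text-embedding-3-small", input=chunks
    )
    collection.add(
        ids=[f"{url}#{i}" for i in range(len(chunks))],
        embeddings=[d.embedding for d in resp.data],
        documents=chunks,
        metadatas=[{"source": source, "url": url} for _ in chunks],
    )
```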
🚀 How to Run It
1. Setup
# Clone repository
git clone <your-repo>
cd crawl4ai_test
# Install Node.js dependencies
npm install
# Setup Python virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install Python dependencies
pip install -r mcp-docs-server/requirements.txt
# Install Crawl4AI
pip install -U crawl4ai
crawl4ai-setup
2. Configure
Create a .env file in mcp-docs-server/:
OPENAI_API_KEY=your_openai_api_key_here
EMBEDDING_MODEL=text-embedding-3-small
DEFAULT_RESULTS=5
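For reference, here is how these values might be consumed in Python, assuming the python-dotenv package (the server's actual loading code may differ):

```python
# Reading the .env settings above (assumes python-dotenv is installed;
# main.py's actual loading code may differ).
import os
from dotenv import load_dotenv

load_dotenv("mcp-docs-server/.env")

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # required, no default
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "text-embedding-3-small")
DEFAULT_RESULTS = int(os.getenv("DEFAULT_RESULTS", "5"))
```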
3. Run the Web UI
# Start Next.js server
npm run dev
# Open browser
open http://localhost:3030
4. Connect to Cursor/Claude
Add to your AI assistant config:
For Cursor (~/.cursor/mcp.json or project config):
{
"mcpServers": {
"marcus-mcp-server": {
"command": "/path/to/your/venv/bin/python3",
"args": ["/path/to/crawl4ai_test/mcp-docs-server/server/main.py"]
}
}
}
For Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"marcus-docs": {
"command": "/path/to/your/venv/bin/python",
"args": ["/path/to/crawl4ai_test/mcp-docs-server/server/main.py"]
}
}
}
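For context, this is roughly what the assistant connects to: a hedged sketch of an MCP stdio server exposing a single search tool via the official mcp Python SDK's FastMCP helper. The tool name and signature here are assumptions; the real main.py may be structured differently.

```python
# Hedged sketch of an MCP stdio server with one search tool, using the
# official `mcp` Python SDK. The real main.py may differ.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("marcus-mcp-server")

@mcp.tool()
def search_docs(query: str, source: str | None = None, limit: int = 5) -> str:
    """Semantic search over indexed docs and repositories."""
    # The real server queries ChromaDB here; stubbed for illustration.
    return f"Top {limit} results for {query!r} (source={source})"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```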
📖 How to Use It
Adding Documentation
Via Web UI:
- Go to http://localhost:3030
- Click "Add New Docs"
- Enter:
  - URL: https://docs.example.com
  - Source Name: Example Docs
  - Max Pages: 50 (or unlimited)
- Click "Start Indexing"
- Wait for completion
Via Command Line:
cd mcp-docs-server
source ../venv/bin/activate
python scripts/crawler.py https://docs.example.com "Example Docs" 50
python scripts/indexer_multi.py "Example Docs"
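Under the hood, crawler.py drives Crawl4AI. A minimal sketch of fetching one page as markdown; the real script adds link discovery, the page limit, and JSON output:

```python
# Minimal Crawl4AI fetch: render a single page to markdown.
# crawler.py layers link discovery, page limits, and JSON output on top.
import asyncio
from crawl4ai import AsyncWebCrawler

async def fetch(url: str):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        return result.markdown

print(asyncio.run(fetch("https://docs.example.com")))
```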
Adding Repositories
Via Web UI:
- Go to http://localhost:3030
- Click "Add Repository"
- Enter:
  - Repository Path: drag-and-drop a folder OR paste the path
  - Source Name: auto-generated from the folder name
- Click "Start Indexing"
- Watch live progress
What gets indexed:
- ✅ All text files (.js, .py, .md, .tsx, .json, .css, etc.)
- ✅ Auto-skips: node_modules, .git, venv, build, .next, etc. (filter sketched below)
- ✅ Batched embeddings (50-100x faster)
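A sketch of the kind of file filter this implies; the exact extension and directory sets used by repo_indexer.py are assumptions here:

```python
# Sketch of repo file filtering. The extension and directory sets are
# assumptions, not the exact lists used by repo_indexer.py.
from pathlib import Path

SKIP_DIRS = {"node_modules", ".git", "venv", "build", ".next", "dist"}
TEXT_EXTS = {".js", ".ts", ".tsx", ".py", ".md", ".json", ".css", ".html"}

def iter_text_files(root: str):
    for path in Path(root).rglob("*"):
        if any(part in SKIP_DIRS for part in path.parts):
            continue  # skip vendored and build directories entirely
        if path.is_file() and path.suffix in TEXT_EXTS:
            yield path
```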
Via Command Line:
cd mcp-docs-server
source ../venv/bin/activate
python scripts/repo_indexer.py "/path/to/your/repo" "My Project"
Searching
From Web UI:
- Enter a query, e.g. "How do I initialize the SDK?"
- Select a source (Docs, Repos, or All)
- Click "Search Documentation"
- View results
From AI Assistant:
Search all sources:
@marcus-mcp-server search for "authentication flow"
Filter by specific source:
@marcus-mcp-server search for "BorrowInterface component"
with source="Credo Protocol"
Example usage in Cursor:
User: Using my marcus-mcp-server, show me how authentication
is implemented in the Credo Protocol repository
AI: [Searches indexed repository and returns relevant code chunks]
Pro Tip: Always filter by source name to get focused results and save context tokens.
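Under the hood, a source filter maps to a ChromaDB metadata filter. A sketch, assuming each chunk's metadata carries a "source" field as in the indexing sketch above:

```python
# Source-filtered semantic search (assumes a "source" metadata field,
# matching the indexing sketch earlier in this README).
import chromadb
from openai import OpenAI

chroma = chromadb.PersistentClient(path="mcp-docs-server/data/chroma_db")
collection = chroma.get_or_create_collection("docs")  # name assumed

query_embedding = OpenAI().embeddings.create(
    model="text-embedding-3-small", input=["BorrowInterface component"]
).data[0].embedding

results = collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    where={"source": "Credo Protocol"},  # drop `where` to search everything
)
for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
    print(meta["url"], "->", doc[:80])
```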
Managing Sources
View Sources:
- See all indexed docs and repos on the main page
- Filter by "Docs" or "Repos" tabs
- Expand to see individual pages/files
Delete Sources:
- Click trash icon next to any source
- Confirm deletion
- Source and all chunks are removed
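In ChromaDB terms, deletion is a metadata-filtered delete; a sketch under the same "source" field assumption as above:

```python
# Remove a source and all of its chunks (assumes the same "source"
# metadata field as the sketches above).
import chromadb

chroma = chromadb.PersistentClient(path="mcp-docs-server/data/chroma_db")
collection = chroma.get_or_create_collection("docs")  # name assumed
collection.delete(where={"source": "Example Docs"})
```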
📁 Project Structure
crawl4ai_test/
├── pages/ # Next.js UI
│ ├── index.js # Main page (search + sources)
│ ├── add.js # Add documentation
│ ├── add-repo.js # Add repository
│ └── api/ # API routes
│ ├── mcp-search.js # Search endpoint
│ ├── mcp-info.js # Get index info
│ ├── add-docs-crawl.js # Crawl docs
│ ├── add-docs-index.js # Index docs
│ ├── add-repo-index.js # Index repository
│ └── mcp-delete-source.js # Delete source
├── components/
│ ├── ui/ # shadcn/ui components
│ └── home/ # Page components
├── mcp-docs-server/ # MCP Server
│ ├── server/
│ │ └── main.py # MCP server (stdio)
│ ├── scripts/
│ │ ├── crawler.py # Crawl docs with Crawl4AI
│ │ ├── indexer_multi.py # Index docs
│ │ ├── repo_indexer.py # Index repositories
│ │ ├── get_source_pages.py # Get pages/files
│ │ ├── search.py # Search
│ │ └── delete_source.py # Delete sources
│ ├── data/
│ │ ├── chroma_db/ # Vector database
│ │ ├── chunks/ # Metadata
│ │ └── raw/ # Crawled JSON
│ └── requirements.txt
└── venv/ # Python environment
🎨 Built With
- Frontend: Next.js 15 + shadcn/ui + Tailwind CSS
- Backend: Python 3.13 + MCP Protocol
- Crawler: Crawl4AI
- Vector DB: ChromaDB
- Embeddings: OpenAI (text-embedding-3-small)
Status: ✅ Fully Operational | 🤖 MCP Ready | 🔍 Search Enabled
