# EyeLevel RAG MCP Server
A local Retrieval-Augmented Generation (RAG) system implemented as an MCP (Model Context Protocol) server. This server allows you to ingest markdown files into a local knowledge base and perform semantic search to retrieve relevant context for LLM queries.
## Features

- Local RAG Implementation: No external dependencies or paid services required
- Markdown File Support: Ingest and search through `.md` files
- Semantic Search: Uses sentence transformers for embedding-based similarity search
- Persistent Storage: Automatically saves and loads the vector index using FAISS
- Chunk Management: Intelligently splits documents into searchable chunks
- Multiple Documents: Support for ingesting and searching across multiple markdown files
## Installation

- Clone this repository
- Install dependencies using uv:

```shell
uv sync
```

### Dependencies

- `sentence-transformers`: For creating text embeddings
- `faiss-cpu`: For efficient vector similarity search
- `numpy`: For numerical operations
- `mcp[cli]`: For the MCP server framework
## Available Tools

### 1. `search_doc_for_rag_context(query: str)`

Searches the knowledge base for relevant context based on a user query.

Parameters:

- `query` (str): The search query

Returns:

- Relevant text chunks with relevance scores
### 2. `ingest_markdown_file(local_file_path: str)`

Ingests a markdown file into the knowledge base.

Parameters:

- `local_file_path` (str): Path to the markdown file to ingest

Returns:

- Status message indicating success or failure
### 3. `list_indexed_documents()`

Lists all documents currently in the knowledge base.

Returns:

- Summary of indexed files and chunk counts
### 4. `clear_knowledge_base()`

Clears all documents from the knowledge base.

Returns:

- Confirmation message
## Usage

- Start the server:

  ```shell
  python main.py
  ```

- Ingest markdown files: Use the `ingest_markdown_file` tool to add your `.md` files to the knowledge base.
- Search for context: Use the `search_doc_for_rag_context` tool to find relevant information for your queries.
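To use the tools from an LLM application, register the server with your MCP client. A typical client configuration looks like the following; the client format shown (Claude Desktop style), the server name `eyelevel-rag`, and the path are illustrative assumptions, not taken from this repo:

```json
{
  "mcpServers": {
    "eyelevel-rag": {
      "command": "python",
      "args": ["/path/to/main.py"]
    }
  }
}
```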
## How It Works

- Document Processing: Markdown files are split into chunks based on paragraphs and sentence boundaries
- Embedding Creation: Text chunks are converted to embeddings using the `all-MiniLM-L6-v2` model
- Vector Storage: Embeddings are stored in a FAISS index for fast similarity search
- Retrieval: User queries are embedded and matched against the stored vectors to find relevant content
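The document-processing step above can be sketched with a paragraph-based chunker. This is a minimal, dependency-free illustration of the idea; the actual splitting logic in `main.py` (including its sentence-boundary handling and size limits) may differ:

```python
def chunk_markdown(text: str, max_chars: int = 500) -> list[str]:
    """Split markdown into chunks on paragraph boundaries,
    packing consecutive paragraphs together up to max_chars."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Start a new chunk when adding this paragraph would overflow
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk is then embedded individually, so a query can match a single relevant passage rather than an entire file.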
## File Structure

- `main.py`: Main server implementation with RAG functionality
- `pyproject.toml`: Project dependencies and configuration
- `rag_index.faiss`: FAISS vector index (created automatically)
- `rag_documents.pkl`: Serialized documents and metadata (created automatically)
## Configuration

The RAG system uses the `all-MiniLM-L6-v2` sentence transformer model by default. This model provides a good balance between speed and quality for semantic search tasks.
## Example Workflow

- Prepare your markdown files with the content you want to search
- Use `ingest_markdown_file` to add each file to the knowledge base
- Use `search_doc_for_rag_context` to find relevant context for your questions
- The retrieved context can be used by an LLM to provide informed answers
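The retrieval step in this workflow ranks stored chunks by vector similarity to the query. The toy example below uses bag-of-words counts and cosine similarity purely as a stand-in, so it runs with the standard library only; the real server uses dense `all-MiniLM-L6-v2` embeddings and a FAISS index instead:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the server uses
    # sentence-transformers vectors in practice.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    # Score every chunk against the query and keep the best matches
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return scored[:top_k]

chunks = [
    "FAISS stores embeddings for fast similarity search.",
    "Markdown files are split into chunks before indexing.",
    "The server saves its index between sessions.",
]
results = search("how are markdown files chunked", chunks, top_k=1)
```

The top-scoring chunks (here, the one about splitting markdown files) are what would be handed to an LLM as context for answering the question.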
## Notes
- The first time you run the server, it will download the sentence transformer model
- The vector index is automatically saved and loaded between sessions
- Long documents are automatically chunked to optimize search performance
- The system supports multiple markdown files and maintains source file metadata
