# EyeLevel RAG MCP Server
A local Retrieval-Augmented Generation (RAG) system implemented as an MCP (Model Context Protocol) server. This server allows you to ingest markdown files into a local knowledge base and perform semantic search to retrieve relevant context for LLM queries.
## Features

- Local RAG Implementation: No external dependencies or paid services required
- Markdown File Support: Ingest and search through `.md` files
- Semantic Search: Uses sentence transformers for embedding-based similarity search
- Persistent Storage: Automatically saves and loads the vector index using FAISS
- Chunk Management: Intelligently splits documents into searchable chunks
- Multiple Documents: Support for ingesting and searching across multiple markdown files
## Installation

- Clone this repository
- Install dependencies using uv:

```shell
uv sync
```

### Dependencies

- `sentence-transformers`: For creating text embeddings
- `faiss-cpu`: For efficient vector similarity search
- `numpy`: For numerical operations
- `mcp[cli]`: For the MCP server framework
## Available Tools

### 1. `search_doc_for_rag_context(query: str)`

Searches the knowledge base for relevant context based on a user query.

Parameters:

- `query` (str): The search query

Returns:

- Relevant text chunks with relevance scores
### 2. `ingest_markdown_file(local_file_path: str)`

Ingests a markdown file into the knowledge base.

Parameters:

- `local_file_path` (str): Path to the markdown file to ingest

Returns:

- Status message indicating success or failure
### 3. `list_indexed_documents()`

Lists all documents currently in the knowledge base.

Returns:

- Summary of indexed files and chunk counts
### 4. `clear_knowledge_base()`

Clears all documents from the knowledge base.

Returns:

- Confirmation message
## Usage

- Start the server:

  ```shell
  python main.py
  ```

- Ingest markdown files: Use the `ingest_markdown_file` tool to add your `.md` files to the knowledge base.
- Search for context: Use the `search_doc_for_rag_context` tool to find relevant information for your queries.
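To use the tools from an LLM application, register the server with your MCP client. A typical client configuration looks like the following; the client format shown (Claude Desktop style), the server name `eyelevel-rag`, and the path are illustrative assumptions, not taken from this repo:

```json
{
  "mcpServers": {
    "eyelevel-rag": {
      "command": "python",
      "args": ["/path/to/main.py"]
    }
  }
}
```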
## How It Works

- Document Processing: Markdown files are split into chunks based on paragraphs and sentence boundaries
- Embedding Creation: Text chunks are converted to embeddings using the `all-MiniLM-L6-v2` model
- Vector Storage: Embeddings are stored in a FAISS index for fast similarity search
- Retrieval: User queries are embedded and matched against the stored vectors to find relevant content
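The document-processing step above can be sketched with a paragraph-based chunker. This is a minimal, dependency-free illustration of the idea; the actual splitting logic in `main.py` (including its sentence-boundary handling and size limits) may differ:

```python
def chunk_markdown(text: str, max_chars: int = 500) -> list[str]:
    """Split markdown into chunks on paragraph boundaries,
    packing consecutive paragraphs together up to max_chars."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Start a new chunk when adding this paragraph would overflow
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk is then embedded individually, so a query can match a single relevant passage rather than an entire file.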
## File Structure

- `main.py`: Main server implementation with RAG functionality
- `pyproject.toml`: Project dependencies and configuration
- `rag_index.faiss`: FAISS vector index (created automatically)
- `rag_documents.pkl`: Serialized documents and metadata (created automatically)
## Configuration

The RAG system uses the `all-MiniLM-L6-v2` sentence transformer model by default. This model provides a good balance between speed and quality for semantic search tasks.
## Example Workflow

- Prepare your markdown files with the content you want to search
- Use `ingest_markdown_file` to add each file to the knowledge base
- Use `search_doc_for_rag_context` to find relevant context for your questions
- The retrieved context can be used by an LLM to provide informed answers
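The retrieval step in this workflow ranks stored chunks by vector similarity to the query. The toy example below uses bag-of-words counts and cosine similarity purely as a stand-in, so it runs with the standard library only; the real server uses dense `all-MiniLM-L6-v2` embeddings and a FAISS index instead:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the server uses
    # sentence-transformers vectors in practice.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    # Score every chunk against the query and keep the best matches
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return scored[:top_k]

chunks = [
    "FAISS stores embeddings for fast similarity search.",
    "Markdown files are split into chunks before indexing.",
    "The server saves its index between sessions.",
]
results = search("how are markdown files chunked", chunks, top_k=1)
```

The top-scoring chunks (here, the one about splitting markdown files) are what would be handed to an LLM as context for answering the question.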
## Notes
- The first time you run the server, it will download the sentence transformer model
- The vector index is automatically saved and loaded between sessions
- Long documents are automatically chunked to optimize search performance
- The system supports multiple markdown files and maintains source file metadata
