RooCode-RAG-Lookup
RooCode MCP Server for performing RAG (Retrieval-Augmented Generation) lookups in documents and code repositories using vector embeddings and semantic search.
Example Usage
Ask a question, e.g. "What is the maximum number of entries* in a Word document?", and prompt the LLM with "use rag". The LLM is usually a decent judge of when to use a tool and may decide to use it on its own.
*This is related to the maximum number of XML properties and elements addressable in Word
Features
- Full RAG Implementation: Complete vector-based semantic search using ChromaDB and Haystack
- Document Indexing: Automatic text extraction and chunking from PDF documents
- Vector Embeddings: Sentence transformer embeddings for semantic similarity
- RAG Lookup Tool: Search through documents and code repositories with relevance scoring
- Test Tool: Simple hello world tool to verify MCP server connectivity
- Async MCP Protocol: Full JSON-RPC 2.0 support via stdio
Installation
- Install Python dependencies:
pip install -r requirements.txt
- Configure RooCode to use this MCP server by adding the configuration from mcp_config.json to your RooCode settings.
Configuration
- Add the mcp_config.json to your RooCode MCP server settings in the "edit global settings" part of MCP tools. If the tool is ready to use, it will show a green status.
- Set the following environment variables:
  - RAG_LOOKUP_PATH: Path to this project directory
  - PYTHON_PATH: Path to your Python executable
- Configure parameters in parameters.py:
  - EMBEDDING_MODEL: Sentence transformer model (default: all-mpnet-base-v2)
  - COLLECTION_NAME: ChromaDB collection name
  - CHUNK_SIZE: Text chunk size in words (default: 500)
  - CHUNK_OVERLAP: Overlap between chunks (default: 50)
  - DEFAULT_TOP_K: Number of results to return (default: 5)
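For orientation, a parameters.py holding these settings might look roughly like the sketch below. The defaults come from the list above; the COLLECTION_NAME value is a placeholder assumption (no default is documented), and treating CHUNK_OVERLAP as a word count is likewise an assumption.

```python
# Hypothetical sketch of parameters.py, not the project's actual file.
EMBEDDING_MODEL = "all-mpnet-base-v2"  # sentence-transformers model name
COLLECTION_NAME = "rag_documents"      # assumed placeholder; no default is documented
CHUNK_SIZE = 500                       # words per chunk
CHUNK_OVERLAP = 50                     # assumed to be words shared by consecutive chunks
DEFAULT_TOP_K = 5                      # results returned per query

# Sanity check: the overlap must be smaller than the chunk size,
# otherwise a sliding-window chunker would never advance.
assert 0 <= CHUNK_OVERLAP < CHUNK_SIZE
```

With these defaults, each new chunk would advance by 450 words.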
Available Tools
1. rag_lookup
Perform semantic search using RAG in documents and code repositories. Returns relevant chunks with similarity scores and metadata.
Parameters:
- query (required): The search query
- source (optional): Where to search - "documents", "repos", or "both" (default: "both")
Returns:
- Relevant text chunks with similarity scores
- Source file information and metadata
- Statistics on documents searched
Example:
{
"query": "authentication implementation",
"source": "both"
}
Response Format:
{
"status": "success",
"query": "authentication implementation",
"results": [
{
"content": "...",
"score": 0.85,
"metadata": {
"file_name": "document.txt",
"source_file": "/path/to/document.txt"
}
}
],
"metadata": {
"documents_searched": 5,
"repos_searched": 3,
"total_matches": 5
}
}
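A client consuming this response could filter and rank the hits before showing them. The sketch below assumes only the field names shown in the response format above; the summarize_results helper and the min_score threshold are hypothetical and not part of this project.

```python
import json


def summarize_results(response_text, min_score=0.5):
    """Return (file_name, score) pairs for hits at or above min_score, best first."""
    response = json.loads(response_text)
    if response.get("status") != "success":
        return []
    # Keep only hits that clear the (hypothetical) relevance threshold.
    hits = [r for r in response.get("results", []) if r.get("score", 0.0) >= min_score]
    hits.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return [(r.get("metadata", {}).get("file_name", "?"), r["score"]) for r in hits]


# Sample payload shaped like the response format above (values are illustrative).
sample = json.dumps({
    "status": "success",
    "query": "authentication implementation",
    "results": [
        {"content": "...", "score": 0.30, "metadata": {"file_name": "notes.txt"}},
        {"content": "...", "score": 0.85, "metadata": {"file_name": "document.txt"}},
    ],
    "metadata": {"documents_searched": 5, "repos_searched": 3, "total_matches": 2},
})

print(summarize_results(sample))
```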
2. say_hello
Simple test tool that returns a greeting message with timestamp.
Parameters:
- name (optional): Name to include in greeting (default: "World")
Example:
{
"name": "RooCode"
}
Usage
1. Extract and Index Documents
Place PDF documents in the Documents/ or Repos/ folders, then run:
# Extract text from PDFs
python extraction/parse_pdf.py
# Populate the vector database
python extraction/populate_database.py
2. Query the RAG System
# Test RAG lookup directly
python query_rag.py
Or ask
3. Use via MCP Server
Once configured in RooCode, use the rag_lookup tool through the MCP interface. The RooCode settings include an MCP menu; editing the global settings there opens JSON settings of the form {"mcpServers":{}}. Copy and paste the contents of mcp_config.json into these global MCP settings.
Testing
Test the MCP server locally:
# Using MCP inspector
npx @modelcontextprotocol/inspector python mcp_tool.py
# Direct stdio test
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | python mcp_tool.py
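For scripted tests it can help to build the JSON-RPC messages in Python instead of typing them by hand. The sketch below only constructs the message strings; make_request is a hypothetical helper, and the tools/call parameter shape (name plus arguments) follows the MCP specification. Note that MCP servers generally expect an initialize handshake before other requests, so a bare tools/list may be rejected depending on the SDK version.

```python
import json


def make_request(request_id, method, params=None):
    """Build a single JSON-RPC 2.0 request as a one-line JSON string."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# List the tools the server exposes.
list_tools = make_request(1, "tools/list")

# Call rag_lookup with the example arguments from the Available Tools section.
call_rag = make_request(
    2,
    "tools/call",
    {
        "name": "rag_lookup",
        "arguments": {"query": "authentication implementation", "source": "both"},
    },
)

print(list_tools)
print(call_rag)
```

Each printed line can be piped to the server's stdin, one message per line.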
Project Structure
RooCode-RAG-Lookup/
├── mcp_tool.py # Main MCP server implementation
├── query_rag.py # RAG query functions
├── parameters.py # Configuration parameters
├── run_rag_lookup.bat # Windows batch launcher
├── mcp_config.json # Example RooCode configuration
├── requirements.txt # Python dependencies
├── extraction/
│ ├── parse_pdf.py # PDF text extraction
│ └── populate_database.py # Database population and indexing
├── ExtractedText/ # Extracted text files (.txt + .meta.json)
├── chroma_db/ # ChromaDB vector database
└── README.md # This file
Technology Stack
- MCP Python SDK: Protocol implementation for RooCode integration
- Haystack: Document processing and RAG pipeline framework
- ChromaDB: Vector database for embeddings storage
- Sentence Transformers: Semantic embeddings (all-mpnet-base-v2)
- PDFPlumber: PDF text extraction with layout preservation
- Async/Await: Concurrent request handling
- JSON-RPC 2.0: Communication protocol
- Stdio Transport: RooCode integration
How It Works
- Document Extraction: PDFs are parsed using parse_pdf.py, which extracts text and metadata
- Text Chunking: Documents are split into overlapping chunks using DocumentSplitter
- Embedding Generation: Text chunks are converted to 768-dimensional vectors using sentence transformers
- Vector Storage: Embeddings are stored in ChromaDB with metadata for retrieval
- Semantic Search: Queries are embedded and matched against stored vectors using cosine similarity
- Result Ranking: Top-K most relevant chunks are returned with scores and metadata
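The chunking, search, and ranking steps above can be sketched in plain Python. This is an illustration of the idea only, not the project's code: the real pipeline uses Haystack's DocumentSplitter, sentence-transformer embeddings, and ChromaDB, whereas the embed function here is a toy bag-of-words stand-in, and all function names are hypothetical.

```python
import math
from collections import Counter


def chunk_words(text, chunk_size=500, overlap=50):
    """Split text into word chunks where consecutive chunks share `overlap` words."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding: word counts (stand-in for a 768-dimensional model vector)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=5):
    """Score every chunk against the query and return the k best (score, chunk) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "user login and password check",
    "pdf text extraction pipeline",
    "vector database storage",
]
print(chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1))
print(top_k("password login", chunks, k=1))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is the usual reason for choosing a sliding window over disjoint splits.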
Requirements
See requirements.txt for full dependencies. Key packages:
- mcp>=1.0.0 - MCP protocol support
- haystack-ai - RAG framework
- chroma-haystack - ChromaDB integration
- sentence-transformers - Embedding models
- pdfplumber - PDF extraction
License
MIT
