MCP Enhanced Data Retrieval System
An MCP (Model Context Protocol) server that standardizes AI context sharing by integrating organizational knowledge sources (GitHub, internal docs, APIs) to enable domain-aware AI assistance for enterprise development workflows.
Project Overview
This system implements the Model Context Protocol to provide:
- Standardized AI context sharing across organizational knowledge sources
- GitHub repository integration with OAuth 2.1 authentication
- Vector-based semantic search using embeddings
- Optimized 1500-token context chunking targeting sub-500ms time to first token (TTFT)
- Parallel retrieval strategy with a 2-second timeout
- Streamable HTTP transport using FastAPI
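The 1500-token chunking step can be illustrated with a minimal sketch, assuming chunks are cut on token boundaries with an optional overlap between neighbours. The function name `chunk_by_tokens` and the `overlap` parameter are hypothetical, not taken from the project's code. The tokenizer is passed in as `encode`/`decode` callables so the same function works with tiktoken or with a toy tokenizer in tests.

```python
from typing import Callable, List, Sequence


def chunk_by_tokens(
    text: str,
    encode: Callable[[str], Sequence],
    decode: Callable[[Sequence], str],
    max_tokens: int = 1500,
    overlap: int = 0,
) -> List[str]:
    """Split text into chunks of at most max_tokens tokens.

    Consecutive chunks share `overlap` tokens so that context spanning a
    boundary appears in both chunks (an assumed design choice).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    tokens = list(encode(text))
    step = max_tokens - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        chunks.append(decode(window))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks
```

With tiktoken, the call would look like `enc = tiktoken.get_encoding("cl100k_base")` followed by `chunk_by_tokens(doc, enc.encode, enc.decode)`. Because windows are cut on token ids, a chunk boundary can occasionally fall inside a multi-byte character; splitting on paragraph boundaries first is one way to reduce that.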
Architecture
AI Applications
↓
Authentication (OAuth 2.1 + RBAC)
↓
MCP Client
↓
MCP Protocol (JSON-RPC + HTTP)
↓
MCP Server
• Multi-threaded parallel retrieval
• 1500-token chunking
↓
Knowledge Tiers (Public, Internal, Restricted)
↓
Data Sources: GitHub | Docs
Vector Storage: Embeddings
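The RBAC step between authentication and the knowledge tiers can be sketched as a clearance check, assuming each role maps to the highest tier it may read and tiers are strictly ordered Public < Internal < Restricted. The role names, `ROLE_CLEARANCE` table, `Document` type, and `visible_documents` function are all hypothetical illustrations, not the project's actual model.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Iterable, List


class Tier(IntEnum):
    """Knowledge tiers from the architecture, ordered by sensitivity."""
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


# Hypothetical role-to-clearance mapping; real roles would come from RBAC config.
ROLE_CLEARANCE: Dict[str, Tier] = {
    "guest": Tier.PUBLIC,
    "employee": Tier.INTERNAL,
    "admin": Tier.RESTRICTED,
}


@dataclass(frozen=True)
class Document:
    doc_id: str
    tier: Tier
    text: str


def visible_documents(role: str, docs: Iterable[Document]) -> List[Document]:
    """Keep only documents whose tier is at or below the role's clearance.

    Unknown roles fall back to PUBLIC, so higher tiers are denied by default.
    """
    clearance = ROLE_CLEARANCE.get(role, Tier.PUBLIC)
    return [d for d in docs if d.tier <= clearance]
```

Filtering before retrieval results reach the MCP client means restricted content never enters the model's context, rather than being redacted afterwards.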
Features
- MCP Protocol Compliance: JSON-RPC 2.0 over Streamable HTTP
- GitHub Integration: Repository data retrieval and contextualization
- Vector Embeddings: Semantic search using ChromaDB and Sentence Transformers
- Context Optimization: 1500-token chunking with parallel retrieval
- OAuth 2.1 Security: Secure authentication for GitHub access
- Performance: Target of sub-500ms time to first token (TTFT), with a 2-second timeout on parallel retrieval
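The parallel retrieval strategy with a 2-second timeout can be sketched with `asyncio`, assuming each knowledge source is exposed as an async function that returns text snippets. The `parallel_retrieve` name and the `Retriever` type are hypothetical. Sources still running at the deadline are cancelled and sources that raise are skipped, so one slow or failing backend does not block the response.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# A retriever is any coroutine function taking a query and returning snippets.
Retriever = Callable[[str], Awaitable[List[str]]]


async def parallel_retrieve(
    query: str,
    retrievers: Dict[str, Retriever],
    timeout: float = 2.0,
) -> Dict[str, List[str]]:
    """Query all sources concurrently; keep whatever finishes within `timeout`."""
    tasks = {asyncio.create_task(fn(query)): name for name, fn in retrievers.items()}
    if not tasks:
        return {}

    done, pending = await asyncio.wait(set(tasks), timeout=timeout)

    # Cancel late sources and wait for the cancellations to complete cleanly.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    results: Dict[str, List[str]] = {}
    for task in done:
        if task.exception() is None:  # skip sources that raised
            results[tasks[task]] = task.result()
    return results
```

A caller would run `asyncio.run(parallel_retrieve(query, {"github": ..., "docs": ...}))` and build the context from whichever sources answered in time.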
Project Structure
.
├── src/
│ ├── server/ # MCP server core and FastAPI app
│ ├── auth/ # OAuth 2.1 authentication
│ ├── github/ # GitHub API integration
│ ├── vector/ # Vector database and embeddings
│ └── utils/ # Utilities and helpers
├── tests/ # Test suite
├── config/ # Configuration files
├── data/ # Data storage (vector DB, cache)
├── logs/ # Application logs
├── requirements.txt # Python dependencies
└── .env.example # Environment variables template
Setup
- Clone the repository and navigate to the project directory:
    cd "MCP Enhanced Data Retrieval"
- Create and activate a virtual environment:
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Configure environment variables:
    cp .env.example .env
    # Edit .env with your credentials
- Run the server:
    uvicorn src.server.main:app --reload
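Once the server is running, clients talk to it with JSON-RPC 2.0 messages sent as HTTP POSTs. The sketch below only builds the request envelope with the standard library. The `/mcp` path and the `build_mcp_request` helper are assumptions (uvicorn's default port is 8000). A real MCP session must begin with an `initialize` request and send back any `Mcp-Session-Id` header the server returns; the MCP Python SDK client handles that handshake.

```python
import json
import urllib.request
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def build_mcp_request(
    method: str,
    params: Optional[Dict[str, Any]] = None,
    url: str = "http://127.0.0.1:8000/mcp",  # hypothetical endpoint path
) -> urllib.request.Request:
    """Wrap an MCP method call in a JSON-RPC 2.0 envelope and an HTTP POST."""
    payload: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may reply with JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

For example, `build_mcp_request("tools/call", {"name": "search_repos", "arguments": {"query": "oauth"}})` would target a tool named `search_repos` (hypothetical), and `urllib.request.urlopen(...)` would send it once a session is initialized.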
Milestone 1 Goals
- ✅ MCP protocol analysis and communication flow evaluation
- ✅ High-level architecture design for enterprise knowledge integration
- 🔄 Functional MCP server with GitHub integration
- 🔄 OAuth 2.1 authentication implementation
- 🔄 1500-token context chunking mechanism
- 🔄 Vector-based semantic search
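For the OAuth 2.1 milestone, one piece that can be shown independently of any provider is PKCE (RFC 7636), which OAuth 2.1 requires for the authorization-code flow. This sketch generates a `code_verifier` and its `S256` `code_challenge` with the standard library only; the helper names `s256_challenge` and `make_pkce_pair` are hypothetical, and the project may instead rely on Authlib to do this.

```python
import base64
import hashlib
import secrets
from typing import Tuple


def s256_challenge(verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(code_verifier)) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> Tuple[str, str]:
    """Create a random code_verifier and its matching S256 code_challenge."""
    # 32 random bytes encode to 43 URL-safe characters, the RFC 7636 minimum.
    verifier = secrets.token_urlsafe(32)
    return verifier, s256_challenge(verifier)
```

The client sends the challenge (with `code_challenge_method=S256`) in the authorization request and keeps the verifier secret until the token exchange, so an intercepted authorization code cannot be redeemed on its own.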
Success Criteria
- Functional MCP server that can retrieve and contextualize GitHub repository information
- OAuth 2.1 authentication for secure GitHub access
- 1500-token context chunking maintaining sub-500ms TTFT
- Parallel retrieval with 2-second timeout
- Vector-based semantic search for relevant content
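Checking the sub-500ms TTFT criterion requires timing when the first token of a streamed response arrives. This is a minimal sketch, assuming the response is exposed as an iterator of tokens and that the request is issued lazily on the first `next()` call (otherwise request time before the timer starts is not counted). The `measure_ttft` name is hypothetical.

```python
import time
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def measure_ttft(stream: Iterable[T]) -> Tuple[Optional[float], List[T]]:
    """Consume a token stream; return (seconds to first token, all tokens).

    The first value is None if the stream produced no tokens.
    """
    start = time.perf_counter()
    ttft: Optional[float] = None
    tokens: List[T] = []
    for token in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # elapsed until first token
        tokens.append(token)
    return ttft, tokens
```

A test harness could then assert `ttft is not None and ttft < 0.5` against the 500ms budget, repeated over several queries since a single measurement is noisy.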
Technologies
- MCP SDK: Anthropic MCP Python SDK
- Web Framework: FastAPI with Streamable HTTP transport
- GitHub API: PyGithub
- Authentication: OAuth 2.1 (authlib)
- Vector Database: ChromaDB
- Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- Token Processing: tiktoken
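The semantic search that ChromaDB and Sentence Transformers provide comes down to ranking stored embeddings by similarity to the query embedding. This brute-force cosine-similarity sketch shows the idea with plain Python; ChromaDB does the same lookup at scale with an approximate HNSW index. The `cosine` and `top_k` names are hypothetical, and the toy vectors stand in for real all-MiniLM-L6-v2 embeddings (384 dimensions), which could be produced with `SentenceTransformer("all-MiniLM-L6-v2").encode(texts)`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Brute force is linear in the number of stored chunks, which is fine for small repositories; an approximate index trades a little recall for much faster lookups on large ones.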
Author
Kalpalathika Ramanujam
Advisor: Dr. Thomas Kinsman
Rochester Institute of Technology
License
Academic Project - RIT Capstone
