Claude RAG Pipeline with MCP Integration

Retrieval Augmented Generation (RAG) system featuring conversational memory, hybrid knowledge modes, and Model Context Protocol (MCP) integration for document search functionality within Claude Desktop.

Features

Complete RAG Pipeline: Document processing, vector embeddings, semantic search, and LLM-powered responses
Conversational Memory: Full ChatGPT-style conversations with context preservation across exchanges
Hybrid Knowledge Mode: Ability to switch between document-based responses and general knowledge
Semantic Chunking: Intelligent document segmentation that preserves meaning and context
MCP Integration: Native Claude Desktop integration for document access functionality
Multi-format Support: PDF, Word, Markdown, and plain text documents
Vector Database: ChromaDB for efficient semantic search
Web Interface: Streamlit app for document management and chat

Architecture

Documents → Semantic Processing → Vector Embeddings → ChromaDB → Retrieval → Claude API → Response
                                                                        ↓
                                                                 MCP Protocol
                                                                        ↓
                                                                Claude Desktop

Tech Stack

LLM: Claude 3.5 (Anthropic API)
Embeddings: OpenAI text-embedding-ada-002 or local sentence-transformers
Vector Database: ChromaDB
Web Framework: Streamlit
Document Processing: PyPDF2, python-docx
Integration Protocol: Model Context Protocol (MCP)

Quick Start

Prerequisites

Python 3.8+
OpenAI API key
Anthropic API key
Claude Desktop app (for MCP integration)

Installation

Clone the repository:

git clone https://github.com/yourusername/claude-rag-mcp-pipeline.git
cd claude-rag-mcp-pipeline

Create virtual environment:

python3 -m venv rag_env
source rag_env/bin/activate

Install dependencies:

pip install -r requirements.txt

Configure environment variables:

cp .env.example .env
# Edit .env with your API keys

Basic Usage

Create documents folder:

mkdir documents

Add documents to the documents/ folder (PDF, Word, Markdown, or text files)
Start the Streamlit app:

streamlit run app.py

Ingest documents using the sidebar interface
Chat with your documents using the conversational interface

MCP Integration (Advanced)

Connect your RAG system to Claude Desktop for native document access:

Configure Claude Desktop (create config file if it doesn't exist):

# Create the config file if it doesn't exist
mkdir -p "~/Library/Application Support/Claude"
touch "~/Library/Application Support/Claude/claude_desktop_config.json"

Edit the configuration file:

// ~/.../Claude/claude_desktop_config.json
{
  "mcpServers": {
    "personal-documents": {
      "command": "/path/to/your/project/rag_env/bin/python",
      "args": ["/path/to/your/project/mcp_server.py"],
      "env": {
        "OPENAI_API_KEY": "your_key",
        "ANTHROPIC_API_KEY": "your_key"
      }
    }
  }
}

Start the MCP server:

python mcp_server.py

Use Claude Desktop - Chats will access your documents when relevant (e.g. try prompting "Can you search my documents for details regarding ...?"). Ensure "personal-documents" is enabled under "Search and tools".

Project Structure

claude-rag-mcp-pipeline/
├── src/
│   ├── document_processor.py    # Document processing and semantic chunking
│   ├── embeddings.py           # Embedding generation (OpenAI/local)
│   ├── vector_store.py         # ChromaDB interface
│   ├── llm_service.py          # Claude API integration
│   └── rag_system.py           # Main RAG orchestration
├── documents/                  # Your documents go here
├── chroma_db/                 # Vector database (auto-created)
├── app.py                     # Streamlit web interface
├── mcp_server.py              # MCP protocol server
├── requirements.txt           # Python dependencies
├── .env.example              # Environment variables template
└── README.md

Key Components

Document Processing

Multi-format support: Handles PDF, Word, Markdown, and text files
Semantic chunking: Preserves document structure and meaning
Metadata preservation: Tracks source files and chunk locations

Vector Search

Semantic similarity: Finds relevant content by meaning, not keywords
Configurable retrieval: Adjustable number of context chunks
Source attribution: Clear tracking of information sources

Conversational Interface

Memory persistence: Maintains conversation context across exchanges
Hybrid responses: Combines document knowledge with general AI capabilities
Source citation: References specific documents in responses

MCP Integration

Native Claude access: Use Claude Desktop with your document knowledge
Automatic tool detection: Claude recognizes when to search your documents
Secure local processing: Documents never leave your machine

Configuration

Embedding Options

Switch between OpenAI embeddings and local models:

# In src/rag_system.py
rag = ConversationalRAGSystem(embedding_provider="openai")  # or "local"

Chunking Parameters

Adjust semantic chunking behavior:

# In src/document_processor.py
chunks = self.semantic_chunk_llm(text, max_chunk_size=800, min_chunk_size=100)

Response Tuning

Modify retrieval and response generation:

# Number of chunks to retrieve
result = rag.query(question, n_results=5)

# Claude response length
response = llm_service.generate_response(query, chunks, max_tokens=600)

Claude Model Selection

Change the Claude model version in src/llm_service.py:

 # In LLMService class methods, update the model parameter (check console.anthropic.com for currently available models):
model="claude-3-5-haiku-latest"     # Fast, cost-effective  
model="claude-3-5-sonnet-latest"    # Higher quality reasoning
model="claude-3-opus-latest"        # Most capable
model="claude-4-sonnet-latest"      # If available

Use Cases

Personal Knowledge Base: Make your documents searchable and conversational
Research Assistant: Query across multiple documents simultaneously
Document Analysis: Extract insights from large document collections
Enterprise RAG: Foundation for company-wide knowledge systems

Technical Details

Transformer Architecture Understanding

This system demonstrates practical implementation of:

Vector embeddings for semantic representation
Attention mechanisms through retrieval scoring
Multi-step reasoning through conversation context
Hybrid AI architectures combining retrieval and generation

API Costs

Typical monthly usage:

OpenAI embeddings: $2-10
Claude API calls: $5-25
Total: $7-35/month for moderate usage

Cost reduction:

Use local embeddings (sentence-transformers) for free embedding generation
Adjust response length limits
Optimize chunk retrieval counts

Production Considerations

Current Implementation Scope

This system is designed for personal/single-user environments. However, the core RAG functionality, MCP integration, and conversational AI systems can be implemented at enterprise-level.

Enterprise Production Deployment Requirements

To deploy this system in a true production enterprise environment, the following additions would be needed:

Authentication & Authorization:

Multi-user authentication system (SSO integration)
Role-based access controls (RBAC)
Document-level permissions and access policies
API key rotation and secure credential management

Infrastructure & Scalability:

Container orchestration (Kubernetes deployment)
Production-grade vector database (Pinecone, Weaviate, or managed ChromaDB)
Load balancing and horizontal scaling
Database clustering and replication
CDN integration for document serving

Monitoring & Operations:

Application performance monitoring (APM)
Logging aggregation and analysis
Health checks and alerting systems
Usage analytics and cost tracking
Backup and disaster recovery procedures

Security Hardening:

Input validation and sanitization
Rate limiting and DDoS protection
Network security (VPC, firewalls, encryption in transit)
Data encryption at rest
Security audit trails and compliance logging

Enterprise Integration:

Integration with existing identity providers
Corporate data governance policies
Compliance with data retention requirements
Integration with enterprise monitoring/alerting systems
Multi-tenancy support with resource isolation

Cost Management:

Usage-based billing and chargeback systems
Cost optimization and budget controls
API usage monitoring and alerts
Resource utilization optimization

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request

License

MIT License - see LICENSE file for details.

Acknowledgments

Built with:

Anthropic Claude for LLM capabilities
OpenAI for embedding models
ChromaDB for vector storage
Streamlit for web interface
Model Context Protocol for Claude Desktop integration

Claude RAG Pipeline with MCP Integration

Features

Complete RAG Pipeline: Document processing, vector embeddings, semantic search, and LLM-powered responses
Conversational Memory: Full ChatGPT-style conversations with context preservation across exchanges
Hybrid Knowledge Mode: Ability to switch between document-based responses and general knowledge
Semantic Chunking: Intelligent document segmentation that preserves meaning and context
MCP Integration: Native Claude Desktop integration for document access functionality
Multi-format Support: PDF, Word, Markdown, and plain text documents
Vector Database: ChromaDB for efficient semantic search
Web Interface: Streamlit app for document management and chat

Architecture

Documents → Semantic Processing → Vector Embeddings → ChromaDB → Retrieval → Claude API → Response
                                                                        ↓
                                                                 MCP Protocol
                                                                        ↓
                                                                Claude Desktop

Tech Stack

LLM: Claude 3.5 (Anthropic API)
Embeddings: OpenAI text-embedding-ada-002 or local sentence-transformers
Vector Database: ChromaDB
Web Framework: Streamlit
Document Processing: PyPDF2, python-docx
Integration Protocol: Model Context Protocol (MCP)

Quick Start

Prerequisites

Python 3.8+
OpenAI API key
Anthropic API key
Claude Desktop app (for MCP integration)

Installation

Clone the repository:

git clone https://github.com/yourusername/claude-rag-mcp-pipeline.git
cd claude-rag-mcp-pipeline

Create virtual environment:

python3 -m venv rag_env
source rag_env/bin/activate

Install dependencies:

pip install -r requirements.txt

Configure environment variables:

cp .env.example .env
# Edit .env with your API keys

Basic Usage

Create documents folder:

mkdir documents

Add documents to the documents/ folder (PDF, Word, Markdown, or text files)
Start the Streamlit app:

streamlit run app.py

Ingest documents using the sidebar interface
Chat with your documents using the conversational interface

MCP Integration (Advanced)

Connect your RAG system to Claude Desktop for native document access:

Configure Claude Desktop (create config file if it doesn't exist):

# Create the config file if it doesn't exist
mkdir -p "~/Library/Application Support/Claude"
touch "~/Library/Application Support/Claude/claude_desktop_config.json"

Edit the configuration file:

// ~/.../Claude/claude_desktop_config.json
{
  "mcpServers": {
    "personal-documents": {
      "command": "/path/to/your/project/rag_env/bin/python",
      "args": ["/path/to/your/project/mcp_server.py"],
      "env": {
        "OPENAI_API_KEY": "your_key",
        "ANTHROPIC_API_KEY": "your_key"
      }
    }
  }
}

Start the MCP server:

python mcp_server.py

Use Claude Desktop - Chats will access your documents when relevant (e.g. try prompting "Can you search my documents for details regarding ...?"). Ensure "personal-documents" is enabled under "Search and tools".

Project Structure

claude-rag-mcp-pipeline/
├── src/
│   ├── document_processor.py    # Document processing and semantic chunking
│   ├── embeddings.py           # Embedding generation (OpenAI/local)
│   ├── vector_store.py         # ChromaDB interface
│   ├── llm_service.py          # Claude API integration
│   └── rag_system.py           # Main RAG orchestration
├── documents/                  # Your documents go here
├── chroma_db/                 # Vector database (auto-created)
├── app.py                     # Streamlit web interface
├── mcp_server.py              # MCP protocol server
├── requirements.txt           # Python dependencies
├── .env.example              # Environment variables template
└── README.md

Key Components

Document Processing

Multi-format support: Handles PDF, Word, Markdown, and text files
Semantic chunking: Preserves document structure and meaning
Metadata preservation: Tracks source files and chunk locations

Vector Search

Semantic similarity: Finds relevant content by meaning, not keywords
Configurable retrieval: Adjustable number of context chunks
Source attribution: Clear tracking of information sources

Conversational Interface

Memory persistence: Maintains conversation context across exchanges
Hybrid responses: Combines document knowledge with general AI capabilities
Source citation: References specific documents in responses

MCP Integration

Native Claude access: Use Claude Desktop with your document knowledge
Automatic tool detection: Claude recognizes when to search your documents
Secure local processing: Documents never leave your machine

Configuration

Embedding Options

Switch between OpenAI embeddings and local models:

# In src/rag_system.py
rag = ConversationalRAGSystem(embedding_provider="openai")  # or "local"

Chunking Parameters

Adjust semantic chunking behavior:

# In src/document_processor.py
chunks = self.semantic_chunk_llm(text, max_chunk_size=800, min_chunk_size=100)

Response Tuning

Modify retrieval and response generation:

# Number of chunks to retrieve
result = rag.query(question, n_results=5)

# Claude response length
response = llm_service.generate_response(query, chunks, max_tokens=600)

Claude Model Selection

Change the Claude model version in src/llm_service.py:

 # In LLMService class methods, update the model parameter (check console.anthropic.com for currently available models):
model="claude-3-5-haiku-latest"     # Fast, cost-effective  
model="claude-3-5-sonnet-latest"    # Higher quality reasoning
model="claude-3-opus-latest"        # Most capable
model="claude-4-sonnet-latest"      # If available

Use Cases

Personal Knowledge Base: Make your documents searchable and conversational
Research Assistant: Query across multiple documents simultaneously
Document Analysis: Extract insights from large document collections
Enterprise RAG: Foundation for company-wide knowledge systems

Technical Details

Transformer Architecture Understanding

This system demonstrates practical implementation of:

Vector embeddings for semantic representation
Attention mechanisms through retrieval scoring
Multi-step reasoning through conversation context
Hybrid AI architectures combining retrieval and generation

API Costs

Typical monthly usage:

OpenAI embeddings: $2-10
Claude API calls: $5-25
Total: $7-35/month for moderate usage

Cost reduction:

Use local embeddings (sentence-transformers) for free embedding generation
Adjust response length limits
Optimize chunk retrieval counts

Production Considerations

Current Implementation Scope

This system is designed for personal/single-user environments. However, the core RAG functionality, MCP integration, and conversational AI systems can be implemented at enterprise-level.

Enterprise Production Deployment Requirements

To deploy this system in a true production enterprise environment, the following additions would be needed:

Authentication & Authorization:

Multi-user authentication system (SSO integration)
Role-based access controls (RBAC)
Document-level permissions and access policies
API key rotation and secure credential management

Infrastructure & Scalability:

Container orchestration (Kubernetes deployment)
Production-grade vector database (Pinecone, Weaviate, or managed ChromaDB)
Load balancing and horizontal scaling
Database clustering and replication
CDN integration for document serving

Monitoring & Operations:

Application performance monitoring (APM)
Logging aggregation and analysis
Health checks and alerting systems
Usage analytics and cost tracking
Backup and disaster recovery procedures

Security Hardening:

Input validation and sanitization
Rate limiting and DDoS protection
Network security (VPC, firewalls, encryption in transit)
Data encryption at rest
Security audit trails and compliance logging

Enterprise Integration:

Integration with existing identity providers
Corporate data governance policies
Compliance with data retention requirements
Integration with enterprise monitoring/alerting systems
Multi-tenancy support with resource isolation

Cost Management:

Usage-based billing and chargeback systems
Cost optimization and budget controls
API usage monitoring and alerts
Resource utilization optimization

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request

License

MIT License - see LICENSE file for details.

Acknowledgments

Built with:

Anthropic Claude for LLM capabilities
OpenAI for embedding models
ChromaDB for vector storage
Streamlit for web interface
Model Context Protocol for Claude Desktop integration

Claude RAG MCP Pipeline

Claude RAG Pipeline with MCP Integration

Features

Architecture

Tech Stack

Quick Start

Prerequisites

Installation

Basic Usage

MCP Integration (Advanced)

Project Structure

Key Components

Document Processing

Vector Search

Conversational Interface

MCP Integration

Configuration

Embedding Options

Chunking Parameters

Response Tuning

Claude Model Selection

Use Cases

Technical Details

Transformer Architecture Understanding

API Costs

Production Considerations

Current Implementation Scope

Enterprise Production Deployment Requirements

Contributing

License

Acknowledgments

Claude RAG Pipeline with MCP Integration

Features

Architecture

Tech Stack

Quick Start

Prerequisites

Installation

Basic Usage

MCP Integration (Advanced)

Project Structure

Key Components

Document Processing

Vector Search

Conversational Interface

MCP Integration

Configuration

Embedding Options

Chunking Parameters

Response Tuning

Claude Model Selection

Use Cases

Technical Details

Transformer Architecture Understanding

API Costs

Production Considerations

Current Implementation Scope

Enterprise Production Deployment Requirements

Contributing

License

Acknowledgments