Research Paper Ingestion MCP Server
Autonomous knowledge acquisition from academic research papers for AGI self-improvement.
Part of the Agentic System - a 24/7 autonomous AI framework with persistent memory.
Features
Paper Discovery
- arXiv Integration: Search and download from arXiv.org
- Semantic Scholar: Citation analysis and academic impact metrics
- PDF Download: Automatic paper retrieval and storage
Knowledge Extraction
- Insight Extraction: Identify key findings and contributions
- Citation Analysis: Understand paper influence and relationships
- Technique Identification: Extract novel methods and approaches
Memory Integration
- Enhanced Memory: Store extracted knowledge for AGI learning
- Structured Entities: Create searchable memory representations
- Citation Graphs: Track knowledge lineage
Installation
cd ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/mcp-servers/research-paper-mcp
pip install -r requirements.txt
Configuration
Add to ~/.claude.json:
{
  "mcpServers": {
    "research-paper-mcp": {
      "command": "python3",
      "args": [
        "${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/mcp-servers/research-paper-mcp/server.py"
      ],
      "env": {},
      "disabled": false
    }
  }
}
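Some clients may not expand the shell-style ${AGENTIC_SYSTEM_PATH:-/opt/agentic} default inside a JSON file. If yours does not, one option is to generate the entry with the path already resolved. The sketch below is an illustration only; the helper name build_server_entry is hypothetical and not part of this project.

```python
import json
import os
from pathlib import PurePosixPath


def build_server_entry(base=None):
    """Return the mcpServers entry with the server path fully resolved.

    Hypothetical helper: mirrors the shell default
    ${AGENTIC_SYSTEM_PATH:-/opt/agentic} (unset or empty falls back).
    """
    root = base or os.environ.get("AGENTIC_SYSTEM_PATH") or "/opt/agentic"
    server = (
        PurePosixPath(root)
        / "agentic-system"
        / "mcp-servers"
        / "research-paper-mcp"
        / "server.py"
    )
    return {
        "research-paper-mcp": {
            "command": "python3",
            "args": [str(server)],
            "env": {},
            "disabled": False,
        }
    }


if __name__ == "__main__":
    # Print a block that can be merged into ~/.claude.json by hand.
    print(json.dumps({"mcpServers": build_server_entry()}, indent=2))
```

Merging the printed block into the existing file by hand avoids overwriting other entries already present there.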
Available Tools
search_arxiv
Search arXiv for research papers by query.
Parameters:
- query (required): Search query (e.g., "recursive self-improvement AGI")
- max_results: Maximum results (default: 10)
- sort_by: Sort order, one of relevance, lastUpdatedDate, submittedDate
Example:
results = mcp__research-paper-mcp__search_arxiv({
    "query": "meta-learning neural networks",
    "max_results": 20,
    "sort_by": "relevance"
})
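The three sort_by values match the sortBy values accepted by the public arXiv API. As a rough illustration of how the tool parameters could map onto an API request, the sketch below builds the query URL with the standard library only. It is not this server's implementation (which depends on the arxiv wrapper); the function name and the choice of the all: search prefix are assumptions.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
VALID_SORT = {"relevance", "lastUpdatedDate", "submittedDate"}


def build_arxiv_query_url(query, max_results=10, sort_by="relevance"):
    """Build an arXiv API URL from the tool's parameters (hypothetical helper)."""
    if sort_by not in VALID_SORT:
        raise ValueError(f"sort_by must be one of {sorted(VALID_SORT)}")
    params = {
        # Assumption: search all fields; arXiv also supports ti:, au:, abs:, etc.
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_arxiv_query_url("meta-learning neural networks", 20))
```

Rejecting an unknown sort_by up front gives a clearer error than passing it through to the API.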
search_semantic_scholar
Search Semantic Scholar for papers with citation metrics.
Parameters:
- query (required): Search query
- fields: Metadata fields to retrieve
- limit: Maximum results (default: 10)
Example:
results = mcp__research-paper-mcp__search_semantic_scholar({
    "query": "transformer architecture attention",
    "fields": ["title", "authors", "citationCount", "year"],
    "limit": 15
})
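For orientation, the Semantic Scholar Graph API takes the requested fields as a single comma-separated string. The sketch below shows how the tool's fields list could be turned into such a request URL; the helper name is hypothetical and the server's actual request code may differ.

```python
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_s2_search_url(query, fields=None, limit=10):
    """Build a Semantic Scholar paper-search URL (hypothetical helper)."""
    params = {"query": query, "limit": limit}
    if fields:
        # The API expects one comma-separated string, not repeated parameters.
        params["fields"] = ",".join(fields)
    return f"{S2_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(
        build_s2_search_url(
            "transformer architecture attention",
            ["title", "authors", "citationCount", "year"],
            15,
        )
    )
```

Metrics such as citationCount generally have to be requested through fields; a search without them may return only basic metadata.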
download_paper
Download research paper PDF from URL.
Parameters:
- url (required): PDF URL
- paper_id (required): Unique identifier for filename
Example:
result = mcp__research-paper-mcp__download_paper({
    "url": "https://arxiv.org/pdf/1234.5678.pdf",
    "paper_id": "arxiv-1234.5678"
})
extract_insights
Extract key insights and findings from paper text.
Parameters:
- paper_text (required): Full paper text or abstract
- focus_areas: Optional specific areas to focus on
Example:
insights = mcp__research-paper-mcp__extract_insights({
    "paper_text": paper_abstract,
    "focus_areas": ["methodology", "results"]
})
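This README does not describe how insights are extracted. To make the role of focus_areas concrete, here is a deliberately naive keyword heuristic: it keeps the sentences that contain a cue word for one of the requested areas. The cue lists and the function name are invented for illustration and should not be read as the server's method.

```python
import re

# Hypothetical cue words per focus area; tune these for your domain.
FOCUS_CUES = {
    "methodology": ("we propose", "we introduce", "method", "approach", "algorithm"),
    "results": ("achieve", "outperform", "improve", "accuracy", "state-of-the-art"),
}


def extract_insight_sentences(paper_text, focus_areas=None):
    """Return sentences that mention a cue word for any requested focus area."""
    areas = focus_areas or list(FOCUS_CUES)
    cues = [cue for area in areas for cue in FOCUS_CUES.get(area, ())]
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", paper_text.strip())
    return [s for s in sentences if any(cue in s.lower() for cue in cues)]


if __name__ == "__main__":
    abstract = (
        "We propose a new optimizer. The weather was nice. "
        "It achieves 95% accuracy on the benchmark."
    )
    print(extract_insight_sentences(abstract, ["results"]))
```

A heuristic like this is cheap but misses paraphrases.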
analyze_citations
Analyze citation relationships and paper influence.
Parameters:
- paper_id (required): Semantic Scholar or arXiv paper ID
- depth: Citation graph depth 1-3 (default: 1)
Example:
analysis = mcp__research-paper-mcp__analyze_citations({
    "paper_id": "arxiv:1706.03762",  # "Attention Is All You Need"
    "depth": 2
})
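The depth parameter can be read as a hop limit on the citation graph. A minimal sketch of that idea, assuming the graph is already available as an in-memory dict mapping each paper ID to the IDs it cites (the real tool fetches this data remotely, and its traversal may differ):

```python
from collections import deque


def citation_neighborhood(graph, paper_id, depth=1):
    """Return {paper_id: hops} for papers reachable within `depth` hops.

    `graph` is a hypothetical adjacency dict: paper ID -> list of cited IDs.
    """
    if not 1 <= depth <= 3:
        raise ValueError("depth must be between 1 and 3")
    distances = {paper_id: 0}
    queue = deque([paper_id])
    while queue:
        current = queue.popleft()
        if distances[current] == depth:
            continue  # Hop limit reached; do not expand further.
        for cited in graph.get(current, ()):
            if cited not in distances:
                distances[cited] = distances[current] + 1
                queue.append(cited)
    del distances[paper_id]  # Exclude the starting paper itself.
    return distances


if __name__ == "__main__":
    toy_graph = {"A": ["B", "C"], "B": ["D"], "C": ["A"], "D": ["E"]}
    print(citation_neighborhood(toy_graph, "A", depth=2))
```

Breadth-first traversal records the shortest hop count for each paper, and the visited check keeps citation cycles from looping forever.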
store_paper_knowledge
Store extracted knowledge in enhanced-memory for AGI learning.
Parameters:
- paper_metadata (required): Paper metadata dict
- insights (required): List of key insights
- techniques: List of novel techniques
Example:
stored = mcp__research-paper-mcp__store_paper_knowledge({
    "paper_metadata": {
        "id": "arxiv-1234.5678",
        "title": "Novel AGI Approach",
        "authors": ["Smith", "Jones"],
        "year": 2024
    },
    "insights": [
        "Achieves 95% accuracy on benchmark",
        "10x faster than previous methods"
    ],
    "techniques": [
        "Recursive meta-optimization",
        "Self-modifying architectures"
    ]
})
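To show what a structured memory entity for a paper might look like, the sketch below flattens the tool's arguments into a single record. The name / entityType / observations shape is an assumption borrowed from common MCP memory servers; the schema actually expected by enhanced-memory-mcp may differ, and paper_to_entity is a hypothetical helper.

```python
def paper_to_entity(paper_metadata, insights, techniques=None):
    """Flatten paper knowledge into one memory entity (assumed schema)."""
    observations = [f"Insight: {text}" for text in insights]
    observations += [f"Technique: {text}" for text in (techniques or [])]
    authors = ", ".join(paper_metadata.get("authors", []))
    if authors:
        observations.append(f"Authors: {authors}")
    if "year" in paper_metadata:
        observations.append(f"Year: {paper_metadata['year']}")
    return {
        # Prefix the ID so paper entities are easy to find by name.
        "name": f"paper:{paper_metadata['id']}",
        "entityType": "research_paper",
        "observations": observations,
    }


if __name__ == "__main__":
    entity = paper_to_entity(
        {"id": "arxiv-1234.5678", "authors": ["Smith", "Jones"], "year": 2024},
        ["Achieves 95% accuracy on benchmark"],
        ["Recursive meta-optimization"],
    )
    print(entity)
```

Prefixing each observation with its kind keeps insights and techniques distinguishable after they are merged into one list.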
Usage Patterns
Autonomous Research Workflow
# 1. Search for relevant papers
arxiv_results = mcp__research-paper-mcp__search_arxiv({
    "query": "recursive self-improvement",
    "max_results": 10
})
# 2. Get citation metrics
for paper in arxiv_results['papers']:
    scholar_data = mcp__research-paper-mcp__search_semantic_scholar({
        "query": paper['title'],
        "fields": ["title", "citationCount"],  # request the metric used below
        "limit": 1
    })
    # Skip papers that Semantic Scholar could not match
    matches = scholar_data['papers']
    if not matches:
        continue
    # 3. Download high-impact papers
    if matches[0]['citationCount'] > 50:
        pdf = mcp__research-paper-mcp__download_paper({
            "url": paper['pdf_url'],
            "paper_id": paper['id']
        })
        # 4. Extract and store insights
        insights = mcp__research-paper-mcp__extract_insights({
            "paper_text": paper['abstract']
        })
        mcp__research-paper-mcp__store_paper_knowledge({
            "paper_metadata": paper,
            "insights": insights['insights']
        })
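The workflow above is written in tool-call notation rather than importable Python. For testing the control flow offline, the same steps can be sketched as a plain function that receives the five tools as callables. The tools dict, the function name, and the return value are assumptions made for this illustration; the response keys follow the examples in this README.

```python
def ingest_high_impact(tools, query, min_citations=50, max_results=10):
    """Search, check citations, download, extract, store; return stored IDs.

    `tools` is a hypothetical dict mapping tool names to callables that
    accept an argument dict and return the tool's response dict.
    """
    stored = []
    found = tools["search_arxiv"]({"query": query, "max_results": max_results})
    for paper in found.get("papers", []):
        scholar = tools["search_semantic_scholar"](
            {"query": paper["title"], "limit": 1}
        )
        matches = scholar.get("papers", [])
        # Treat a missing match or a missing/None count as zero citations.
        if not matches or (matches[0].get("citationCount") or 0) <= min_citations:
            continue
        tools["download_paper"]({"url": paper["pdf_url"], "paper_id": paper["id"]})
        insights = tools["extract_insights"]({"paper_text": paper["abstract"]})
        tools["store_paper_knowledge"](
            {"paper_metadata": paper, "insights": insights["insights"]}
        )
        stored.append(paper["id"])
    return stored
```

Passing the tools in as callables lets the loop run against stubs, so the filtering logic can be checked without network access.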
Citation Network Analysis
# Analyze citation influence
analysis = mcp__research-paper-mcp__analyze_citations({
    "paper_id": "influential-paper-id",
    "depth": 2
})
# Identify most influential papers in field
if analysis['citation_graph']['influential_citations'] > 100:
    # Download and study this foundational paper
    pass
Storage
- Papers Directory: ${AGENTIC_SYSTEM_PATH:-/opt/agentic}/agentic-system/research-papers/
- PDFs: Saved as {paper_id}.pdf
- Memory Integration: Via enhanced-memory-mcp create_entities
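Because paper_id becomes part of a filename, it is worth making sure an unusual ID cannot point outside the papers directory. The sketch below derives the storage path following the layout listed above; the sanitizing rule and the helper name are assumptions, not the server's documented behavior.

```python
import os
import re
from pathlib import PurePosixPath


def paper_pdf_path(paper_id, base=None):
    """Return the expected PDF path for a paper ID (hypothetical helper)."""
    root = base or os.environ.get("AGENTIC_SYSTEM_PATH") or "/opt/agentic"
    # Replace anything outside a conservative set, including path separators.
    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", paper_id)
    return (
        PurePosixPath(root) / "agentic-system" / "research-papers" / f"{safe_id}.pdf"
    )


if __name__ == "__main__":
    print(paper_pdf_path("arxiv-1234.5678"))
```

IDs such as arxiv-1234.5678 pass through unchanged, while characters like / are replaced with underscores.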
Dependencies
- arxiv: arXiv API Python wrapper
- aiohttp: Async HTTP client for Semantic Scholar API
- mcp: Model Context Protocol SDK
Future Enhancements
- PDF Text Extraction: Parse full paper text from PDFs
- Figure/Diagram Analysis: Extract visual insights
- Code Repository Links: Find implementation code
- Related Papers: Automatic discovery of connected research
- Trend Detection: Identify emerging research directions
- LLM-Powered Insight Extraction: Use GPT-4 for deeper analysis
Integration with AGI System
This MCP server addresses the first item of Gap #1 from AGI_GAP_ANALYSIS.md:
Knowledge Acquisition Infrastructure ✅
- ✓ Research Paper Ingestion (arXiv + Semantic Scholar)
- ⏳ Video Transcript Processing (separate MCP)
- ⏳ GitHub Repository Analysis (future)
- ⏳ Documentation Scraping (future)
- ⏳ Knowledge Graph Integration (future)
Impact: The system can now autonomously find, download, and extract insights from recent AI research papers.
