Paper Download MCP Server
MCP server for downloading academic papers from multiple sources with intelligent routing.
Note: This project is built on top of scihub-cli, adapting its core functionality for MCP integration. If you find this useful, consider starring both projects!
Features
- Multi-Source Support: Downloads from multiple sources with automatic fallback
- arXiv: Prioritized for preprints (free, no API key needed)
- Unpaywall: For open access papers (requires email)
- Direct PDF: Handles direct PDF URLs
- PMC: PubMed Central articles
- HTML Landing: Extracts PDF links from article pages
- Sci-Hub: Fallback for older papers (coverage-driven)
- CORE: Additional OA fallback
- Intelligent Routing: Priority-based source selection with year-aware routing
- 3 MCP Tools:
paper_download- Download single paper by DOI or URLpaper_batch_download- Download multiple papers with progress reportingpaper_metadata- Get paper metadata without downloading PDF
- Clean Filenames:
[YYYY] - Paper Title.pdfformat - Rate Limiting: Built-in delays for API compliance
- Comprehensive Error Messages: Actionable suggestions on failures
Installation
For Users (Recommended)
No manual installation required! Use uvx for automatic environment management:
{
"mcpServers": {
"paper-download": {
"command": "uvx",
"args": ["paper-download-mcp"],
"env": {
"PAPER_DOWNLOAD_EMAIL": "your-email@university.edu"
}
}
}
}
For Developers
git clone <repository-url>
cd paper-download-mcp
uv sync
uv run python -m paper_download_mcp.server
Configuration
Required Environment Variables
PAPER_DOWNLOAD_EMAIL: Your email address (required for Unpaywall API compliance)- Example:
researcher@university.edu - Used for Unpaywall API tracking and contact purposes
- Example:
Optional Environment Variables
PAPER_DOWNLOAD_OUTPUT_DIR: Default output directory (default:./downloads)
Claude Desktop Setup
Add to your claude_desktop_config.json:
{
"mcpServers": {
"paper-download": {
"command": "uvx",
"args": ["paper-download-mcp"],
"env": {
"PAPER_DOWNLOAD_EMAIL": "your-email@university.edu",
"PAPER_DOWNLOAD_OUTPUT_DIR": "/path/to/papers"
}
}
}
}
After configuration, restart Claude Desktop.
Tools
paper_download
Download a single academic paper by DOI or URL.
Parameters:
identifier(required): DOI or URL (e.g.,10.1038/nature12373)output_dir(optional): Output directory (default:./downloads)
Example:
Download the paper 10.1038/nature12373
Returns:
- Markdown with download details (file path, size, source, timing)
- Error message with suggestions if download fails
paper_batch_download
Download multiple papers sequentially with progress reporting.
Parameters:
identifiers(required): List of DOIs or URLs (1-50 maximum)output_dir(optional): Output directory (default:./downloads)
Example:
Download these papers: 10.1038/nature12373, 10.1126/science.1234567
Returns:
- Markdown summary with statistics
- List of successful downloads
- List of failed downloads with errors
Note: Downloads are sequential with 2-second delays for rate limiting.
paper_metadata
Retrieve paper metadata without downloading the PDF.
Parameters:
identifier(required): DOI or URL
Example:
Get metadata for 10.1038/nature12373
Returns:
- JSON with paper details:
- DOI, title, year, authors, journal
- Open access status
- Available download sources
How It Works
Intelligent Source Routing
The server uses priority routing to keep fast sources first:
- arXiv IDs/URLs: Try arXiv first, then OA sources if needed
- DOIs (year < 2021): OA sources first, Sci-Hub last
- DOIs (year ≥ 2021): OA sources only (Sci-Hub has no coverage)
- Year unknown: OA sources first, Sci-Hub last
Download Process
- Normalize DOI from URL/identifier
- Detect publication year via Crossref API (DOIs only)
- Route to appropriate source based on priority/year
- Download PDF with retry on failure
- Validate file (PDF header, size check)
- Generate filename:
[YYYY] - Title.pdf - Return absolute file path
Rate Limiting
- 2-second delay between batch downloads
- Respects Unpaywall API limits (~100k requests/day)
- Built-in exponential backoff retry (3 attempts max)
Troubleshooting
"PAPER_DOWNLOAD_EMAIL environment variable is required"
Solution: Set the email in your Claude Desktop config (see Configuration section above).
"Paper not found in any source"
Possible causes:
- Invalid or incorrect DOI
- Paper too recent (not yet indexed)
- Paper behind paywall with no open access version
- Sci-Hub mirrors temporarily unavailable
Solutions:
- Verify DOI on doi.org
- Use
paper_metadatato check availability - Try again later (mirrors may recover)
Download times out
Causes:
- Slow network connection
- Sci-Hub mirror selection taking too long
- Large PDF file
Solutions:
- Check internet connection
- Retry (mirror selection is cached after first success)
- Single papers typically complete in <15 seconds
Downloaded file is corrupted
The server validates PDFs before returning. If you encounter corruption:
- Check disk space
- Verify file permissions in output directory
- Try different paper (may be source issue)
Testing
MCP Inspector
Test the server with MCP Inspector:
export PAPER_DOWNLOAD_EMAIL=test@example.com
npx @modelcontextprotocol/inspector uv run python -m paper_download_mcp.server
Unit Tests
uv run pytest
Legal Notice
IMPORTANT: This tool provides access to academic papers through multiple sources:
-
Unpaywall (https://unpaywall.org): Legal open-access aggregator operated by OurResearch. Recommended and prioritized when available.
-
Sci-Hub: Operates in a legal gray area. While it provides access to research, it may violate copyright laws in some jurisdictions. Use at your own risk.
User Responsibilities:
- You are responsible for compliance with applicable copyright laws in your jurisdiction
- This tool is intended for research and educational purposes only
- The maintainers assume no liability for how you use this tool
- When possible, prefer legal open-access sources (Unpaywall)
By using this tool, you acknowledge these legal considerations and agree to use it responsibly.
Project Structure
paper-download-mcp/
├── src/
│ └── paper_download_mcp/
│ ├── server.py # FastMCP entry point
│ ├── models.py # Pydantic input schemas
│ ├── formatters.py # Markdown/JSON formatters
│ ├── tools/
│ │ ├── download.py # Download tools
│ │ └── metadata.py # Metadata tool
│ └── scihub_core/ # Copied from scihub-cli
├── pyproject.toml
├── README.md
└── .gitignore
Architecture
Layer Architecture
- FastMCP Server Layer: Protocol handling, tool registration, config validation
- MCP Tools Layer: Request parsing, response formatting, async coordination
- Models & Formatters: Data validation, output serialization
- scihub_core Layer: Academic paper logic (unchanged from scihub-cli)
Async Pattern
All tools use asyncio.to_thread() to wrap synchronous scihub-cli code:
@mcp.tool()
async def paper_download(...):
def _sync_download():
# Synchronous scihub-cli code
client = SciHubClient()
return client.download_paper(doi)
# Run in thread pool
result = await asyncio.to_thread(_sync_download)
return format_result(result)
This preserves the battle-tested scihub-cli code without modifications.
Performance
| Operation | Target | Typical | Max |
|---|---|---|---|
| Get Metadata | <1s | 0.5s | 2s |
| Single Download | <5s | 2-3s | 10s |
| Batch (10 papers) | <40s | 25-30s | 60s |
Note: First download may take longer (5-10s) due to mirror selection. Subsequent downloads use cached mirror.
Contributing
For Maintainers: Syncing from scihub-cli
The scihub_core/ directory contains code copied from the upstream scihub-cli project. When bugs are fixed or features added to scihub-cli:
Workflow:
- Fix/implement in
scihub-cliproject first - Run tests and commit to scihub-cli
- Copy updated files to
paper-download-mcp/src/paper_download_mcp/scihub_core/ - Test MCP server functionality
- Commit with message referencing upstream commit:
sync: Update <file> from scihub-cli (<description>) Synced from scihub-cli commit <hash> <details of changes>
Last sync: scihub-cli@9787efc (2024-12-02) - Fixed year type bug in UnpaywallSource
License
MIT License - See LICENSE file for details
Credits
- Built with FastMCP
- Uses scihub-cli as core download engine
- Metadata from Unpaywall and Crossref
Support
For issues or questions:
- Check the Troubleshooting section above
- Open an issue on GitHub (include error messages and steps to reproduce)
Disclaimer: This tool is provided as-is for research and educational purposes. Users assume all responsibility for compliance with applicable laws and regulations.
