# Crawl4AI MCP Server
A powerful Model Context Protocol (MCP) server that provides web scraping and crawling capabilities using Crawl4AI. This server acts as the "hands and eyes" for client-side AI, enabling intelligent web content analysis and extraction.
## Features
- 🔍 **Page Structure Analysis**: Extract clean HTML or Markdown content from any webpage
- 🎯 **Schema-Based Extraction**: Precision data extraction using CSS selectors and AI-generated schemas
- 📸 **Screenshot Capture**: Visual webpage representation for analysis
- ⚡ **Async Operations**: Non-blocking web crawling with progress reporting
- 🛡️ **Error Handling**: Comprehensive error handling and validation
- 📊 **MCP Integration**: Full Model Context Protocol compatibility with logging and progress tracking
## Architecture
```
┌─────────────────┐      ┌───────────────────┐      ┌─────────────────┐
│    Client AI    │      │   Crawl4AI MCP    │      │   Web Content   │
│    ("Brain")    │◄──►│      Server       │◄──►│   (Websites)    │
│                 │      │ ("Hands & Eyes")  │      │                 │
└─────────────────┘      └───────────────────┘      └─────────────────┘
```
- **FastMCP**: Handles MCP protocol and tool registration
- **AsyncWebCrawler**: Provides async web scraping capabilities
- **Stdio Transport**: MCP-compatible communication channel
- **Error-Safe Logging**: All logs directed to stderr to prevent protocol corruption
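The stderr-only logging rule can be sketched with the standard library. This is a minimal illustration of the idea, not the server's actual code; the logger name is arbitrary:

```python
import logging
import sys

# MCP servers speak JSON-RPC over stdout, so any log line written to
# stdout would corrupt the protocol stream. Route all logging to stderr.
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("crawl4ai_mcp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("server starting")  # written to stderr; stdout stays clean
```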
## Installation
### Prerequisites
- Python 3.10 or higher
- pip package manager
### Setup

1. Clone or download this repository:

   ```bash
   git clone <repository-url>
   cd crawl4ai-mcp
   ```

2. Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Install Playwright browsers (required for screenshots):

   ```bash
   playwright install
   ```
## Usage

### Starting the Server

```bash
# Activate virtual environment
source venv/bin/activate

# Start the MCP server
python3 crawl4ai_mcp_server.py
```
### Testing with MCP Inspector

For interactive testing and development:

```bash
# Start MCP Inspector interface
fastmcp dev crawl4ai_mcp_server.py
```
This will start a web interface (usually at http://localhost:6274) where you can test all tools interactively.
## Available Tools
### 1. `server_status`

**Purpose**: Get server health and capabilities information.

**Parameters**: None

**Example Response**:

```json
{
  "server_name": "Crawl4AI-MCP-Server",
  "version": "1.0.0",
  "status": "operational",
  "capabilities": ["web_crawling", "content_extraction", "screenshot_capture", "schema_based_extraction"]
}
```
### 2. `get_page_structure`

**Purpose**: Extract webpage content for analysis (the "eyes" function).

**Parameters**:
- `url` (string): The webpage URL to analyze
- `format` (string, optional): Output format, `"html"` or `"markdown"` (default: `"html"`)

**Example**:

```json
{
  "url": "https://example.com",
  "format": "html"
}
```
### 3. `crawl_with_schema`

**Purpose**: Precision data extraction using CSS selectors (the "hands" function).

**Parameters**:
- `url` (string): The webpage URL to extract data from
- `extraction_schema` (string): JSON string defining field names and CSS selectors

**Example Schema**:

```json
{
  "title": "h1",
  "description": "p.description",
  "price": ".price-value",
  "author": ".author-name",
  "tags": ".tag"
}
```

**Example Usage**:

```json
{
  "url": "https://example.com/product",
  "extraction_schema": "{\"title\": \"h1\", \"price\": \".price\", \"description\": \"p\"}"
}
```
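Because `extraction_schema` is passed as a JSON string, a client can sanity-check it before calling the tool. A minimal sketch; the validation rules here are illustrative assumptions on the client side, not checks the server is documented to perform:

```python
import json

def validate_schema(schema_str: str) -> dict:
    """Parse an extraction schema and check it maps field names to CSS selectors."""
    schema = json.loads(schema_str)  # raises ValueError on malformed JSON
    if not isinstance(schema, dict) or not schema:
        raise ValueError("schema must be a non-empty JSON object")
    for field, selector in schema.items():
        if not isinstance(selector, str) or not selector.strip():
            raise ValueError(f"selector for {field!r} must be a non-empty string")
    return schema

schema = validate_schema('{"title": "h1", "price": ".price", "description": "p"}')
```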
### 4. `take_screenshot`

**Purpose**: Capture a visual representation of a webpage.

**Parameters**:
- `url` (string): The webpage URL to screenshot

**Example**:

```json
{
  "url": "https://example.com"
}
```

**Returns**: Base64-encoded PNG image data with metadata.
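Since the payload is base64-encoded PNG data, a client can decode it with the standard library. A sketch, assuming the response dictionary exposes the image under a `screenshot` key (the key name is an assumption, not documented server behavior):

```python
import base64

def decode_screenshot(response: dict) -> bytes:
    """Extract and decode the base64 PNG payload from a take_screenshot response."""
    return base64.b64decode(response["screenshot"])

def save_screenshot(response: dict, path: str) -> int:
    """Write the decoded PNG to disk; returns the number of bytes written."""
    png = decode_screenshot(response)
    with open(path, "wb") as f:
        f.write(png)
    return len(png)
```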
## Integration with Claude Desktop

To use this server with Claude Desktop, add this configuration to your Claude Desktop settings:

```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "python3",
      "args": ["/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"],
      "env": {}
    }
  }
}
```

Replace `/path/to/crawl4ai-mcp/` with the actual path to your installation directory.
## Error Handling

All tools include comprehensive error handling and return structured JSON responses:

```json
{
  "error": "Error description",
  "url": "https://example.com",
  "success": false
}
```
Common error scenarios:
- Invalid URL format
- Network connectivity issues
- Invalid extraction schemas
- Screenshot capture failures
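The structured error shape above can be produced by a small helper shared across tools. A minimal illustration; the helper name and wrapping approach are assumptions, not the server's actual code:

```python
import json

def error_response(url: str, exc: Exception) -> str:
    """Build the structured JSON error payload that tools return on failure."""
    return json.dumps({
        "error": str(exc),
        "url": url,
        "success": False,
    })

payload = json.loads(error_response("https://example.com", ValueError("Invalid URL format")))
```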
## Development

### Project Structure

```
crawl4ai-mcp/
├── crawl4ai_mcp_server.py   # Main server implementation
├── requirements.txt         # Python dependencies
├── pyproject.toml           # Project configuration
├── USAGE_EXAMPLES.md        # Detailed usage examples
└── README.md                # This file
```
### Dependencies

- `fastmcp`: FastMCP framework for MCP server development
- `crawl4ai`: Core web crawling and extraction library
- `pydantic`: Data validation and parsing
- `playwright`: Browser automation for screenshots
### Testing

Run the linter to ensure code quality:

```bash
ruff check .
```

Test server startup:

```bash
python3 crawl4ai_mcp_server.py
```
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly with MCP Inspector
5. Submit a pull request
## License
This project is open source. See the LICENSE file for details.
## Support
For issues and questions:
- Check the troubleshooting section in USAGE_EXAMPLES.md
- Test with MCP Inspector to isolate issues
- Verify all dependencies are correctly installed
- Ensure virtual environment is activated
## Acknowledgments

- **Crawl4AI**: Powerful web crawling and extraction capabilities
- **FastMCP**: Streamlined MCP server development framework
- **Model Context Protocol**: Standardized AI tool integration
