# Crawl4AI MCP Server
A powerful Model Context Protocol (MCP) server that provides web scraping and crawling capabilities using Crawl4AI. This server acts as the "hands and eyes" for client-side AI, enabling intelligent web content analysis and extraction.
## Features
- 🔍 **Page Structure Analysis**: Extract clean HTML or Markdown content from any webpage
- 🎯 **Schema-Based Extraction**: Precision data extraction using CSS selectors and AI-generated schemas
- 📸 **Screenshot Capture**: Visual webpage representation for analysis
- ⚡ **Async Operations**: Non-blocking web crawling with progress reporting
- 🛡️ **Error Handling**: Comprehensive error handling and validation
- 📊 **MCP Integration**: Full Model Context Protocol compatibility with logging and progress tracking
## Architecture
```
┌─────────────────┐      ┌───────────────────┐      ┌─────────────────┐
│    Client AI    │      │   Crawl4AI MCP    │      │   Web Content   │
│    ("Brain")    │◄──►│      Server       │◄──►│   (Websites)    │
│                 │      │ ("Hands & Eyes")  │      │                 │
└─────────────────┘      └───────────────────┘      └─────────────────┘
```
- **FastMCP**: Handles MCP protocol and tool registration
- **AsyncWebCrawler**: Provides async web scraping capabilities
- **Stdio Transport**: MCP-compatible communication channel
- **Error-Safe Logging**: All logs directed to stderr to prevent protocol corruption
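The stderr-only logging rule can be sketched with the standard library. This is a minimal illustration of the idea, not the server's actual code; the logger name is arbitrary:

```python
import logging
import sys

# MCP servers speak JSON-RPC over stdout, so any log line written to
# stdout would corrupt the protocol stream. Route all logging to stderr.
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("crawl4ai_mcp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("server starting")  # written to stderr; stdout stays clean
```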
## Installation
### Prerequisites
- Python 3.10 or higher
- pip package manager
### Setup

1. Clone or download this repository:

   ```bash
   git clone <repository-url>
   cd crawl4ai-mcp
   ```

2. Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Install Playwright browsers (required for screenshots):

   ```bash
   playwright install
   ```
## Usage

### Starting the Server

```bash
# Activate virtual environment
source venv/bin/activate

# Start the MCP server
python3 crawl4ai_mcp_server.py
```
### Testing with MCP Inspector

For interactive testing and development:

```bash
# Start MCP Inspector interface
fastmcp dev crawl4ai_mcp_server.py
```
This will start a web interface (usually at http://localhost:6274) where you can test all tools interactively.
## Available Tools
### 1. `server_status`

**Purpose**: Get server health and capabilities information.

**Parameters**: None

**Example Response**:

```json
{
  "server_name": "Crawl4AI-MCP-Server",
  "version": "1.0.0",
  "status": "operational",
  "capabilities": ["web_crawling", "content_extraction", "screenshot_capture", "schema_based_extraction"]
}
```
### 2. `get_page_structure`

**Purpose**: Extract webpage content for analysis (the "eyes" function).

**Parameters**:
- `url` (string): The webpage URL to analyze
- `format` (string, optional): Output format, `"html"` or `"markdown"` (default: `"html"`)

**Example**:

```json
{
  "url": "https://example.com",
  "format": "html"
}
```
### 3. `crawl_with_schema`

**Purpose**: Precision data extraction using CSS selectors (the "hands" function).

**Parameters**:
- `url` (string): The webpage URL to extract data from
- `extraction_schema` (string): JSON string defining field names and CSS selectors

**Example Schema**:

```json
{
  "title": "h1",
  "description": "p.description",
  "price": ".price-value",
  "author": ".author-name",
  "tags": ".tag"
}
```

**Example Usage**:

```json
{
  "url": "https://example.com/product",
  "extraction_schema": "{\"title\": \"h1\", \"price\": \".price\", \"description\": \"p\"}"
}
```
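Because `extraction_schema` is passed as a JSON string, a client can sanity-check it before calling the tool. A minimal sketch; the validation rules here are illustrative assumptions on the client side, not checks the server is documented to perform:

```python
import json

def validate_schema(schema_str: str) -> dict:
    """Parse an extraction schema and check it maps field names to CSS selectors."""
    schema = json.loads(schema_str)  # raises ValueError on malformed JSON
    if not isinstance(schema, dict) or not schema:
        raise ValueError("schema must be a non-empty JSON object")
    for field, selector in schema.items():
        if not isinstance(selector, str) or not selector.strip():
            raise ValueError(f"selector for {field!r} must be a non-empty string")
    return schema

schema = validate_schema('{"title": "h1", "price": ".price", "description": "p"}')
```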
### 4. `take_screenshot`

**Purpose**: Capture a visual representation of a webpage.

**Parameters**:
- `url` (string): The webpage URL to screenshot

**Example**:

```json
{
  "url": "https://example.com"
}
```

**Returns**: Base64-encoded PNG image data with metadata.
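Since the payload is base64-encoded PNG data, a client can decode it with the standard library. A sketch, assuming the response dictionary exposes the image under a `screenshot` key (the key name is an assumption, not documented server behavior):

```python
import base64

def decode_screenshot(response: dict) -> bytes:
    """Extract and decode the base64 PNG payload from a take_screenshot response."""
    return base64.b64decode(response["screenshot"])

def save_screenshot(response: dict, path: str) -> int:
    """Write the decoded PNG to disk; returns the number of bytes written."""
    png = decode_screenshot(response)
    with open(path, "wb") as f:
        f.write(png)
    return len(png)
```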
## Integration with Claude Desktop

To use this server with Claude Desktop, add this configuration to your Claude Desktop settings:

```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "python3",
      "args": ["/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"],
      "env": {}
    }
  }
}
```

Replace `/path/to/crawl4ai-mcp/` with the actual path to your installation directory.
## Error Handling

All tools include comprehensive error handling and return structured JSON responses:

```json
{
  "error": "Error description",
  "url": "https://example.com",
  "success": false
}
```
Common error scenarios:
- Invalid URL format
- Network connectivity issues
- Invalid extraction schemas
- Screenshot capture failures
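The structured error shape above can be produced by a small helper shared across tools. A minimal illustration; the helper name and wrapping approach are assumptions, not the server's actual code:

```python
import json

def error_response(url: str, exc: Exception) -> str:
    """Build the structured JSON error payload that tools return on failure."""
    return json.dumps({
        "error": str(exc),
        "url": url,
        "success": False,
    })

payload = json.loads(error_response("https://example.com", ValueError("Invalid URL format")))
```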
## Development

### Project Structure

```
crawl4ai-mcp/
├── crawl4ai_mcp_server.py   # Main server implementation
├── requirements.txt         # Python dependencies
├── pyproject.toml           # Project configuration
├── USAGE_EXAMPLES.md        # Detailed usage examples
└── README.md                # This file
```
### Dependencies

- `fastmcp`: FastMCP framework for MCP server development
- `crawl4ai`: Core web crawling and extraction library
- `pydantic`: Data validation and parsing
- `playwright`: Browser automation for screenshots
### Testing

Run the linter to ensure code quality:

```bash
ruff check .
```

Test server startup:

```bash
python3 crawl4ai_mcp_server.py
```
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly with MCP Inspector
5. Submit a pull request
## License
This project is open source. See the LICENSE file for details.
## Support
For issues and questions:
- Check the troubleshooting section in USAGE_EXAMPLES.md
- Test with MCP Inspector to isolate issues
- Verify all dependencies are correctly installed
- Ensure virtual environment is activated
## Acknowledgments

- **Crawl4AI**: Powerful web crawling and extraction capabilities
- **FastMCP**: Streamlined MCP server development framework
- **Model Context Protocol**: Standardized AI tool integration
