🌐 Amazon Q Web Documentation Reader
MCP Server for Intelligent Web Content Extraction
A Model Context Protocol (MCP) server that enables Amazon Q to intelligently navigate and extract documentation from websites.
Amazon Q uses Claude 4.5 to make smart decisions about which pages to visit and what content to extract.
Features • Installation • Setup • Usage • Tools
✨ Features
- 🧠 Intelligent Navigation - Amazon Q (Claude 4.5) decides which documentation pages to visit
- 🧹 Clean Content Extraction - Removes navigation, ads, scripts, and other non-content elements
- 📝 Multiple Output Formats - Supports both Markdown and plain text output
- 💻 Code Block Extraction - Specifically extracts code examples from documentation
- 📊 Page Structure Analysis - Extracts heading hierarchy and table of contents
- 🔗 Link Discovery - Finds and filters documentation links
- 📚 Batch Processing - Read multiple documentation pages at once
🎯 How It Works
User: "I'm having issues with Razorpay routes"
Documentation: https://razorpay.com/docs
Amazon Q (Claude 4.5):
1. Reads main docs page
2. Sees links: ["Payments", "Routes", "Webhooks", ...]
3. Intelligently decides: "Routes link is relevant!"
4. Navigates to Routes documentation
5. Extracts content and solves your problem
All navigation decisions = Amazon Q's Claude brain 🧠
MCP Server = Clean content extraction tool 🛠️
📦 Installation
Prerequisites
- Python 3.12 or higher
- uv (recommended) or pip
- Amazon Q CLI
Step 1: Clone the Repository
git clone https://github.com/yourusername/amazon-q-web_search.git
cd amazon-q-web_search
Step 2: Install Dependencies
Using uv (Recommended):
uv sync
Using pip:
pip install -e .
🔧 Setup with Amazon Q
Step 1: Locate Your MCP Configuration File
Amazon Q looks for MCP server configuration in:
- Linux/WSL: ~/.aws/amazonq/mcp.json
- macOS: ~/.aws/amazonq/mcp.json
- Windows: %USERPROFILE%\.aws\amazonq\mcp.json
Step 2: Create/Edit the Configuration File
Create the directory if it doesn't exist:
mkdir -p ~/.aws/amazonq
Edit or create ~/.aws/amazonq/mcp.json:
For Linux/WSL:
{
  "mcpServers": {
    "doc_reader": {
      "command": "/full/path/to/amazon-q-web_search/.venv/bin/python",
      "args": ["/full/path/to/amazon-q-web_search/main.py"]
    }
  }
}
For macOS:
{
  "mcpServers": {
    "doc_reader": {
      "command": "/full/path/to/amazon-q-web_search/.venv/bin/python",
      "args": ["/full/path/to/amazon-q-web_search/main.py"]
    }
  }
}
For Windows:
{
  "mcpServers": {
    "doc_reader": {
      "command": "C:\\full\\path\\to\\amazon-q-web_search\\.venv\\Scripts\\python.exe",
      "args": ["C:\\full\\path\\to\\amazon-q-web_search\\main.py"]
    }
  }
}
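💡 Tip: Replace /full/path/to/ (or C:\\full\\path\\to\\ on Windows) with the absolute path where you cloned the repository. On Windows, keep the backslashes doubled, because a single backslash is an escape character in JSON.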
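If typing the paths by hand is error-prone, the configuration can be generated instead. The following is a minimal sketch, not part of the project: `build_mcp_config` is a hypothetical helper that assumes the virtual environment lives in `.venv` inside the repository, as in the examples above.

```python
import json
import sys
from pathlib import Path


def build_mcp_config(repo_dir: Path, windows: bool) -> dict:
    """Build the mcpServers entry with absolute paths inside repo_dir."""
    repo_dir = repo_dir.resolve()
    # Assumption: the virtual environment is .venv inside the repository.
    if windows:
        venv_python = repo_dir / ".venv" / "Scripts" / "python.exe"
    else:
        venv_python = repo_dir / ".venv" / "bin" / "python"
    return {
        "mcpServers": {
            "doc_reader": {
                "command": str(venv_python),
                "args": [str(repo_dir / "main.py")],
            }
        }
    }


# Run from the repository root; json.dumps doubles backslashes on Windows.
config = build_mcp_config(Path.cwd(), windows=sys.platform == "win32")
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into mcp.json, or merged into the existing "mcpServers" object if other servers are already configured.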
Step 3: Verify Installation
- Start Amazon Q CLI:
q chat
- Check if MCP server is loaded:
/mcp
You should see:
doc_reader
- read_web_documentation
- get_documentation_links
- get_page_structure
- extract_code_examples
- read_multiple_docs
- If not loaded:
  - Check that the file path in mcp.json is correct
  - Restart Amazon Q CLI
  - Check logs:
  q chat logdump
🚀 Usage
Basic Example
In Amazon Q CLI, simply ask about documentation:
I'm having issues with Razorpay routes. Can you help me understand how they work?
Documentation: https://razorpay.com/docs/
Amazon Q will:
- ✅ Read the main documentation page
- ✅ Extract all available links
- ✅ Intelligently identify the "Routes" link
- ✅ Navigate to the Routes documentation
- ✅ Provide you with accurate information
More Examples
Python Documentation:
Can you explain Python asyncio event loops?
Documentation: https://docs.python.org/3/library/asyncio.html
FastAPI Tutorial:
How do I create a basic FastAPI application?
Documentation: https://fastapi.tiangolo.com/
AWS Lambda:
How do I create a Lambda function with Python?
Documentation: https://docs.aws.amazon.com/lambda/
🛠 Available Tools
Amazon Q intelligently chains these tools to navigate documentation:
1. read_web_documentation
Fetches and extracts clean documentation content from a web page.
Parameters:
- url (required): The URL of the documentation page
- output_format (optional): "markdown" (default) or "text"
Returns: Extracted documentation content with title and metadata
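To illustrate the idea behind clean extraction, here is a rough standard-library sketch. It is not the project's implementation (which, per the dependency list, builds on BeautifulSoup and markdownify); `SKIP_TAGS`, `TextExtractor`, and `extract_text` are hypothetical names, and the tag list is an assumption.

```python
from html.parser import HTMLParser

# Assumed removal list; the project's actual REMOVE_TAGS may differ.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collect the <title> and visible text, ignoring content inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside any skipped element
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title = text
        elif self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> dict:
    """Return the page title and plain-text content of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "content": "\n".join(parser.chunks)}


page = (
    "<html><head><title>Routes</title><script>var x = 1;</script></head>"
    "<body><nav><a href='/'>Home</a></nav>"
    "<main><h1>Routes</h1><p>Split payments.</p></main></body></html>"
)
result = extract_text(page)
print(result["title"])
print(result["content"])
```

The script body and the navigation link are dropped, while the heading and paragraph survive, which is the behavior the "text" output format is meant to convey.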
2. get_documentation_links
Extracts all links from a documentation page with optional filtering.
Parameters:
- url (required): The URL of the documentation page
- filter_pattern (optional): Pattern to filter links (e.g., "api", "guide")
Returns: List of links found on the page
3. get_page_structure
Extracts the heading structure and table of contents from a documentation page.
Parameters:
- url (required): The URL of the documentation page
Returns: Hierarchical structure of headings on the page
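As an illustration of what a heading hierarchy can look like, the sketch below collects h1 to h6 elements and renders them as an indented outline. It is a standard-library approximation under assumed names (`HeadingCollector`, `page_outline`); the tool's actual return format is not specified here.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every heading in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[tuple[int, str]] = []
        self._level = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            text = " ".join("".join(self._text).split())
            if text:
                self.headings.append((self._level, text))
            self._level = None


def page_outline(html: str) -> str:
    """Render headings as a nested list, indenting two spaces per level."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(
        "  " * (level - 1) + "- " + text for level, text in collector.headings
    )


sample = "<h1>Routes</h1><p>Intro</p><h2>Create a transfer</h2><h3>Parameters</h3><h2>Errors</h2>"
print(page_outline(sample))
```

An outline like this lets the model judge whether a page covers the topic before requesting its full content.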
4. extract_code_examples
Extracts all code blocks from a documentation page.
Parameters:
- url (required): The URL of the documentation page
Returns: All code blocks found with their detected languages
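One common way to detect a code block's language is to read a `language-*` or `lang-*` class on the `<pre>` or `<code>` element. The sketch below assumes that convention; the project's detection logic may differ, and `CodeBlockCollector` and `extract_code_blocks` are hypothetical names.

```python
from html.parser import HTMLParser


class CodeBlockCollector(HTMLParser):
    """Collect the text of each <pre> block and a language hint from its classes."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[dict] = []
        self._in_pre = False
        self._language = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True
            self._language = None
            self._text = []
        if self._in_pre and tag in ("pre", "code"):
            # Assumed convention: class="language-python" or class="lang-python".
            for name in (dict(attrs).get("class") or "").split():
                if name.startswith(("language-", "lang-")):
                    self._language = name.split("-", 1)[1]

    def handle_data(self, data):
        if self._in_pre:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._in_pre:
            code = "".join(self._text).strip("\n")
            self.blocks.append({"language": self._language or "unknown", "code": code})
            self._in_pre = False


def extract_code_blocks(html: str) -> list[dict]:
    """Return every <pre> block in html with its detected language."""
    collector = CodeBlockCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks


sample = '<pre><code class="language-python">print(&quot;hi&quot;)\n</code></pre><pre>ls -la</pre>'
for block in extract_code_blocks(sample):
    print(block["language"], "->", block["code"])
```

Blocks without a recognizable class are reported as "unknown" rather than guessed.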
5. read_multiple_docs
Reads multiple documentation pages and combines their content.
Parameters:
- urls (required): List of documentation URLs (max 10)
Returns: Combined content from all pages
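Batch reading with the 10-URL cap could be structured as in this sketch. `read_many` and the injected `read_one` callable are hypothetical; the fake reader stands in for real HTTP fetching so the example runs offline, and the way results are combined is an assumption.

```python
import asyncio

MAX_BATCH_URLS = 10  # the documented per-batch limit


async def read_many(urls, read_one):
    """Read pages concurrently and join them, one labeled section per URL.

    read_one is a hypothetical async callable mapping a URL to its content.
    """
    if len(urls) > MAX_BATCH_URLS:
        raise ValueError(f"At most {MAX_BATCH_URLS} URLs per batch, got {len(urls)}")
    # return_exceptions=True keeps one failed page from discarding the rest.
    results = await asyncio.gather(
        *(read_one(url) for url in urls), return_exceptions=True
    )
    sections = []
    for url, result in zip(urls, results):
        body = f"Error: {result}" if isinstance(result, Exception) else result
        sections.append(f"# Source: {url}\n\n{body}")
    return "\n\n---\n\n".join(sections)


async def fake_read(url):
    """Stand-in for a real fetch-and-extract step."""
    if url.endswith("/broken"):
        raise RuntimeError("fetch failed")
    return f"content of {url}"


combined = asyncio.run(
    read_many(["https://example.com/a", "https://example.com/broken"], fake_read)
)
print(combined)
```

Reporting a per-page error inline, instead of failing the whole batch, lets the model still use the pages that did load.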
📁 Project Structure
amazon-q-web_search/
├── main.py # Entry point
├── pyproject.toml # Project configuration
├── README.md # This file
├── run_mcp.sh # Startup script (Linux/macOS)
└── src/
    ├── __init__.py # Package initialization
    ├── server.py # MCP server initialization
    ├── config.py # Configuration constants
    ├── fetcher.py # HTTP fetching logic
    ├── extractor.py # HTML content extraction
    ├── formatters.py # Output formatting
    └── tools.py # MCP tool definitions
⚙️ Configuration
Edit src/config.py to customize behavior:
| Setting | Default | Description |
|---|---|---|
| HTTP_TIMEOUT | 30.0s | Request timeout in seconds |
| MAX_CONTENT_LENGTH | 10MB | Maximum content size in bytes |
| USER_AGENT | Custom | HTTP User-Agent string |
| REMOVE_TAGS | Various | HTML tags to remove during extraction |
| CONTENT_SELECTORS | Various | Selectors for finding main content |
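For orientation, a config module with these settings might look like the sketch below. Only the setting names and the timeout and size defaults come from the table; the User-Agent string, tag list, and selectors are illustrative placeholders, and `is_within_size_limit` is a hypothetical helper showing how the size cap could be applied.

```python
# Hypothetical sketch of src/config.py; check the real file before editing.

HTTP_TIMEOUT = 30.0  # seconds
MAX_CONTENT_LENGTH = 10 * 1024 * 1024  # 10 MB in bytes (assuming 1 MB = 1024 * 1024)
USER_AGENT = "doc-reader-mcp/0.1"  # placeholder, not the project's actual string
REMOVE_TAGS = ["script", "style", "nav", "footer", "aside"]  # illustrative
CONTENT_SELECTORS = ["main", "article", "[role=main]"]  # illustrative


def is_within_size_limit(body: bytes) -> bool:
    """Return True if a response body fits under MAX_CONTENT_LENGTH."""
    return len(body) <= MAX_CONTENT_LENGTH


print(is_within_size_limit(b"<html></html>"))
```

Raising HTTP_TIMEOUT helps with slow documentation sites, while lowering MAX_CONTENT_LENGTH keeps very large pages out of the model's context.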
🐛 Troubleshooting
MCP Server Not Loading
Check configuration:
cat ~/.aws/amazonq/mcp.json
Verify paths are correct:
- Use absolute paths, not relative
- Check that Python executable exists
- Check that main.py exists
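These three checks can be automated. The following sketch is not shipped with the project: `check_mcp_config` is a hypothetical helper that assumes each server entry has a "command" path and treats any argument ending in .py as a script path.

```python
import json
from pathlib import Path


def check_mcp_config(config_path: Path) -> list[str]:
    """Return a list of problems found in an mcp.json file (empty if none)."""
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return [f"{config_path} does not exist"]
    except json.JSONDecodeError as exc:
        return [f"{config_path} is not valid JSON: {exc}"]

    problems = []
    for name, server in config.get("mcpServers", {}).items():
        # Check the interpreter and any script passed as an argument.
        paths = [server.get("command", "")]
        paths += [arg for arg in server.get("args", []) if arg.endswith(".py")]
        for raw in paths:
            path = Path(raw)
            if not path.is_absolute():
                problems.append(f"{name}: {raw!r} is not an absolute path")
            elif not path.exists():
                problems.append(f"{name}: {raw!r} does not exist")
    return problems


if __name__ == "__main__":
    found = check_mcp_config(Path.home() / ".aws" / "amazonq" / "mcp.json")
    print("\n".join(found) if found else "Configuration paths look fine")
```

An empty result only confirms that the paths exist; it does not prove the server starts, so the manual test below is still worth running.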
Test server manually:
cd /path/to/amazon-q-web_search
.venv/bin/python main.py
Check Amazon Q logs:
q chat logdump
Server Starts But Tools Don't Work
Verify dependencies are installed:
cd /path/to/amazon-q-web_search
.venv/bin/python -c "import httpx, bs4, markdownify; print('OK')"
Reinstall dependencies:
uv sync --reinstall
Connection Timeout
Increase timeout in settings:
q settings mcp.initTimeout 60000
📚 Dependencies
| Package | Purpose |
|---|---|
| httpx | Async HTTP client for fetching web pages |
| beautifulsoup4 | HTML parsing and navigation |
| lxml | Fast XML/HTML parser |
| markdownify | HTML to Markdown conversion |
| mcp | Model Context Protocol SDK |
⚠️ Limitations
| Limit | Value |
|---|---|
| Maximum content size | 10MB per page |
| Maximum URLs per batch | 10 |
| Request timeout | 30 seconds |
| Content type | HTML only |
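The HTML-only restriction is typically enforced by inspecting the response's Content-Type header. The sketch below shows one way to do that; `is_html` is a hypothetical helper, and accepting application/xhtml+xml alongside text/html is an assumption rather than documented behavior.

```python
# Assumption: XHTML is treated as HTML; the project may accept only text/html.
HTML_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}


def is_html(content_type_header: str) -> bool:
    """Return True if a Content-Type header value denotes an HTML document."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in HTML_MEDIA_TYPES


for header in ("text/html; charset=utf-8", "application/pdf", "application/json"):
    print(header, "->", is_html(header))
```

Under this check, PDF and JSON responses would be rejected before any extraction is attempted.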
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
💬 Support
- 📫 Open an Issue for bug reports or feature requests
- ⭐ Star this repo if you find it useful!
