🌐 Amazon Q Web Documentation Reader

MCP Server for Intelligent Web Content Extraction

A Model Context Protocol (MCP) server that enables Amazon Q to intelligently navigate and extract documentation from websites.
Amazon Q uses Claude 4.5 to make smart decisions about which pages to visit and what content to extract.

Features • Installation • Setup • Usage • Tools

✨ Features

🧠 Intelligent Navigation - Amazon Q (Claude 4.5) decides which documentation pages to visit
🧹 Clean Content Extraction - Removes navigation, ads, scripts, and other non-content elements
📝 Multiple Output Formats - Supports both Markdown and plain text output
💻 Code Block Extraction - Specifically extracts code examples from documentation
📊 Page Structure Analysis - Extracts heading hierarchy and table of contents
🔗 Link Discovery - Finds and filters documentation links
📚 Batch Processing - Read multiple documentation pages at once

🎯 How It Works

User: "I'm having issues with Razorpay routes"
      Documentation: https://razorpay.com/docs

Amazon Q (Claude 4.5):
  1. Reads main docs page
  2. Sees links: ["Payments", "Routes", "Webhooks", ...]
  3. Intelligently decides: "Routes link is relevant!"
  4. Navigates to Routes documentation
  5. Extracts content and solves your problem

All navigation decisions = Amazon Q's Claude brain 🧠
MCP Server = Clean content extraction tool 🛠️
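
The flow above can be sketched in a few lines of Python. This is only an illustration of the order of operations, not the project's code: `follow_relevant_links`, the stub functions, and the `example.com` URLs are all hypothetical, and the keyword test is a crude stand-in for the relevance decision that Amazon Q's model actually makes.

```python
from typing import Callable


def follow_relevant_links(
    question: str,
    docs_url: str,
    get_links: Callable[[str], list[dict]],
    read_page: Callable[[str], str],
    is_relevant: Callable[[dict, str], bool],
) -> list[str]:
    """Read the index page, keep the links judged relevant, then read those pages."""
    links = get_links(docs_url)                                      # steps 1-2
    chosen = [link for link in links if is_relevant(link, question)]  # step 3
    return [read_page(link["url"]) for link in chosen]               # steps 4-5


# Stubs with made-up data so the sketch runs without network access.
def stub_get_links(url: str) -> list[dict]:
    return [
        {"text": name, "url": f"{url}/{name.lower()}"}
        for name in ("Payments", "Routes", "Webhooks")
    ]


def stub_read_page(url: str) -> str:
    return f"(content of {url})"


def keyword_match(link: dict, question: str) -> bool:
    # Stand-in only: in practice the model makes this call, not a keyword test.
    return link["text"].lower() in question.lower()


pages = follow_relevant_links(
    "I'm having issues with Razorpay routes",
    "https://example.com/docs",
    stub_get_links,
    stub_read_page,
    keyword_match,
)
print(pages)
```

The split of responsibilities is the point: the callables passed in play the role of the MCP tools, while the `is_relevant` decision belongs to the model.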

📦 Installation

Prerequisites

Python 3.12 or higher
uv (recommended) or pip
Amazon Q CLI

Step 1: Clone the Repository

git clone https://github.com/yourusername/amazon-q-web_search.git
cd amazon-q-web_search

Step 2: Install Dependencies

Using uv (Recommended):

uv sync

Using pip:

pip install -e .

🔧 Setup with Amazon Q

Step 1: Locate Your MCP Configuration File

Amazon Q looks for MCP server configuration in:

Linux/WSL: ~/.aws/amazonq/mcp.json
macOS: ~/.aws/amazonq/mcp.json
Windows: %USERPROFILE%\.aws\amazonq\mcp.json

Step 2: Create/Edit the Configuration File

Create the directory if it doesn't exist:

mkdir -p ~/.aws/amazonq

Edit or create ~/.aws/amazonq/mcp.json:

For Linux/WSL:

{
  "mcpServers": {
    "doc_reader": {
      "command": "/full/path/to/amazon-q-web_search/.venv/bin/python",
      "args": ["/full/path/to/amazon-q-web_search/main.py"]
    }
  }
}

For macOS:

{
  "mcpServers": {
    "doc_reader": {
      "command": "/full/path/to/amazon-q-web_search/.venv/bin/python",
      "args": ["/full/path/to/amazon-q-web_search/main.py"]
    }
  }
}

For Windows:

{
  "mcpServers": {
    "doc_reader": {
      "command": "C:\\full\\path\\to\\amazon-q-web_search\\.venv\\Scripts\\python.exe",
      "args": ["C:\\full\\path\\to\\amazon-q-web_search\\main.py"]
    }
  }
}

💡 Tip: Replace /full/path/to/ with the actual path where you cloned the repository.
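
If you would rather not type the absolute paths by hand, a small helper can print the entry for you. This is a minimal sketch, assuming the virtual environment lives in `.venv` inside the repository as in the examples above; `build_mcp_config` is a hypothetical name and not part of this project.

```python
import json
import sys
from pathlib import Path


def build_mcp_config(repo_dir: str, windows: bool = sys.platform == "win32") -> dict:
    """Build the doc_reader entry using paths inside repo_dir."""
    repo = Path(repo_dir)
    # The virtualenv layout differs between Windows and POSIX systems.
    if windows:
        python = repo / ".venv" / "Scripts" / "python.exe"
    else:
        python = repo / ".venv" / "bin" / "python"
    return {
        "mcpServers": {
            "doc_reader": {
                "command": str(python),
                "args": [str(repo / "main.py")],
            }
        }
    }


# Run from the repository root; prints JSON you can paste into mcp.json.
print(json.dumps(build_mcp_config(str(Path.cwd())), indent=2))
```

`json.dumps` takes care of escaping backslashes in Windows paths, which is the part most often mistyped when editing the file manually.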

Step 3: Verify Installation

Start Amazon Q CLI:
```
q chat
```

Check if MCP server is loaded:

/mcp

You should see:

doc_reader
  - read_web_documentation
  - get_documentation_links
  - get_page_structure
  - extract_code_examples
  - read_multiple_docs

If the server is not loaded:
- Check that the paths in mcp.json are correct
- Restart Amazon Q CLI
- Check logs: q chat logdump

🚀 Usage

Basic Example

In Amazon Q CLI, simply ask about documentation:

I'm having issues with Razorpay routes. Can you help me understand how they work?
Documentation: https://razorpay.com/docs/

Amazon Q will:

✅ Read the main documentation page
✅ Extract all available links
✅ Intelligently identify the "Routes" link
✅ Navigate to the Routes documentation
✅ Provide you with accurate information

More Examples

Python Documentation:

Can you explain Python asyncio event loops?
Documentation: https://docs.python.org/3/library/asyncio.html

FastAPI Tutorial:

How do I create a basic FastAPI application?
Documentation: https://fastapi.tiangolo.com/

AWS Lambda:

How do I create a Lambda function with Python?
Documentation: https://docs.aws.amazon.com/lambda/

🛠 Available Tools

Amazon Q intelligently chains these tools to navigate documentation:

1. `read_web_documentation`

Fetches and extracts clean documentation content from a web page.

Parameters:

url (required): The URL of the documentation page
output_format (optional): "markdown" (default) or "text"

Returns: Extracted documentation content with title and metadata
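
To give a feel for what "clean" extraction means, here is a rough standard-library sketch of the plain-text path. It is not the server's implementation (the project lists beautifulsoup4 and markdownify as dependencies); `SKIP_TAGS`, `TextExtractor`, and `extract_text` are hypothetical names, and the tag list is a guess at the kind of values `REMOVE_TAGS` might hold.

```python
from html.parser import HTMLParser

# Hypothetical tag list; the project's real list lives in REMOVE_TAGS.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects the page title and visible text outside skipped tags."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: list[str] = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title = text
        elif self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> dict:
    """Return the title and the text content with non-content elements dropped."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "content": "\n".join(parser.chunks)}
```

This covers only something like the "text" output format; producing Markdown would need an extra conversion step.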

2. `get_documentation_links`

Extracts all links from a documentation page with optional filtering.

Parameters:

url (required): The URL of the documentation page
filter_pattern (optional): Pattern to filter links (e.g., "api", "guide")

Returns: List of links found on the page

3. `get_page_structure`

Extracts the heading structure and table of contents from a documentation page.

Parameters:

url (required): The URL of the documentation page

Returns: Hierarchical structure of headings on the page
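
As a rough picture of what a heading hierarchy looks like in practice, the sketch below turns `h1` to `h6` tags into an indented outline. It uses only the standard library and hypothetical names (`HeadingCollector`, `page_outline`); the actual return format of the tool may differ.

```python
from html.parser import HTMLParser

HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for every heading on the page."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[tuple[int, str]] = []
        self._level = None
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            self._level = HEADING_LEVELS[tag]
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS and self._level is not None:
            # Join text split across inline tags and collapse whitespace.
            text = " ".join("".join(self._parts).split())
            if text:
                self.headings.append((self._level, text))
            self._level = None


def page_outline(html: str) -> str:
    """Render the headings as an indented outline, two spaces per level."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(
        "  " * (level - 1) + "- " + text for level, text in collector.headings
    )
```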

4. `extract_code_examples`

Extracts all code blocks from a documentation page.

Parameters:

url (required): The URL of the documentation page

Returns: All code blocks found with their detected languages
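
One common way to detect a code block's language is to read a `language-*` or `lang-*` class on the `<pre>` or `<code>` element. The sketch below shows that approach with the standard library; whether the tool detects languages this way is an assumption, and `CodeBlockCollector` and `extract_code_blocks` are hypothetical names.

```python
from html.parser import HTMLParser


class CodeBlockCollector(HTMLParser):
    """Collects the text of each <pre> block and a language hint, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[dict] = []
        self._in_pre = False
        self._language = None
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True
            self._language = None
            self._parts = []
        if self._in_pre and tag in ("pre", "code"):
            # Look for class names such as "language-python" or "lang-js".
            for cls in (dict(attrs).get("class") or "").split():
                if cls.startswith(("language-", "lang-")):
                    self._language = cls.split("-", 1)[1]

    def handle_data(self, data):
        if self._in_pre:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._in_pre:
            self.blocks.append({
                "language": self._language or "unknown",
                "code": "".join(self._parts).strip("\n"),
            })
            self._in_pre = False


def extract_code_blocks(html: str) -> list[dict]:
    """Return every <pre> block on the page with its detected language."""
    collector = CodeBlockCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks
```

Blocks without a recognizable class are reported as "unknown" rather than guessed.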

5. `read_multiple_docs`

Reads multiple documentation pages and combines their content.

Parameters:

urls (required): List of documentation URLs (max 10)

Returns: Combined content from all pages
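
A batch read boils down to enforcing the URL limit, fetching each page, and stitching the results together. The sketch below shows one way to do that with `asyncio`; it is not the project's code. `read_many` and `fake_fetch` are hypothetical, fetching pages concurrently is an assumption, and the fetcher is a stub that returns placeholder text instead of touching the network.

```python
import asyncio

MAX_URLS = 10  # batch limit stated in this README


async def fake_fetch(url: str) -> str:
    """Stand-in for the real fetch-and-extract step; returns placeholder text."""
    await asyncio.sleep(0)
    return f"content of {url}"


async def read_many(urls: list[str], fetch=fake_fetch) -> str:
    """Fetch every URL and combine the results, one section per page."""
    if not urls:
        raise ValueError("at least one URL is required")
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per batch, got {len(urls)}")
    # return_exceptions=True keeps one failing page from discarding the rest.
    results = await asyncio.gather(
        *(fetch(url) for url in urls), return_exceptions=True
    )
    sections = []
    for url, result in zip(urls, results):
        body = f"Error: {result}" if isinstance(result, Exception) else result
        sections.append(f"# {url}\n\n{body}")
    return "\n\n---\n\n".join(sections)
```

For example, `asyncio.run(read_many(["https://example.com/a", "https://example.com/b"]))` returns two sections separated by a `---` line.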

📁 Project Structure

amazon-q-web_search/
├── main.py              # Entry point
├── pyproject.toml       # Project configuration
├── README.md            # This file
├── run_mcp.sh           # Startup script (Linux/macOS)
└── src/
    ├── __init__.py      # Package initialization
    ├── server.py        # MCP server initialization
    ├── config.py        # Configuration constants
    ├── fetcher.py       # HTTP fetching logic
    ├── extractor.py     # HTML content extraction
    ├── formatters.py    # Output formatting
    └── tools.py         # MCP tool definitions

⚙️ Configuration

Edit src/config.py to customize behavior:

| Setting | Default | Description |
| --- | --- | --- |
| `HTTP_TIMEOUT` | 30.0s | Request timeout in seconds |
| `MAX_CONTENT_LENGTH` | 10MB | Maximum content size in bytes |
| `USER_AGENT` | Custom | HTTP User-Agent string |
| `REMOVE_TAGS` | Various | HTML tags to remove during extraction |
| `CONTENT_SELECTORS` | Various | Selectors for finding main content |
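
For orientation, a config module with these settings might look roughly like the sketch below. Only the setting names and the two numeric defaults come from the table; the User-Agent string, the tag list, and the selectors are placeholders, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption. Check the real src/config.py before editing.

```python
# Hypothetical sketch of src/config.py; values marked "placeholder" are guesses.

HTTP_TIMEOUT = 30.0                     # seconds
MAX_CONTENT_LENGTH = 10 * 1024 * 1024   # bytes; assumes "10MB" means 10 MiB
USER_AGENT = "doc-reader-mcp (placeholder)"

# Placeholder: tags whose contents are dropped before extraction.
REMOVE_TAGS = ["script", "style", "nav", "footer"]

# Placeholder: selectors tried in order to locate the main content.
CONTENT_SELECTORS = ["main", "article", "[role='main']"]
```

Raising `HTTP_TIMEOUT` helps with slow documentation sites, while lowering `MAX_CONTENT_LENGTH` keeps very large pages from being processed.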

🐛 Troubleshooting

MCP Server Not Loading

Check configuration:

cat ~/.aws/amazonq/mcp.json

Verify paths are correct:

Use absolute paths, not relative
Check that Python executable exists
Check that main.py exists
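
These path checks are easy to automate. The following is a minimal sketch, assuming the server is registered under the name `doc_reader` and that every entry in `args` is a file path (true for the configuration shown in this README, not for MCP servers in general); `check_mcp_config` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_mcp_config(config_path, server_name: str = "doc_reader") -> list[str]:
    """Return a list of problems found in the server entry (empty if none)."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    entry = config.get("mcpServers", {}).get(server_name)
    if entry is None:
        return [f"no '{server_name}' entry under mcpServers"]

    problems = []
    # Assumes each arg is a path, as in this README's configuration.
    paths = [entry.get("command", "")] + list(entry.get("args", []))
    for raw in paths:
        path = Path(raw)
        if not path.is_absolute():
            problems.append(f"not an absolute path: {raw}")
        elif not path.exists():
            problems.append(f"path does not exist: {raw}")
    return problems
```

Calling `check_mcp_config(Path.home() / ".aws" / "amazonq" / "mcp.json")` returns an empty list when the Python executable and main.py are both found.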

Test server manually:

cd /path/to/amazon-q-web_search
.venv/bin/python main.py

Check Amazon Q logs:

q chat logdump

Server Starts But Tools Don't Work

Verify dependencies are installed:

cd /path/to/amazon-q-web_search
.venv/bin/python -c "import httpx, bs4, markdownify; print('OK')"

Reinstall dependencies:

uv sync --reinstall

Connection Timeout

Increase timeout in settings:

q settings mcp.initTimeout 60000

📚 Dependencies

| Package | Purpose |
| --- | --- |
| httpx | Async HTTP client for fetching web pages |
| beautifulsoup4 | HTML parsing and navigation |
| lxml | Fast XML/HTML parser |
| markdownify | HTML to Markdown conversion |
| mcp | Model Context Protocol SDK |

⚠️ Limitations

| Limit | Value |
| --- | --- |
| Maximum content size | 10MB per page |
| Maximum URLs per batch | 10 |
| Request timeout | 30 seconds |
| Content type | HTML only |
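
The size and content-type limits amount to two checks on each response. The sketch below shows the idea; it is not the server's code. `check_page` is a hypothetical name, accepting `application/xhtml+xml` alongside `text/html` is an assumption, and so is reading 10MB as 10 * 1024 * 1024 bytes.

```python
MAX_CONTENT_LENGTH = 10 * 1024 * 1024  # assumes "10MB" means 10 MiB


def check_page(content_type: str, body: bytes, max_bytes: int = MAX_CONTENT_LENGTH) -> None:
    """Raise ValueError if the response is not HTML or exceeds the size limit."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ("text/html", "application/xhtml+xml"):
        raise ValueError(f"unsupported content type: {media_type or 'missing'}")
    if len(body) > max_bytes:
        raise ValueError(f"page is {len(body)} bytes, limit is {max_bytes}")
```

A PDF or JSON endpoint would be rejected by the first check, which matches the "HTML only" row in the table.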

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

💬 Support

📫 Open an Issue for bug reports or feature requests
⭐ Star this repo if you find it useful!

Built with ❤️ for Amazon Q Developer
