Digest MCP Server
MCP server for web content digestion using browserless.io via puppeteer-core. Extracts fully rendered DOM content from dynamic web pages including SPAs and infinite scroll sites.
Features
- Connect to browserless.io cloud browsers
- Load web pages with configurable wait times
- Scroll down pages multiple times with delays
- Extract complete page content (HTML)
Installation
npm install
npm run build
Configuration
Set your browserless.io API key using one of these methods:
Option 1: Using .env file (recommended)
Create a .env file in the project root:
cp .env.example .env
Then edit .env and add your API key:
BROWSERLESS_API_KEY=your_api_key_here
Option 2: Using environment variable
export BROWSERLESS_API_KEY=your_api_key_here
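Whichever option you choose, the server needs the key in its environment at startup. As a rough sketch of how a missing key could be caught early, the hypothetical helper below (requireApiKey is not taken from the server's source) reads the variable from a key/value map and fails fast if it is absent or still set to the placeholder. Treating the placeholder as invalid is an assumption made for illustration.

```typescript
// Illustrative helper, not taken from the server's source.
// `env` is any key/value map; in Node.js this would typically be process.env.
function requireApiKey(env: Record<string, string | undefined>): string {
  // Trim so stray whitespace from a hand-edited .env file is not treated as part of the key.
  const key = env["BROWSERLESS_API_KEY"]?.trim();
  // Assumption: the placeholder from .env.example should be rejected as "not configured".
  if (!key || key === "your_api_key_here") {
    throw new Error("BROWSERLESS_API_KEY is not set; add it to .env or export it");
  }
  return key;
}
```

In a Node.js process this would be called as requireApiKey(process.env) once any .env file has been loaded.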
Usage
Running the Server
The server uses stdio transport for MCP communication:
node build/index.js
Tool: web_content
Fetches web page content with optional scrolling and HTML cleanup.
Parameters:
- url (string, required): The URL to fetch
- initialWaitTime (number, optional): Time to wait in milliseconds after loading the page. Default: 3000
- scrolls (number, optional): Number of times to scroll down the page. Default: 5
- scrollWaitTime (number, optional): Time to wait in milliseconds between each scroll. Default: 1000
- cleanup (boolean, optional): Whether to clean up HTML (remove scripts, styles, SVG, forms, etc.) and keep only meaningful text content. Default: false
Returns:
- size (number): Size of the content in bytes
- content (string): The fetched HTML content
Example:
{
  "url": "https://example.com",
  "initialWaitTime": 2000,
  "scrolls": 3,
  "scrollWaitTime": 1000,
  "cleanup": true
}
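To make the documented defaults concrete, here is a minimal TypeScript sketch of how omitted parameters might be filled in before a fetch. The WebContentArgs interface and the normalizeArgs helper are hypothetical names used only for illustration. The default values are the ones listed under Parameters; the validation rules (http/https URLs only, non-negative numbers) are assumptions, not documented server behavior.

```typescript
// Hypothetical shape of the web_content arguments, mirroring the parameter list.
interface WebContentArgs {
  url: string;
  initialWaitTime?: number;
  scrolls?: number;
  scrollWaitTime?: number;
  cleanup?: boolean;
}

// Defaults as documented in this README.
const DEFAULTS = {
  initialWaitTime: 3000,
  scrolls: 5,
  scrollWaitTime: 1000,
  cleanup: false,
};

// Illustrative helper (not part of the server): apply defaults and reject bad input.
function normalizeArgs(args: WebContentArgs): Required<WebContentArgs> {
  // Assumption: only http(s) URLs are meaningful for a remote browser.
  if (!/^https?:\/\//i.test(args.url)) {
    throw new TypeError("url must start with http:// or https://");
  }
  const merged: Required<WebContentArgs> = {
    url: args.url,
    initialWaitTime: args.initialWaitTime ?? DEFAULTS.initialWaitTime,
    scrolls: args.scrolls ?? DEFAULTS.scrolls,
    scrollWaitTime: args.scrollWaitTime ?? DEFAULTS.scrollWaitTime,
    cleanup: args.cleanup ?? DEFAULTS.cleanup,
  };
  // Assumption: negative or non-finite timings and scroll counts are errors.
  for (const key of ["initialWaitTime", "scrolls", "scrollWaitTime"] as const) {
    if (!Number.isFinite(merged[key]) || merged[key] < 0) {
      throw new RangeError(`${key} must be a non-negative number`);
    }
  }
  return merged;
}
```

With this sketch, passing only a url yields the documented defaults, and any field given explicitly (as in the example request) overrides its default.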
How It Works
- Connects to browserless.io using your API key via WebSocket
- Creates a new page in the remote browser
- Navigates to the specified URL (waits for DOM content loaded)
- Waits 1 second for page stabilization
- Waits for the initial wait time (default: 3 seconds)
- Scrolls to the bottom of the page the specified number of times
- After each scroll, waits for new content to load by:
  - Monitoring page height changes
  - Detecting dynamically loaded content
  - Waiting up to scrollWaitTime for new content (default: 1 second)
- Waits for network to idle (AJAX requests complete)
- Waits 1 additional second for JavaScript rendering
- Returns the fully RENDERED DOM (not raw HTML source)
  - Includes all JavaScript-generated content
  - Includes all AJAX-loaded content
  - Includes all dynamically inserted elements
  - Uses document.documentElement.outerHTML for complete rendered state
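The scroll-and-wait steps above can be sketched roughly as follows. This is not the server's actual code: waitForGrowth and scrollLoop are hypothetical names, the 100 ms polling interval is an assumption, and the browser is abstracted behind two callbacks so the logic can run without a live page.

```typescript
// Hypothetical helper: poll a height getter until the page grows or the timeout elapses.
async function waitForGrowth(
  getHeight: () => Promise<number>,
  previousHeight: number,
  timeoutMs: number,
  pollMs = 100, // assumed polling interval
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    // Exit early as soon as new content has increased the page height.
    if ((await getHeight()) > previousHeight) return true;
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  return false; // timed out without seeing growth
}

// Hypothetical loop: scroll `scrolls` times, waiting up to `scrollWaitTime` ms after each.
// Returns how many scrolls were followed by page growth.
async function scrollLoop(
  scrollToBottom: () => Promise<void>,
  getHeight: () => Promise<number>,
  scrolls: number,
  scrollWaitTime: number,
): Promise<number> {
  let grew = 0;
  for (let i = 0; i < scrolls; i++) {
    const before = await getHeight();
    await scrollToBottom();
    if (await waitForGrowth(getHeight, before, scrollWaitTime)) grew++;
  }
  return grew;
}
```

With puppeteer-core, the two callbacks could plausibly be implemented with page.evaluate, for example reading document.body.scrollHeight for the height and calling window.scrollTo(0, document.body.scrollHeight) to scroll. Injecting them keeps the waiting logic independent of the browser connection.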
Dynamic Content & Infinite Scroll
The tool is specifically designed for modern web applications with dynamic content:
AJAX/JavaScript Handling:
- ✅ Waits for network idle: Ensures all AJAX requests complete
- ✅ Returns rendered DOM: Gets actual content after JavaScript execution
- ✅ Not raw HTML source: Uses browser's rendered output
- ✅ Includes dynamic elements: Captures content inserted by React, Vue, Angular, etc.
Infinite Scroll Support:
- ✅ Scrolls to bottom: Triggers lazy-loading mechanisms
- ✅ Detects new content: Monitors page height changes
- ✅ Smart waiting: Exits early when content loads
- ✅ Multiple fallbacks: Keyboard scroll if JavaScript fails
Perfect for:
- Single Page Applications (React, Vue, Angular)
- Infinite scroll feeds (Twitter, Facebook, LinkedIn)
- Lazy-loaded images and content
- AJAX-powered content (search results, filters)
- Dynamic dashboards and admin panels
Tips for best results:
- Default scrolls: 5 works well for most pages with lazy-loaded content
- Increase scrolls to 10-15 for very long infinite scroll pages
- Set scrolls: 0 to disable scrolling for static pages
- Use scrollWaitTime of 1000-3000ms for slow-loading content (default: 1000ms)
- Increase initialWaitTime to 5000+ if page has heavy initialization
- For SPAs, allow time for initial JavaScript bootstrap
- Use cleanup: true to extract only meaningful text content without scripts, styles, and visual elements
- Use cleanup: false (default) to get the full rendered HTML
MCP Client Configuration
Add to your MCP client configuration (e.g., Claude Desktop):
{
  "mcpServers": {
    "digest": {
      "command": "node",
      "args": ["/path/to/digest-mcp/build/index.js"],
      "env": {
        "BROWSERLESS_API_KEY": "your_api_key_here"
      }
    }
  }
}
License
ISC
