Enhanced Web Scraper MCP Server
A professional Model Context Protocol (MCP) server for web scraping, React app testing, and React Native web app inspection using Playwright. Fully backward compatible with regular websites and standard React applications.
🚀 Latest Improvements
- 🔥 Context-Optimized Screenshots - Screenshots return only file paths and analysis text (no base64 data)
- 📊 Enhanced Page Analysis - Detailed element counting, content structure analysis, and page state inspection
- 🔍 Comprehensive Comparison Tools - Visual similarity analysis with layout, color, and typography detection
- 💾 File-Based Output - All screenshots saved to /tmp/ with structured analysis data
- 🎯 Smart Content Detection - Automatically detects empty states, loading indicators, and content availability
- Enhanced Error Handling - Comprehensive input validation and error reporting
- Optimized Performance - Reduced code duplication and improved efficiency
- Standardized Timeouts - Configurable timeout constants for reliability
- Professional Code Structure - ES6+ best practices and maintainable architecture
🔄 Backward Compatibility
This enhanced server maintains 100% compatibility with:
- ✅ Regular websites (HTML, CSS, JavaScript)
- ✅ Standard React applications (Create React App, Next.js, etc.)
- ✅ Traditional web scraping workflows
- ✅ Existing CSS selectors and interactions
Plus new enhanced support for:
- 🆕 React Native web applications
- 🆕 Expo web projects
- 🆕 Mobile viewport emulation
- 🆕 Advanced React component inspection
📋 Tools Overview
| Tool | Purpose | Best For |
|---|---|---|
| take_screenshot | Context-free screenshot capture | Visual analysis, UI documentation |
| compare_screenshots | Visual UI comparison with semantic analysis | UI replication, visual regression testing |
| scrape_page | Universal web scraping | Content extraction, data collection |
| test_react_app | React app testing with mobile gestures | UI testing, interaction automation |
| get_page_info | Page analysis with React insights | Performance monitoring, framework detection |
| extract_content | Clean content extraction | Documentation, article processing |
| wait_for_element | Smart element waiting | Dynamic content, loading states |
| inspect_react_app | React component analysis | Component debugging, state inspection |
| wait_for_react_state | React state management | Hydration, navigation, data loading |
| execute_in_react_context | JavaScript execution in React context | Advanced debugging, custom scripts |
| check_expo_dev_server | Expo development server status | Development workflow, debugging |
| duckduckgo_search | DuckDuckGo search with result extraction | Research, finding relevant URLs for content extraction |
Key Features for AI Visual Analysis
🔥 Context-Free Design
- No Base64 Data: Screenshots return only file paths and analysis text
- Minimal Context Usage: Dramatically reduced token consumption per screenshot
- File-Based Storage: All images saved to /tmp/ for external access
- Structured Analysis: Rich text analysis without heavy image data
🔍 Smart Content Detection
- Empty State Detection: Automatically identifies when pages have no meaningful content
- Table Population Verification: Counts table rows to verify data is actually displaying
- Loading State Recognition: Detects and waits for loading indicators to disappear
- Content Structure Analysis: Provides detailed breakdown of page elements
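As a rough illustration of how checks like these could be derived from element counts, here is a minimal sketch. The analysis shape and its field names (visibleElements, textLength, tables, tableRows, loadingIndicators) are assumptions made for this example, not the server's actual schema.

```javascript
// Hypothetical analysis object; every field name here is an illustrative assumption.
function classifyPageState(analysis) {
  const {
    visibleElements = 0,
    textLength = 0,
    tables = 0,
    tableRows = 0,
    loadingIndicators = 0,
  } = analysis;

  // A visible spinner or skeleton means the page is not ready to be judged yet.
  if (loadingIndicators > 0) return "loading";
  // Nothing rendered, or no text at all: treat as an empty state.
  if (visibleElements === 0 || textLength === 0) return "empty";
  // A table without rows usually means the data never arrived.
  if (tables > 0 && tableRows === 0) return "empty-table";
  return "populated";
}

console.log(
  classifyPageState({ visibleElements: 247, textLength: 1200, tables: 1, tableRows: 15 })
); // prints "populated"
```

Checking the loading state first keeps a half-rendered page from being reported as empty.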
📁 File-Based Output
Every visual tool provides:
- 📊 Analysis Text: Element counts, text content, structural analysis
- 📁 File Path: Saved screenshot location for external viewing
- 🎯 Pass/Fail Status: Built-in success criteria for automated workflows
🎯 Migration & Testing Support
Perfect for:
- UI Migration Verification: Compare source vs target implementations
- Mock Data Validation: Verify that mock data is actually displaying
- Visual Regression Testing: Ensure UI changes don't break layouts
- Component Testing: Validate React components render correctly
📊 Success Metrics Integration
- Configurable Similarity Thresholds: Built-in pass/fail criteria for visual comparisons
- Populated Data Requirements: Detects empty states that prevent meaningful comparison
- Comprehensive Reporting: Detailed analysis for debugging visual differences
Available Tools
1. take_screenshot - Context-Free Screenshot Capture
Captures screenshots with comprehensive analysis while keeping context usage minimal.
{
url: "https://example.com",
browser: "chromium",
device: "iPhone 12", // Optional device emulation
fullPage: true,
waitForSPA: true // Auto-detects and waits for React/Vue/Angular apps
}
Returns:
- 📊 Comprehensive Analysis: Element counts, page structure, content preview
- 📁 File Path: Screenshot saved to /tmp/screenshot-[timestamp].png
- 🎯 Content Status: Pass/fail indicators for populated data
Example Output:
📸 Screenshot saved to: /tmp/screenshot-1234567890.png
📄 Page Analysis:
- Title: "My React App"
- Has Content: ✅
- Visible Elements: 247
📊 Content Elements:
- Headings: 3
- Paragraphs: 12
- Buttons: 8
- Tables: 1
- Table Rows: 15 ← Indicates populated data!
📝 Page Content Preview:
Welcome to our service platform. Here you can find contractors...
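For automated workflows, the text report above can be reduced to a file path and a few counts. The following is a sketch that assumes the output keeps the line format shown in the example; parseScreenshotReport is a hypothetical helper, not part of the server.

```javascript
// Hypothetical helper: extracts the saved path and numeric counts from a report
// shaped like the example output above. The line format is an assumption.
function parseScreenshotReport(text) {
  const pathMatch = text.match(/Screenshot saved to:\s*(\S+\.png)/);
  const counts = {};
  for (const line of text.split("\n")) {
    // Matches lines such as "- Table Rows: 15" and ignores non-numeric values.
    const m = line.match(/^\s*-\s*([A-Za-z ]+):\s*(\d+)\b/);
    if (m) counts[m[1].trim()] = Number(m[2]);
  }
  return { path: pathMatch ? pathMatch[1] : null, counts };
}

const sample = [
  "📸 Screenshot saved to: /tmp/screenshot-1234567890.png",
  '- Title: "My React App"',
  "- Visible Elements: 247",
  "- Tables: 1",
  "- Table Rows: 15",
].join("\n");

const report = parseScreenshotReport(sample);
console.log(report.path, report.counts["Table Rows"]);
```

A caller could then treat a "Table Rows" count of zero as a failed mock-data check.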
2. compare_screenshots - Context-Free Visual Comparison
Compares two pages with comprehensive analysis while maintaining minimal context usage.
{
urlA: "https://source-design.com", // Source/reference
urlB: "https://your-implementation.com", // Target/implementation
browser: "chromium",
threshold: 0.1, // Similarity threshold (0-1)
analyzeLayout: true, // Detect alignment differences
analyzeColors: true, // Exact color comparison
analyzeTypography: true, // Font size/weight analysis
waitForSPA: true // Smart SPA detection
}
Returns:
- 📊 Visual Similarity Score: Percentage match with pass/fail status
- 🏗️ Structural Comparison: Element counts, table rows, content structure
- 🎨 Layout Analysis: Alignment differences, positioning issues
- 📁 File Paths: Both screenshots saved to /tmp/ for external viewing
Example Output:
📸 Screenshots saved:
- Source: /tmp/compare-source-1234567890.png
- Target: /tmp/compare-target-1234567891.png
📊 VISUAL SIMILARITY: 87.3% ✅ PASS
🏗️ Structural Comparison:
- Tables: 1 → 1
- Table Rows: 0 → 8 ← Target has populated data!
- Buttons: 12 → 12
📋 Layout Analysis:
- 2 regions with significant layout differences
- Content appears centered in source but left-aligned in target
🎨 Color Analysis:
- Minor color differences detected
- Example: rgb(229, 122, 68) → rgb(225, 118, 64)
3. scrape_page - Universal Web Scraping
Works with any website - regular HTML, React apps, or React Native web.
Regular website example:
{
url: "https://example.com",
selector: ".article-title", // Standard CSS selector
screenshot: true
}
React Native web example:
{
url: "http://localhost:8081",
selector: "login-button", // Will try testID, aria-label fallbacks
mobileViewport: true,
device: "iPhone 12"
}
4. test_react_app - Universal React Testing
Works with any React application - standard React or React Native web.
Standard React app example:
{
url: "http://localhost:3000",
waitForHydration: false, // Optional for regular React apps
actions: [
{ type: "click", selector: "#submit-button" },
{ type: "fill", selector: "input[name='email']", value: "test@example.com" }
]
}
React Native web example:
{
url: "http://localhost:8081",
device: "iPhone 12",
waitForHydration: true, // Recommended for RN web
actions: [
{ type: "tap", selector: "login-button" },
{ type: "swipe", selector: "scroll-view", value: "up" }
]
}
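The action lists above could be dispatched to Playwright roughly as follows. This is a sketch, not the server's implementation: runActions is a hypothetical helper, it covers only click, fill, and tap, and it leaves swipe out because Playwright has no single built-in swipe call.

```javascript
// Hypothetical dispatcher. `page` is expected to be a Playwright Page;
// page.click, page.fill, and page.tap are real Playwright methods.
async function runActions(page, actions) {
  const performed = [];
  for (const action of actions) {
    switch (action.type) {
      case "click":
        await page.click(action.selector);
        break;
      case "fill":
        await page.fill(action.selector, action.value);
        break;
      case "tap":
        // page.tap requires a browser context created with hasTouch: true.
        await page.tap(action.selector);
        break;
      default:
        throw new Error(`Unsupported action type in this sketch: ${action.type}`);
    }
    performed.push(action.type);
  }
  return performed;
}
```

Throwing on unknown types makes a typo in an action list fail loudly instead of being skipped silently.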
5. get_page_info - Enhanced Page Analysis
Provides comprehensive information for any web page with React-specific insights.
{
url: "https://any-website.com", // Works with any URL
includePerformance: true
}
6. extract_content - Clean Content Extraction
Extract clean, readable content from web pages without HTML/CSS clutter. Perfect for documentation, articles, and structured content consumption.
{
url: "https://docs.example.com/api-guide",
includeLinks: true, // Extract and categorize hyperlinks
format: "markdown" // Output format: 'markdown' or 'text'
}
Output Example:
# API Documentation
## Authentication
You need to obtain an API key [1] from the developer portal [2].
### Rate Limits
See the rate limiting guide [3] for details.
---
## Links Found:
[1] https://example.com/api-keys (internal)
[2] https://developer.example.com (external)
[3] https://example.com/docs/rate-limits (internal)
Features:
- Clean Structure - Preserves headings, paragraphs, lists, code blocks
- Link Extraction - Categorizes links as internal, external, anchor, or download
- Content Filtering - Removes navigation, ads, sidebars automatically
- Multiple Formats - Markdown or plain text output
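One way the four link categories could be assigned is sketched below. The rule used here (same hostname means internal) and the list of download extensions are assumptions made for illustration; the server may apply different rules.

```javascript
// Assumed list of extensions that mark a link as a download.
const DOWNLOAD_EXTENSIONS = [".pdf", ".zip", ".csv", ".tar.gz"];

// Hypothetical classifier: returns "anchor", "download", "internal", or "external".
function classifyLink(href, pageUrl) {
  if (href.startsWith("#")) return "anchor";

  let target;
  try {
    // Resolves relative hrefs against the page URL.
    target = new URL(href, pageUrl);
  } catch {
    return "invalid";
  }

  const path = target.pathname.toLowerCase();
  if (DOWNLOAD_EXTENSIONS.some((ext) => path.endsWith(ext))) return "download";

  // Assumption: a link is internal only when the hostname matches exactly.
  return target.hostname === new URL(pageUrl).hostname ? "internal" : "external";
}

const page = "https://example.com/docs";
console.log(classifyLink("/api-keys", page)); // prints "internal"
console.log(classifyLink("https://developer.example.com", page)); // prints "external"
```

Exact hostname matching treats subdomains as external; comparing registrable domains instead would be an equally reasonable choice.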
7. wait_for_element - Smart Element Waiting
Intelligent element waiting with automatic selector strategy fallbacks.
{
url: "https://example.com",
selector: ".loading-spinner", // CSS selector with RN fallbacks
timeout: 10000
}
React Native Web Specific Tools
8. inspect_react_app - React Component Analysis
Deep inspection of React applications (works best with React Native web).
9. wait_for_react_state - React State Management
Wait for React-specific conditions like hydration, navigation, data loading.
10. execute_in_react_context - JavaScript Execution
Execute JavaScript in React context for advanced inspection.
11. check_expo_dev_server - Expo Development Tools
Check Expo/Metro bundler status for development workflows.
Selector Strategy Priority
The server uses intelligent selector strategies:
- Primary: Direct CSS selector (e.g., #button, .class, input[name='email'])
- Fallback 1: TestID attribute ([data-testid="button"])
- Fallback 2: Accessibility label ([aria-label="Button"])
- Fallback 3: AccessibilityLabel ([accessibilityLabel="Button"])
This ensures regular CSS selectors work normally while providing React Native web compatibility.
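The priority order above can be sketched as a list of candidate selectors tried in turn. Both function names are hypothetical, and the sketch assumes a Playwright Page object; page.locator and locator.count are real Playwright calls, but how the server actually resolves selectors is not shown in this README.

```javascript
// Builds the candidate list in the documented priority order.
function buildSelectorCandidates(selector) {
  // Escape backslashes and double quotes so the value is safe inside [attr="..."].
  const escaped = selector.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return [
    selector, // Primary: direct CSS selector
    `[data-testid="${escaped}"]`, // Fallback 1
    `[aria-label="${escaped}"]`, // Fallback 2
    `[accessibilityLabel="${escaped}"]`, // Fallback 3
  ];
}

// Returns the first candidate that matches at least one element, or null.
async function resolveSelector(page, selector) {
  for (const candidate of buildSelectorCandidates(selector)) {
    try {
      if ((await page.locator(candidate).count()) > 0) return candidate;
    } catch {
      // The candidate could not be parsed as a selector; try the next one.
    }
  }
  return null;
}

console.log(buildSelectorCandidates("login-button"));
```

Because the raw selector is always tried first, ordinary CSS selectors resolve without ever reaching the fallbacks.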
Usage Examples
Context-Free Visual Verification
// Verify data is actually displaying without burning context
{
url: "http://localhost:3000/data-table",
fullPage: true,
waitForSPA: true
}
// Returns: File path + "Table Rows: 8" ← Confirms data is populated!
Context-Free Migration Comparison
// Compare source vs target implementation efficiently
{
urlA: "http://localhost:3001/page", // Source
urlB: "http://localhost:3000/page", // Target
threshold: 0.05, // High similarity requirement
analyzeLayout: true,
analyzeColors: true
}
// Returns: File paths + "VISUAL SIMILARITY: 96.2% ✅ PASS"
Regular Website Scraping
// Works exactly like before
{
url: "https://news.ycombinator.com",
selector: ".titleline > a",
screenshot: false
}
Standard React App Testing
// Standard React app (Create React App, Next.js, etc.)
{
url: "http://localhost:3000",
actions: [
{ type: "click", selector: "button.login" },
{ type: "fill", selector: "#username", value: "testuser" }
]
}
React Native Web App Testing
// React Native web with enhanced features
{
url: "http://localhost:8081",
device: "iPhone 12",
waitForHydration: true,
actions: [
{ type: "tap", selector: "login-button" }, // Uses testID
{ type: "swipe", selector: "scroll-view", value: "up" }
]
}
Clean Content Extraction
// Extract clean content from documentation
{
url: "https://docs.react.dev/learn",
includeLinks: true,
format: "markdown"
}
Installation
npm install
npx playwright install
Usage with Amazon Q Developer
# Take a context-free screenshot and analyze content
q chat "Take a screenshot of localhost:3000/data-page and analyze the content"
# Compare pages efficiently without context bloat
q chat "Compare the page between localhost:3001 and localhost:3000"
# Mock data verification with minimal context usage
q chat "Verify that the data table is populated at localhost:3000"
# Works with any website
q chat "Scrape the headlines from https://news.ycombinator.com"
# Works with React apps
q chat "Test the login flow on my React app at localhost:3000"
# Enhanced React Native web support
q chat "Inspect the React Native web app at localhost:8081"
# Extract clean content for reading
q chat "Extract the main content from https://docs.react.dev/learn"
Benefits of Context-Free Design
🔥 Dramatically Reduced Context Usage
- Before: 50-200KB base64 data per screenshot
- After: Only text analysis (~1-2KB per screenshot)
- Result: 50-100x reduction in context consumption
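The arithmetic behind these figures is simple enough to check. The sketch below uses only the sizes quoted above plus the standard base64 expansion (every 3 bytes become 4 characters); the helper names are made up for this example.

```javascript
// Length of the base64 encoding of `byteCount` bytes (with padding).
const base64Length = (byteCount) => 4 * Math.ceil(byteCount / 3);

// Ratio between the inline image payload and the text-only analysis.
const reductionFactor = (base64Kb, analysisKb) => base64Kb / analysisKb;

// A 150,000-byte PNG grows to 200,000 characters once base64-encoded.
console.log(base64Length(150000)); // prints 200000

// Pairing the low and high ends of the quoted ranges gives the 50-100x figure.
console.log(reductionFactor(50, 1), reductionFactor(200, 2)); // prints 50 100
```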
📁 File-Based Workflow
- Screenshots saved to /tmp/ with timestamps
- External tools can access images directly
- No context pollution from image data
- Structured analysis data remains in conversation
🎯 Better AI Workflows
- More screenshots possible per conversation
- Focus on analysis rather than data transfer
- Cleaner conversation history
- Faster response times
Troubleshooting
Error Handling
- Input Validation - Server validates required parameters and provides clear error messages
- Timeout Configuration - Default timeouts are optimized but can be adjusted per request
- Browser Cleanup - Automatic resource cleanup prevents memory leaks
Regular Websites
- Use standard CSS selectors (.class, #id, tag[attribute])
- Set mobileViewport: false (default) for desktop sites
- Set waitForHydration: false (default) for non-React sites
React Applications
- Set waitForHydration: true for better reliability
- Use semantic selectors when possible
- Check browser console for React errors
React Native Web
- Use testID attributes in your components
- Enable mobileViewport or specify device
- Set waitForHydration: true
- Use inspect_react_app to see available elements
License
MIT
12. duckduckgo_search - Web Search Integration
Search DuckDuckGo and extract result links with titles and snippets for further content extraction.
Parameters
- query (required): Search query string
- maxResults (optional): Maximum results to return (1-10, default: 5)
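A minimal sketch of how these two parameters might be validated before a search runs. Clamping out-of-range values into 1-10 is an assumption; the server could just as well reject them. normalizeSearchArgs is a hypothetical name.

```javascript
const DEFAULT_MAX_RESULTS = 5; // default stated in the parameter list
const MIN_RESULTS = 1;
const MAX_RESULTS = 10;

// Hypothetical validator: requires a non-empty query and clamps maxResults.
function normalizeSearchArgs({ query, maxResults } = {}) {
  if (typeof query !== "string" || query.trim() === "") {
    throw new TypeError("query is required and must be a non-empty string");
  }
  // Fall back to the default when maxResults is missing or not an integer.
  const requested = Number.isInteger(maxResults) ? maxResults : DEFAULT_MAX_RESULTS;
  return {
    query: query.trim(),
    maxResults: Math.min(MAX_RESULTS, Math.max(MIN_RESULTS, requested)),
  };
}

console.log(normalizeSearchArgs({ query: "React hooks documentation" }));
console.log(normalizeSearchArgs({ query: "Node.js best practices", maxResults: 8 }));
```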
Example Usage
# Search for React documentation
q chat "Search DuckDuckGo for 'React hooks documentation'"
# Get more results
q chat "Search DuckDuckGo for 'Node.js best practices' with maxResults=8"
# Combine with content extraction
q chat "Search for 'AWS Lambda tutorials' then extract content from the top 2 results"
Use Cases
- Research: Find relevant URLs for content extraction
- Documentation Discovery: Locate official docs and tutorials
- Content Pipeline: Search → Extract → Analyze workflow
- Development Research: Find code examples and solutions
Integration with Other Tools
Perfect for combining with extract_content:
- Use duckduckgo_search to find relevant URLs
- Use extract_content on the top results
- Get comprehensive information on any topic
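The search-then-extract flow can be sketched as a small pipeline. Everything about the plumbing here is an assumption: callTool stands in for whatever MCP client function invokes a tool by name, and the search result is assumed to be an array of objects with a url field.

```javascript
// Hypothetical pipeline. `callTool(name, args)` is an injected stand-in for an
// MCP client call; the result shapes are assumptions made for this sketch.
async function searchAndExtract(callTool, query, topN = 2) {
  const results = await callTool("duckduckgo_search", { query, maxResults: 5 });

  const pages = [];
  for (const { url } of results.slice(0, topN)) {
    const content = await callTool("extract_content", {
      url,
      includeLinks: true,
      format: "markdown",
    });
    pages.push({ url, content });
  }
  return pages;
}
```

Extracting pages one at a time keeps the sketch simple; a real client might run the extractions concurrently.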
Why DuckDuckGo?
- Lenient Toward Automation: Generally less aggressive about blocking automated requests than Google
- No API Key: Works without credentials or a paid plan
- Privacy-Focused: Doesn't track users or requests
- Reliable Results: High-quality search results for development topics
Note: This tool extracts publicly available search results only. DuckDuckGo tends to be more automation-friendly than Google, but heavy automated use may still be throttled.
