🌐 atlas-browser-mcp
Visual web browsing for AI agents via Model Context Protocol (MCP).
✨ Features
- 📸 Visual-First: Navigate the web through screenshots, not DOM parsing
- 🏷️ Set-of-Mark: Interactive elements labeled with clickable [0], [1], [2]... markers
- 🎭 Humanized: Bezier curve mouse movements, natural typing rhythms
- 🧩 CAPTCHA-Ready: Multi-click support for image selection challenges
- 🛡️ Anti-Detection: Built-in measures to avoid bot detection
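The "Humanized" feature combines two ideas that can be sketched without the package. The following is a minimal illustration that assumes a cubic Bezier path with randomly jittered control points and normally distributed per-keystroke delays. The function names `bezier_path` and `keystroke_delays` and all numeric defaults are hypothetical. They are not taken from atlas-browser-mcp's internals.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def bezier_path(start: Point, end: Point, steps: int = 25, jitter: float = 0.3,
                rng: Optional[random.Random] = None) -> List[Point]:
    """Sample steps + 1 points along a cubic Bezier curve from start to end.

    The two inner control points sit about 1/3 and 2/3 of the way along the
    straight line and are pushed sideways by a random fraction of the distance,
    so each path bends slightly differently.
    """
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0
    dist = (dx * dx + dy * dy) ** 0.5
    norm = dist or 1.0  # avoid division by zero when start == end

    def control(t: float) -> Point:
        # Point on the straight line, offset along the perpendicular (-dy, dx).
        off = rng.uniform(-jitter, jitter) * dist
        return (x0 + dx * t - dy / norm * off, y0 + dy * t + dx / norm * off)

    (x1, y1), (x2, y2) = control(1 / 3), control(2 / 3)
    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # B(t) = u^3*P0 + 3u^2*t*P1 + 3u*t^2*P2 + t^3*P3
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points


def keystroke_delays(text: str, mean_ms: float = 90.0, sd_ms: float = 30.0,
                     min_ms: float = 25.0, rng: Optional[random.Random] = None) -> List[float]:
    """Return one delay in seconds per character, drawn from a clipped normal
    distribution, with a longer pause at spaces and punctuation."""
    rng = rng or random.Random()
    delays = []
    for ch in text:
        ms = max(min_ms, rng.gauss(mean_ms, sd_ms))
        if ch in " .,!?":
            ms *= 1.8  # short pause at word and sentence boundaries
        delays.append(ms / 1000.0)
    return delays
```

In a Playwright-based browser, points like these could be passed one at a time to `page.mouse.move(x, y)`, and the delays could be used as sleeps between keystrokes. This README does not say whether atlas-browser-mcp works exactly this way.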
🚀 Quick Start
Installation
```bash
pip install atlas-browser-mcp
playwright install chromium
```
Use with Claude Desktop
Add to your Claude Desktop config (claude_desktop_config.json):
```json
{
  "mcpServers": {
    "browser": {
      "command": "atlas-browser-mcp"
    }
  }
}
```
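If other servers are already configured, the new entry needs to be merged into the existing `mcpServers` object instead of overwriting the file. The following is a minimal standard-library sketch. The helper names `add_mcp_server` and `update_config_file` are hypothetical. The location of `claude_desktop_config.json` differs by operating system, so the caller supplies the path.

```python
import json
from pathlib import Path
from typing import List, Optional


def add_mcp_server(config: dict, name: str, command: str,
                   args: Optional[List[str]] = None) -> dict:
    """Return a copy of `config` with mcpServers[name] added or replaced."""
    updated = json.loads(json.dumps(config))  # deep copy for JSON-shaped data
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    updated.setdefault("mcpServers", {})[name] = entry
    return updated


def update_config_file(path: Path, name: str = "browser",
                       command: str = "atlas-browser-mcp") -> None:
    """Read the JSON config at `path` (if present), merge the entry, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_mcp_server(config, name, command), indent=2) + "\n")
```

Claude Desktop reads this file at startup, so restart it after editing.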
Then ask Claude:
"Navigate to https://news.ycombinator.com and tell me the top 3 stories"
🛠️ Available Tools
| Tool | Description |
|---|---|
| `navigate` | Go to a URL; returns a labeled screenshot |
| `screenshot` | Capture the current page with labels |
| `click` | Click an element by its label ID [N] |
| `multi_click` | Click multiple elements (for CAPTCHAs) |
| `type` | Type text, optionally pressing Enter |
| `scroll` | Scroll the page up or down |
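Each tool accepts only a small set of arguments. In the following client-side sketch, the argument names for `navigate`, `click`, `multi_click`, and `type` come from the usage examples in this README. `screenshot` is assumed to take no arguments. The `direction` parameter for `scroll` is a hypothetical name because the README does not specify it. This is a pre-flight check for illustration only, not the server's actual schema.

```python
from typing import Any, Dict, Tuple

# Hypothetical spec: tool -> (required params, optional params).
TOOL_PARAMS: Dict[str, Tuple[Dict[str, type], Dict[str, type]]] = {
    "navigate": ({"url": str}, {}),
    "screenshot": ({}, {}),
    "click": ({"label_id": int}, {}),
    "multi_click": ({"label_ids": list}, {}),
    "type": ({"text": str}, {"submit": bool}),
    "scroll": ({"direction": str}, {}),  # parameter name assumed
}


def check_call(tool: str, args: Dict[str, Any]) -> None:
    """Raise ValueError if `args` do not match the assumed spec for `tool`."""
    if tool not in TOOL_PARAMS:
        raise ValueError(f"unknown tool: {tool!r}")
    required, optional = TOOL_PARAMS[tool]
    missing = [name for name in required if name not in args]
    if missing:
        raise ValueError(f"{tool}: missing required argument(s) {missing}")
    allowed = {**required, **optional}
    for name, value in args.items():
        expected = allowed.get(name)
        if expected is None:
            raise ValueError(f"{tool}: unexpected argument {name!r}")
        # bool is a subclass of int in Python, so reject it explicitly for IDs.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(f"{tool}: {name!r} should be {expected.__name__}")
    if tool == "multi_click" and not all(
        isinstance(i, int) and not isinstance(i, bool) for i in args["label_ids"]
    ):
        raise ValueError("multi_click: label_ids must be a list of ints")
    if tool == "scroll" and args["direction"] not in ("up", "down"):
        raise ValueError("scroll: direction must be 'up' or 'down'")
```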
📖 Usage Examples
Basic Navigation
```
User: Go to google.com
AI: [calls navigate(url="https://google.com")]
AI: I see the Google homepage. The search box is labeled [3].
User: Search for "MCP protocol"
AI: [calls click(label_id=3)]
AI: [calls type(text="MCP protocol", submit=true)]
AI: Here are the search results...
```
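Each bracketed call in the transcript is sent by the MCP client as a `tools/call` request over JSON-RPC 2.0. The client does this after the initial `initialize` handshake. The following sketch builds the request messages for the two search steps. The message shape follows the MCP specification. The helper name `tool_call` and the request IDs are illustrative.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request IDs only need to be unique per session


def tool_call(name: str, **arguments) -> str:
    """Serialize an MCP tools/call request as a single-line JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# The search from the transcript: focus the search box [3], then type and submit.
messages = [
    tool_call("click", label_id=3),
    tool_call("type", text="MCP protocol", submit=True),
]
for m in messages:
    print(m)
```

Over the stdio transport, each message is written as one newline-terminated line. This is why `json.dumps` is used without indentation.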
CAPTCHA Handling
```
User: Select all images with traffic lights
AI: [Looking at the CAPTCHA grid]
AI: I can see traffic lights in images [2], [5], and [8].
AI: [calls multi_click(label_ids=[2, 5, 8])]
```
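A CAPTCHA multi-click comes down to two steps. Each label is resolved to a screen position, and the positions are then clicked in sequence with short, uneven pauses. The following sketch assumes each label maps to a bounding box `(x, y, width, height)` in page coordinates. The box data, the name `plan_multi_click`, and the delay range are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height


def plan_multi_click(label_ids: Sequence[int], boxes: Dict[int, Box],
                     delay_range: Tuple[float, float] = (0.2, 0.6),
                     rng: Optional[random.Random] = None) -> List[Tuple[float, float, float]]:
    """Return (x, y, pause_before_seconds) for each label, aiming near the box center."""
    rng = rng or random.Random()
    missing = [i for i in label_ids if i not in boxes]
    if missing:
        raise KeyError(f"unknown labels: {missing}")
    plan = []
    for n, label in enumerate(label_ids):
        x, y, w, h = boxes[label]
        # Aim somewhere in the middle half of the tile, not its exact center.
        cx = x + w / 2 + rng.uniform(-w / 4, w / 4)
        cy = y + h / 2 + rng.uniform(-h / 4, h / 4)
        pause = 0.0 if n == 0 else rng.uniform(*delay_range)
        plan.append((cx, cy, pause))
    return plan
```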
🔧 Configuration
Headless Mode
For servers without a display:
```python
from atlas_browser_mcp.browser import VisualBrowser

browser = VisualBrowser(
    headless=True,   # No visible browser window
    humanize=False,  # Faster, less human-like
)
```
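When the same code runs on both desktops and display-less servers, the `headless` flag can be chosen at startup. The following heuristic sketch assumes that on Linux, the absence of both `DISPLAY` and `WAYLAND_DISPLAY` means no display is available. On other platforms it defaults to a visible window. The helper `needs_headless` is hypothetical.

```python
import os
import sys
from typing import Mapping, Optional


def needs_headless(env: Optional[Mapping[str, str]] = None,
                   platform: Optional[str] = None) -> bool:
    """Guess whether the browser must run headless (no display to draw on)."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        # X11 sets DISPLAY and Wayland sets WAYLAND_DISPLAY; neither means no GUI.
        return not (env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    return False
```

You would then write `VisualBrowser(headless=needs_headless())`.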
Custom Viewport
```python
from atlas_browser_mcp.browser import VisualBrowser

browser = VisualBrowser()
browser.VIEWPORT = {"width": 1920, "height": 1080}
```
🏗️ How It Works
- Navigate: The browser loads the page.
- Inject SoM: JavaScript labels all interactive elements.
- Screenshot: The labeled page is captured.
- AI sees: The screenshot shows [0], [1], [2]... on buttons, links, and inputs.
- AI acts: "Click [5]" → the browser clicks the element at that position.
- Repeat: A new screenshot is taken with updated labels.
```
┌─────────────────────────────────────┐
│ [0] Logo    [1] Search    [2] Menu  │
│                                     │
│ [3] Article Title                   │
│ [4] Read More                       │
│                                     │
│ [5] Subscribe    [6] Share          │
└─────────────────────────────────────┘
```
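The labeling step can be modeled as a table from label ID to element box. Visible interactive elements are numbered in order, and "Click [5]" resolves to the center of box 5. The following sketch models this bookkeeping in Python with made-up element data. In the package, labeling happens in injected JavaScript whose selection and ordering rules are not described here. The filtering criteria below (non-zero size, at least partly inside the viewport) are therefore assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Element:
    tag: str
    text: str
    x: float
    y: float
    width: float
    height: float


def assign_labels(elements: List[Element], viewport: Tuple[int, int]) -> Dict[int, Element]:
    """Number the elements that are visible in the viewport as 0, 1, 2, ..."""
    vw, vh = viewport
    visible = [
        e for e in elements
        if e.width > 0 and e.height > 0                 # skip collapsed/hidden elements
        and e.x < vw and e.y < vh                       # starts before the viewport edge
        and e.x + e.width > 0 and e.y + e.height > 0    # not entirely off-screen
    ]
    return {i: e for i, e in enumerate(visible)}


def click_point(labels: Dict[int, Element], label_id: int) -> Tuple[float, float]:
    """Resolve a label such as [1] to the center of its element's box."""
    e = labels[label_id]
    return (e.x + e.width / 2, e.y + e.height / 2)


page = [
    Element("img", "Logo", 10, 10, 80, 30),
    Element("input", "Search", 200, 10, 300, 30),
    Element("button", "Hidden", 0, 0, 0, 0),
    Element("a", "Read More", 20, 900, 100, 20),  # below a 1280x720 viewport
]
labels = assign_labels(page, (1280, 720))
print({i: e.text for i, e in labels.items()})  # {0: 'Logo', 1: 'Search'}
print(click_point(labels, 1))                  # (350.0, 25.0)
```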
🤝 Integration
With Cline (VS Code)
```json
{
  "mcpServers": {
    "browser": {
      "command": "atlas-browser-mcp"
    }
  }
}
```
Programmatic Use
```python
from atlas_browser_mcp.browser import VisualBrowser

browser = VisualBrowser()

# Navigate and inspect the returned page metadata
result = browser.execute("navigate", url="https://example.com")
print(f"Page title: {result.data['title']}")
print(f"Found {result.data['element_count']} interactive elements")

# Click element [0]
result = browser.execute("click", label_id=0)

# Type into the focused field, then press Enter
result = browser.execute("type", text="Hello world", submit=True)

# Cleanup
browser.execute("close")
```
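If an exception is raised between navigation and cleanup, the final `close` call above is skipped. A small context manager can guarantee cleanup for any object that exposes the `execute(...)` method shown above. `open_browser` is a hypothetical helper, not part of the package. The `FakeBrowser` stand-in exists only so the example runs without Chromium.

```python
from contextlib import contextmanager
from typing import Any, Iterator, List


@contextmanager
def open_browser(browser: Any) -> Iterator[Any]:
    """Yield `browser` and always call browser.execute("close") on exit."""
    try:
        yield browser
    finally:
        browser.execute("close")


class FakeBrowser:
    """Stand-in that records actions instead of driving a real browser."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def execute(self, action: str, **kwargs: Any) -> None:
        self.calls.append(action)


fake = FakeBrowser()
try:
    with open_browser(fake) as b:
        b.execute("navigate", url="https://example.com")
        raise RuntimeError("simulated failure mid-session")
except RuntimeError:
    pass
print(fake.calls)  # ['navigate', 'close']
```

With the real package, this becomes `with open_browser(VisualBrowser()) as browser: ...`.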
📋 Requirements
- Python 3.10+
- Playwright with Chromium
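Both requirements can be checked before starting the server. The following minimal standard-library sketch confirms that the `playwright` Python package is importable. It cannot tell whether `playwright install chromium` has downloaded the browser, which only becomes apparent when a browser is launched. The function name `check_environment` is hypothetical.

```python
import importlib.util
import sys
from typing import List


def check_environment() -> List[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python 3.10+ required, found {found}")
    if importlib.util.find_spec("playwright") is None:
        problems.append("Playwright not installed: run `pip install playwright` "
                        "and `playwright install chromium`")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("-", problem)
```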
🐛 Troubleshooting
"Playwright not installed"
```bash
pip install playwright
playwright install chromium
```
"Browser closed unexpectedly"
Try running with `headless=False` to watch what the browser is doing:
```python
browser = VisualBrowser(headless=False)
```
Elements not being detected
Some dynamic pages need more wait time. The browser waits 1.5s after navigation, but complex SPAs may need longer.
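For pages that keep rendering after load, a fixed delay can be replaced by polling until the number of interactive elements stops changing. The following sketch is generic. `count_elements` stands for any callable you supply that returns the current count. The name `wait_until_stable` and the timeout, interval, and stability values are assumptions, not package defaults.

```python
import time
from typing import Callable


def wait_until_stable(count_elements: Callable[[], int], timeout: float = 10.0,
                      interval: float = 0.5, stable_rounds: int = 2) -> int:
    """Poll until the count is unchanged for `stable_rounds` consecutive checks.

    Returns the last observed count, even if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    last = count_elements()
    unchanged = 0
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = count_elements()
        if current == last:
            unchanged += 1
            if unchanged >= stable_rounds:
                break
        else:
            last, unchanged = current, 0  # still changing: reset the streak
    return last
```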
📄 License
MIT License - see LICENSE
🙏 Credits
Built for Atlas, an autonomous AI agent.
Inspired by:
- anthropic/mcp - Model Context Protocol
- AskUI - Visual testing approach
- Set-of-Mark prompting - Visual grounding technique
