mcp-scrcpy-vision
An MCP server that gives AI agents complete vision and control over Android devices.
Features:
- Real-time Vision: Continuous screen streaming via scrcpy H.264 + ffmpeg
- Fast Input Control: When streaming, input uses scrcpy control protocol (~5-10ms latency vs ~100-300ms with adb shell)
- UI Automation: Element detection via uiautomator with tap coordinates
- Full Input Control: Tap, swipe, long press, pinch, drag-drop, text, keycodes
- System Access: Shell commands, file transfer, clipboard, notifications
- Multi-device: Control multiple Android devices simultaneously
- WiFi ADB: Connect wirelessly for untethered automation
Quick Start
1. Prerequisites
Required:
- Node.js 18+
- ADB (Android Platform Tools) in PATH
- Android device with USB debugging enabled
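For streaming (recommended for fast input):
- scrcpy-server file (its version must match SCRCPY_SERVER_VERSION)
- ffmpeg in PATH (or set FFMPEG_PATH)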
2. Install
git clone https://github.com/anthropics/mcp-scrcpy-vision.git
cd mcp-scrcpy-vision
npm install
npm run build
3. Configure
Create .env file:
# Required for streaming + fast input
SCRCPY_SERVER_PATH="C:\scrcpy-win64-v3.2\scrcpy-server"
SCRCPY_SERVER_VERSION="3.2"
# Optional (defaults shown)
ADB_PATH="adb"
FFMPEG_PATH="ffmpeg"
DEFAULT_MAX_SIZE="1024"
DEFAULT_MAX_FPS="30"
DEFAULT_FRAME_FPS="2"
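The settings above could be read and validated along these lines. This is a minimal sketch, not the server's actual loader: the `ServerConfig` shape and the function names are hypothetical, and the environment is passed in as a plain object so the snippet has no Node-specific dependencies.

```typescript
// Hypothetical config shape; the field names simply mirror the .env keys above.
interface ServerConfig {
  scrcpyServerPath: string | undefined;
  scrcpyServerVersion: string | undefined;
  adbPath: string;
  ffmpegPath: string;
  defaultMaxSize: number;
  defaultMaxFps: number;
  defaultFrameFps: number;
}

type EnvVars = { [key: string]: string | undefined };

// Read a positive integer setting, falling back to the documented default.
function readPositiveInt(env: EnvVars, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  if (!/^\d+$/.test(raw.trim())) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  const value = parseInt(raw.trim(), 10);
  if (value <= 0) {
    throw new Error(`${key} must be greater than zero`);
  }
  return value;
}

function loadServerConfig(env: EnvVars): ServerConfig {
  return {
    scrcpyServerPath: env["SCRCPY_SERVER_PATH"],
    scrcpyServerVersion: env["SCRCPY_SERVER_VERSION"],
    adbPath: env["ADB_PATH"] || "adb",
    ffmpegPath: env["FFMPEG_PATH"] || "ffmpeg",
    defaultMaxSize: readPositiveInt(env, "DEFAULT_MAX_SIZE", 1024),
    defaultMaxFps: readPositiveInt(env, "DEFAULT_MAX_FPS", 30),
    defaultFrameFps: readPositiveInt(env, "DEFAULT_FRAME_FPS", 2),
  };
}

// Streaming needs both scrcpy settings; snapshot mode needs neither.
function streamingConfigured(config: ServerConfig): boolean {
  return Boolean(config.scrcpyServerPath) && Boolean(config.scrcpyServerVersion);
}
```

In a Node process, `loadServerConfig(process.env)` would pick up the values from the `.env` file once it has been loaded.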
4. Add to MCP Client
Claude Desktop (%APPDATA%\Claude\claude_desktop_config.json on Windows):
{
  "mcpServers": {
    "android": {
      "command": "node",
      "args": ["C:/path/to/mcp-scrcpy-vision/dist/index.js"],
      "env": {
        "SCRCPY_SERVER_PATH": "C:/scrcpy/scrcpy-server",
        "SCRCPY_SERVER_VERSION": "3.2"
      }
    }
  }
}
Cursor (Settings > MCP):
{
  "android": {
    "command": "node",
    "args": ["C:/path/to/mcp-scrcpy-vision/dist/index.js"],
    "env": {
      "SCRCPY_SERVER_PATH": "C:/scrcpy/scrcpy-server",
      "SCRCPY_SERVER_VERSION": "3.2"
    }
  }
}
5. Connect Device
- Enable USB debugging on Android device (Settings > Developer Options > USB Debugging)
- Connect via USB
- Accept RSA fingerprint prompt on device
- Verify: adb devices should show your device
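To check the verification step programmatically, the text printed by `adb devices` can be parsed into serial/state pairs. The sketch below assumes the usual output layout (a "List of devices attached" header followed by tab-separated lines); `parseAdbDevices` is a hypothetical helper, not part of this server.

```typescript
interface AdbDevice {
  serial: string;
  state: string; // e.g. "device", "unauthorized", "offline"
}

// Parse the text printed by `adb devices` into a list of devices.
function parseAdbDevices(output: string): AdbDevice[] {
  const devices: AdbDevice[] = [];
  for (const line of output.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip blank lines, the header, and daemon start-up notices ("* daemon ...").
    if (trimmed === "") continue;
    if (trimmed.indexOf("List of devices") === 0) continue;
    if (trimmed.charAt(0) === "*") continue;
    const parts = trimmed.split(/\s+/);
    if (parts.length >= 2) {
      devices.push({ serial: parts[0], state: parts[1] });
    }
  }
  return devices;
}

// A device is usable only once its state is "device"; "unauthorized" means
// the RSA fingerprint prompt has not been accepted yet.
function readySerials(devices: AdbDevice[]): string[] {
  return devices.filter((d) => d.state === "device").map((d) => d.serial);
}
```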
How It Works
Two Modes of Operation
1. Snapshot Mode (No streaming required)
- Uses android.vision.snapshot for screenshots
- Input uses ADB shell commands (~100-300ms per action)
- Works without scrcpy/ffmpeg
- Best for simple automation or when streaming isn't available
2. Streaming Mode (Recommended)
- Start with android.vision.startStream
- Continuous JPEG frames available via resource URI
- Input uses scrcpy control protocol (~5-10ms per action)
- Tap input is roughly 10-20x faster than in snapshot mode (see the table below)
- Best for real-time control and rapid interactions
Performance Comparison
| Operation | Snapshot Mode | Streaming Mode |
|---|---|---|
| Tap | ~100-300ms | ~5-10ms |
| Swipe | ~300-500ms | ~50-100ms |
| Type text | ~50ms/char | ~5ms total |
| Screenshot | ~500ms | ~33ms (30fps) |
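As a rough worked example, the figures in the table can be combined to estimate how long an input sequence takes in each mode. This sketch only does the arithmetic; it assumes the "~5ms total" figure applies to each text call, and the `Workload` type and function names are hypothetical.

```typescript
interface LatencyRange {
  min: number; // milliseconds
  max: number; // milliseconds
}

// A hypothetical description of an input sequence.
interface Workload {
  taps: number;
  swipes: number;
  textLengths: number[]; // character count of each text call
}

// Snapshot mode: tap ~100-300ms, swipe ~300-500ms, text ~50ms per character.
function estimateSnapshotMs(w: Workload): LatencyRange {
  const chars = w.textLengths.reduce((sum, n) => sum + n, 0);
  const typing = 50 * chars;
  return {
    min: w.taps * 100 + w.swipes * 300 + typing,
    max: w.taps * 300 + w.swipes * 500 + typing,
  };
}

// Streaming mode: tap ~5-10ms, swipe ~50-100ms, text ~5ms per call (assumed).
function estimateStreamingMs(w: Workload): LatencyRange {
  const typing = 5 * w.textLengths.length;
  return {
    min: w.taps * 5 + w.swipes * 50 + typing,
    max: w.taps * 10 + w.swipes * 100 + typing,
  };
}

// 10 taps, 2 swipes, and one 20-character text entry.
const exampleWorkload: Workload = { taps: 10, swipes: 2, textLengths: [20] };
const snapshotEstimate = estimateSnapshotMs(exampleWorkload); // 2600-5000 ms
const streamingEstimate = estimateStreamingMs(exampleWorkload); // 155-305 ms
```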
Tools Reference (33 tools)
Device Management
| Tool | Parameters | Description |
|---|---|---|
| android.devices.list | - | List connected devices |
| android.devices.info | serial | Get device info (model, SDK, etc.) |
| android.adb.enableTcpip | serial, port? | Enable WiFi debugging |
| android.adb.getDeviceIp | serial | Get device WiFi IP |
| android.adb.connectWifi | ipAddress, port? | Connect via WiFi |
| android.adb.disconnectWifi | ipAddress? | Disconnect WiFi |
Vision
| Tool | Parameters | Description |
|---|---|---|
| android.vision.startStream | serial, maxSize?, maxFps?, frameFps? | Start continuous stream (enables fast input) |
| android.vision.stopStream | serial | Stop stream |
| android.vision.snapshot | serial | Take PNG screenshot (works without streaming) |
| android.ui.dump | serial | Get UI hierarchy XML |
| android.ui.findElement | serial, text?, resourceId?, className?, contentDesc? | Find elements with tap coords |
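The tap coordinates for an element can be derived from the `bounds` attribute that uiautomator writes for each node, in the form `[left,top][right,bottom]`. The sketch below shows one way to compute the center point; `centerOfBounds` is a hypothetical helper, and whether the server computes coordinates exactly this way is an assumption.

```typescript
interface TapPoint {
  x: number;
  y: number;
}

// Convert a uiautomator bounds string such as "[84,1128][996,1272]"
// into the center point of the rectangle.
function centerOfBounds(bounds: string): TapPoint {
  const match = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds.trim());
  if (!match) {
    throw new Error(`Unrecognized bounds: ${bounds}`);
  }
  const x1 = parseInt(match[1], 10);
  const y1 = parseInt(match[2], 10);
  const x2 = parseInt(match[3], 10);
  const y2 = parseInt(match[4], 10);
  return { x: Math.round((x1 + x2) / 2), y: Math.round((y1 + y2) / 2) };
}

// A button spanning [84,1128] to [996,1272] is tapped at (540, 1200).
const loginCenter = centerOfBounds("[84,1128][996,1272]");
```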
Input Control
Note: These automatically use fast scrcpy control when streaming, otherwise fall back to ADB.
| Tool | Parameters | Description |
|---|---|---|
| android.input.tap | serial, x, y | Tap at coordinates |
| android.input.swipe | serial, x1, y1, x2, y2, durationMs? | Swipe gesture |
| android.input.longPress | serial, x, y, durationMs? | Long press |
| android.input.pinch | serial, centerX, centerY, startDistance, endDistance, durationMs? | Pinch zoom |
| android.input.dragDrop | serial, startX, startY, endX, endY, durationMs? | Drag and drop |
| android.input.text | serial, text | Type text |
| android.input.keyevent | serial, keycode | Send keycode |
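For the ADB fallback path, each tool maps naturally onto an `adb shell input` command. The sketch below builds the argument list for a few of them. The `InputAction` type and the default durations are assumptions made for illustration, and the long press is expressed as a swipe that starts and ends on the same point, which is a common technique rather than a confirmed detail of this server.

```typescript
// Hypothetical action type covering a subset of the input tools above.
type InputAction =
  | { kind: "tap"; x: number; y: number }
  | { kind: "swipe"; x1: number; y1: number; x2: number; y2: number; durationMs?: number }
  | { kind: "longPress"; x: number; y: number; durationMs?: number }
  | { kind: "keyevent"; keycode: number };

// Build the argument list for the `adb` executable (not a shell string).
function adbInputArgs(serial: string, action: InputAction): string[] {
  const base = ["-s", serial, "shell", "input"];
  switch (action.kind) {
    case "tap":
      return base.concat(["tap", String(action.x), String(action.y)]);
    case "swipe":
      return base.concat([
        "swipe",
        String(action.x1), String(action.y1),
        String(action.x2), String(action.y2),
        String(action.durationMs ?? 300), // assumed default
      ]);
    case "longPress":
      // A swipe with identical start and end points holds the touch in place.
      return base.concat([
        "swipe",
        String(action.x), String(action.y),
        String(action.x), String(action.y),
        String(action.durationMs ?? 1000), // assumed default
      ]);
    case "keyevent":
      return base.concat(["keyevent", String(action.keycode)]);
  }
}
```

Text input is left out on purpose: `adb shell input text` needs careful escaping of spaces and shell metacharacters, which is beyond this sketch.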
App Control
| Tool | Parameters | Description |
|---|---|---|
| android.app.start | serial, packageName, activity? | Launch app |
| android.app.stop | serial, packageName | Force-stop app |
| android.apps.list | serial, system? | List installed apps |
| android.activity.current | serial | Get foreground activity |
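Launching and stopping apps over ADB is typically done with `am start`, `monkey`, and `am force-stop`. The sketch below shows how the optional `activity` parameter might change the command; the helper names are hypothetical and the server's real implementation may differ.

```typescript
// Reject anything that is not a plain dotted Java-style package name,
// so the value is safe to pass to a device shell.
function assertPackageName(packageName: string): void {
  if (!/^[A-Za-z][A-Za-z0-9_]*(\.[A-Za-z][A-Za-z0-9_]*)+$/.test(packageName)) {
    throw new Error(`Invalid package name: ${packageName}`);
  }
}

// With an activity, start that component explicitly; without one, ask
// `monkey` to fire the package's launcher intent once.
function appStartArgs(packageName: string, activity?: string): string[] {
  assertPackageName(packageName);
  if (activity) {
    return ["shell", "am", "start", "-n", `${packageName}/${activity}`];
  }
  return ["shell", "monkey", "-p", packageName, "-c", "android.intent.category.LAUNCHER", "1"];
}

function appStopArgs(packageName: string): string[] {
  assertPackageName(packageName);
  return ["shell", "am", "force-stop", packageName];
}
```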
System
| Tool | Parameters | Description |
|---|---|---|
| android.shell.exec | serial, command | Execute shell command |
| android.file.push | serial, localPath, remotePath | Push file to device |
| android.file.pull | serial, remotePath, localPath | Pull file from device |
| android.file.list | serial, path | List directory |
| android.clipboard.get | serial | Get clipboard |
| android.clipboard.set | serial, text | Set clipboard |
| android.notifications.get | serial | Get notifications |
Screen Control
| Tool | Parameters | Description |
|---|---|---|
| android.screen.wake | serial | Wake screen |
| android.screen.sleep | serial | Sleep screen |
| android.screen.isOn | serial | Check if screen is on |
| android.screen.unlock | serial | Unlock (unsecured only) |
Resources
The server exposes these MCP resources:
- android://devices - JSON list of connected devices
- android://device/<serial>/frame/latest.jpg - Latest JPEG frame (when streaming)
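A client may find it convenient to build and parse these URIs with small helpers. The sketch below follows the two URI patterns listed above; the helper names are hypothetical, and the serial is inserted as-is because the source does not say whether the server expects any escaping.

```typescript
const DEVICES_URI = "android://devices";

// Build the latest-frame URI for a device serial.
function frameUri(serial: string): string {
  if (serial.trim() === "") {
    throw new Error("serial must not be empty");
  }
  return `android://device/${serial}/frame/latest.jpg`;
}

// Extract the serial from a latest-frame URI, or return null if the
// URI does not follow the pattern.
function serialFromFrameUri(uri: string): string | null {
  const match = /^android:\/\/device\/(.+)\/frame\/latest\.jpg$/.exec(uri);
  return match ? match[1] : null;
}
```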
Usage Examples
Basic Automation Loop (Streaming Mode)
1. Start stream: android.vision.startStream { serial: "ABC123" }
2. Read resource: android://device/ABC123/frame/latest.jpg
3. AI analyzes image, decides to tap "Login" button
4. Find element: android.ui.findElement { serial: "ABC123", text: "Login" }
5. Tap at returned coordinates: android.input.tap { serial: "ABC123", x: 540, y: 1200 }
6. Wait 500ms, read resource again, repeat
7. When done: android.vision.stopStream { serial: "ABC123" }
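The loop above could be driven from client code roughly as follows. This is a sketch under stated assumptions: `AndroidMcpClient` is a hypothetical minimal wrapper around whatever MCP client library is in use, `decide` stands in for the AI's image analysis, and the step limit and settle delay are illustrative defaults. The `findElement` step is omitted for brevity.

```typescript
// Hypothetical minimal client interface; adapt it to your MCP client library.
interface AndroidMcpClient {
  callTool(name: string, args: { [key: string]: unknown }): Promise<unknown>;
  readResource(uri: string): Promise<unknown>;
}

// What the (hypothetical) decision step returns for each frame.
type Decision = { tap: { x: number; y: number } } | { done: true };

async function runStreamingLoop(
  client: AndroidMcpClient,
  serial: string,
  decide: (frame: unknown) => Promise<Decision>,
  maxSteps: number = 20,
  settleMs: number = 500
): Promise<number> {
  await client.callTool("android.vision.startStream", { serial });
  let steps = 0;
  try {
    while (steps < maxSteps) {
      const frame = await client.readResource(`android://device/${serial}/frame/latest.jpg`);
      const decision = await decide(frame);
      if ("done" in decision) break;
      await client.callTool("android.input.tap", { serial, x: decision.tap.x, y: decision.tap.y });
      steps++;
      // Give the UI time to settle before reading the next frame.
      await new Promise<void>((resolve) => setTimeout(resolve, settleMs));
    }
  } finally {
    // Always stop the stream, even if a step fails.
    await client.callTool("android.vision.stopStream", { serial });
  }
  return steps;
}
```

Stopping the stream in a `finally` block keeps the "one stream per device" slot from being left occupied after an error.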
Simple Screenshot Mode
1. Take screenshot: android.vision.snapshot { serial: "ABC123" }
2. AI analyzes image
3. Find and tap: android.ui.findElement + android.input.tap
4. Take another screenshot to verify
WiFi Connection Workflow
1. Connect device via USB
2. android.adb.enableTcpip { serial: "ABC123" }
3. android.adb.getDeviceIp { serial: "ABC123" } → "192.168.1.50"
4. Disconnect USB cable
5. android.adb.connectWifi { ipAddress: "192.168.1.50" }
6. Now use "192.168.1.50:5555" as serial for all commands
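Step 6 combines the IP address with the TCP port to form the serial. A small helper can do this and catch malformed input early; `wifiSerial` is a hypothetical name, and 5555 is used as the default port because that is the port shown in the workflow above.

```typescript
// Combine an IPv4 address and port into the serial used after connecting over WiFi.
function wifiSerial(ipAddress: string, port: number = 5555): string {
  const octets = ipAddress.trim().split(".");
  const validIp =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && parseInt(o, 10) <= 255);
  if (!validIp) {
    throw new Error(`Not an IPv4 address: ${ipAddress}`);
  }
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${port}`);
  }
  return `${octets.join(".")}:${port}`;
}
```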
App Testing Example
1. android.app.start { serial: "ABC123", packageName: "com.example.app" }
2. android.vision.startStream { serial: "ABC123" }
3. Wait for app to load, read frame
4. android.ui.findElement { serial: "ABC123", resourceId: "username_field" }
5. android.input.tap { serial: "ABC123", x: 540, y: 300 }
6. android.input.text { serial: "ABC123", text: "testuser@example.com" }
7. android.input.keyevent { serial: "ABC123", keycode: 66 } // Enter
8. Read frame, verify login succeeded
9. android.vision.stopStream { serial: "ABC123" }
Common Keycodes
| Key | Code | Key | Code |
|---|---|---|---|
| HOME | 3 | BACK | 4 |
| VOLUME_UP | 24 | VOLUME_DOWN | 25 |
| POWER | 26 | ENTER | 66 |
| DEL (backspace) | 67 | TAB | 61 |
| MENU | 82 | APP_SWITCH | 187 |
| WAKEUP | 224 | SLEEP | 223 |
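Client code can keep these values in a lookup table so that calls read as names rather than numbers. The sketch below copies the codes from the table above; `resolveKeycode` is a hypothetical helper, and accepting a `KEYCODE_` prefix is a convenience added here.

```typescript
// Codes copied from the table above.
const COMMON_KEYCODES: { [name: string]: number } = {
  HOME: 3,
  BACK: 4,
  VOLUME_UP: 24,
  VOLUME_DOWN: 25,
  POWER: 26,
  TAB: 61,
  ENTER: 66,
  DELETE: 67, // Android's KEYCODE_DEL, which behaves as backspace
  MENU: 82,
  APP_SWITCH: 187,
  SLEEP: 223,
  WAKEUP: 224,
};

// Accept a numeric code, a bare name ("enter"), or a prefixed name ("KEYCODE_ENTER").
function resolveKeycode(key: string | number): number {
  if (typeof key === "number") return key;
  const name = key.trim().toUpperCase().replace(/^KEYCODE_/, "");
  if (Object.prototype.hasOwnProperty.call(COMMON_KEYCODES, name)) {
    return COMMON_KEYCODES[name];
  }
  throw new Error(`Unknown key name: ${key}`);
}
```

For example, `resolveKeycode("enter")` yields the value to pass as `keycode` to android.input.keyevent.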
Troubleshooting
No devices found
adb kill-server
adb start-server
adb devices
Ensure USB debugging is enabled and RSA fingerprint accepted.
Scrcpy version mismatch
SCRCPY_SERVER_VERSION must exactly match your scrcpy-server file. Check the scrcpy release version you downloaded.
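When the scrcpy-server file still sits inside its release folder (for example `scrcpy-win64-v3.2`), the folder name can serve as a quick sanity check against the configured version. This is only a heuristic sketch with hypothetical helper names: it returns no warning when the path carries no version hint, and it is no substitute for checking the release you downloaded.

```typescript
// Look for a "-v<version>" or "_v<version>" fragment in the path.
function versionFromServerPath(serverPath: string): string | null {
  const match = /[-_]v(\d+(?:\.\d+)*)/.exec(serverPath);
  return match ? match[1] : null;
}

// Return a warning message if the path hints at a different version,
// or null when there is no hint or the versions agree.
function checkScrcpyVersion(serverPath: string, configuredVersion: string): string | null {
  const hinted = versionFromServerPath(serverPath);
  if (hinted === null || hinted === configuredVersion) return null;
  return `SCRCPY_SERVER_VERSION is "${configuredVersion}" but the path suggests "${hinted}"`;
}
```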
ffmpeg not found
- Windows: Download from https://ffmpeg.org/download.html, extract, add bin folder to PATH
- macOS: brew install ffmpeg
- Linux: apt install ffmpeg or yum install ffmpeg
Or set FFMPEG_PATH in .env to the full path.
uiautomator dump fails
Some devices require the screen to be on before uiautomator can dump the hierarchy. Call android.screen.wake first, then retry.
Clipboard not working (Android 10+)
Android 10+ restricts clipboard access. Use UI automation to paste instead.
Stream won't start
- Check scrcpy-server path is correct
- Verify version numbers match
- Try running scrcpy standalone first to verify it works
Notes & Limitations
- Fast input when streaming: When a stream is active, tap/swipe/text/keyevent use the scrcpy control protocol (~5-10ms latency). Without streaming, falls back to adb shell input (~100-300ms).
- One stream per device at a time
- Snapshot works without scrcpy - useful fallback when streaming is not needed
- Clipboard has platform limitations on Android 10+
- Notifications may require permissions on newer Android
- Pinch gesture currently simulates single-finger; true multi-touch requires the streaming session
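For the pinch limitation in the last item, a true two-finger gesture needs a pair of touch positions at each step. The sketch below computes them from the android.input.pinch parameters, assuming `startDistance` and `endDistance` are the pixel distances between the two fingers and that the fingers move along a horizontal line through the center. Both are assumptions, as are the type and function names.

```typescript
interface FingerPosition {
  x: number;
  y: number;
}

interface PinchStep {
  finger1: FingerPosition;
  finger2: FingerPosition;
}

// Linearly interpolate the finger spacing from startDistance to endDistance.
// Returns steps + 1 frames, including the start and end positions.
function pinchSteps(
  centerX: number,
  centerY: number,
  startDistance: number,
  endDistance: number,
  steps: number = 10
): PinchStep[] {
  if (steps < 1 || Math.floor(steps) !== steps) {
    throw new Error("steps must be a positive integer");
  }
  const result: PinchStep[] = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    const half = (startDistance + (endDistance - startDistance) * t) / 2;
    result.push({
      finger1: { x: Math.round(centerX - half), y: centerY },
      finger2: { x: Math.round(centerX + half), y: centerY },
    });
  }
  return result;
}
```

A zoom-in uses `endDistance` greater than `startDistance`; swapping the two gives a zoom-out.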
Security Warning
This MCP server provides full control over connected Android devices:
- Execute arbitrary shell commands
- Read/write files on device
- Control UI and input
- Access clipboard and notifications
Only connect devices you own, and only use AI agents you trust.
Development
npm run dev # Development with tsx
npm run build # Compile TypeScript
npm start # Run production build
See claude.md for developer documentation. See agents.md for AI agent integration guide.
License
MIT
