🎬 Veo 3.1 MCP Server
Token-Efficient AI Video Generation with Google's Veo 3.1
🎯 What is This?
An MCP server for Google's Veo 3.1 - the state-of-the-art AI video generation model. Generate stunning videos from text prompts, reference images, or interpolate between first/last frames.
Key Features
- ✅ Text-to-Video - Generate videos from descriptions
- ✅ Reference Images - Up to 3 images for style guidance
- ✅ Frame Interpolation - First + last frame → coherent video
- ✅ Video Extension - Extend Veo-generated videos
- ✅ Batch Generation - Generate multiple videos with concurrency control
- ✅ Cost Estimation - Know costs before generating
- ✅ Token-Efficient - Auto-upload refs to Files API (97% token savings!)
🚀 Quick Start
1. Installation
cd veo-mcp
npm install
npm run build
2. Get API Key
- Go to Google AI Studio
- Create API key
- Enable Veo 3.1 in your project (billing required)
3. Configure
cp environment.template .env
# Edit .env and add your key
4. Add to Cursor
Add to ~/.cursor/mcp.json:
{
"mcpServers": {
"veo": {
"command": "node",
"args": ["C:\\Users\\woute\\Githubs\\MCP\\veo-mcp\\dist\\index.js"],
"env": {
"GEMINI_API_KEY": "your_api_key_here"
}
}
}
}
Replace the path in args with the absolute path to your own dist/index.js, then restart Cursor. Done! ✅
🛠️ Tools
1. start_video_generation - Generate Video
Basic text-to-video:
{
"prompt": "A serene Zen garden at sunrise, cherry blossoms falling, cinematic"
}
With reference images (token-efficient!):
{
"prompt": "A futuristic cityscape at night, neon lights",
"referenceImages": [{
"source": "url",
"url": "https://example.com/style.jpg"
}],
"durationSeconds": 8,
"resolution": "1080p"
}
First/last frame interpolation:
{
"prompt": "Smooth transition between these scenes",
"firstFrame": {
"source": "file_path",
"filePath": "C:\\first.jpg"
},
"lastFrame": {
"source": "file_path",
"filePath": "C:\\last.jpg"
}
}
Parameters:
- model - veo-3.1-generate-001 (quality) or veo-3.1-fast-generate-001 (speed)
- durationSeconds - 4, 6, or 8
- aspectRatio - 16:9 or 9:16
- resolution - 720p or 1080p
- generateAudio - Include synchronized audio (2x cost)
- seed - For reproducibility
- sampleCount - Generate 1-4 videos
2. get_video_job - Check Status
{
"operationName": "operations/xyz"
}
Returns status and video URLs when complete.
3. upload_image - Pre-Upload References
{
"source": "file_path",
"filePath": "C:\\style-ref.jpg"
}
Returns fileUri valid for 48 hours. Reuse across multiple generations!
4. extend_video - Extend Videos
{
"videoFileUri": "files/abc123",
"additionalSeconds": 7,
"prompt": "Continue with the character walking into the sunset"
}
5. start_batch_video_generation - Batch Generate
{
"jobs": [
{"key": "scene1", "request": {"prompt": "..."}},
{"key": "scene2", "request": {"prompt": "..."}}
],
"concurrency": 3
}
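The concurrency setting above can be pictured as a small pool of worker lanes that each claim the next pending job. This is a minimal sketch, not the server's actual implementation: runBatch, Job, JobResult, and the worker callback are hypothetical names standing in for whatever performs a single generation call.

```typescript
// Hypothetical helper: runs keyed jobs with at most `concurrency` in flight.
type Job<T> = { key: string; request: T };
type JobResult<R> = { key: string; result?: R; error?: string };

async function runBatch<T, R>(
  jobs: Job<T>[],
  concurrency: number,
  worker: (request: T) => Promise<R>, // stand-in for one generation call
): Promise<JobResult<R>[]> {
  const results: JobResult<R>[] = new Array(jobs.length);
  let next = 0; // index of the next job to claim

  async function lane(): Promise<void> {
    while (next < jobs.length) {
      const i = next++; // no race: nothing else runs between check and claim
      const job = jobs[i];
      try {
        results[i] = { key: job.key, result: await worker(job.request) };
      } catch (err) {
        // One failed job should not abort the rest of the batch.
        results[i] = { key: job.key, error: String(err) };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, jobs.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // same order as the input jobs
}
```

Results come back in input order and keyed, so a failed scene can be retried on its own without rerunning the whole batch.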
6. estimate_veo_cost - Cost Estimation
{
"model": "veo-3.1-fast-generate-001",
"durationSeconds": 8,
"sampleCount": 1,
"generateAudio": false
}
Returns estimated cost in USD.
💰 Pricing
| Model | Video Only | Video + Audio |
|---|---|---|
| veo-3.1-generate-001 (quality) | $0.20/sec | $0.40/sec |
| veo-3.1-fast-generate-001 (speed) | $0.10/sec | $0.15/sec |
Example Costs:
- 8s video (fast, no audio): $0.80
- 8s video (quality, with audio): $3.20
- 4s video (fast, no audio): $0.40
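The arithmetic behind these examples is per-second price × duration × number of samples. The sketch below reproduces it from the pricing table above; estimateCostCents and formatUsd are illustrative names, the defaults for sampleCount and generateAudio are assumptions, and the real estimate_veo_cost tool may round or report differently.

```typescript
type VeoModel = "veo-3.1-generate-001" | "veo-3.1-fast-generate-001";

// Per-second prices in US cents, copied from the pricing table above.
const PRICE_CENTS_PER_SECOND: Record<VeoModel, { video: number; videoAudio: number }> = {
  "veo-3.1-generate-001": { video: 20, videoAudio: 40 },
  "veo-3.1-fast-generate-001": { video: 10, videoAudio: 15 },
};

interface CostInput {
  model: VeoModel;
  durationSeconds: number;
  sampleCount?: number; // assumed to default to 1
  generateAudio?: boolean; // assumed to default to false
}

// Works in whole cents to avoid floating-point drift.
function estimateCostCents(input: CostInput): number {
  const { model, durationSeconds, sampleCount = 1, generateAudio = false } = input;
  const price = PRICE_CENTS_PER_SECOND[model];
  const perSecond = generateAudio ? price.videoAudio : price.video;
  return perSecond * durationSeconds * sampleCount;
}

function formatUsd(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// 8s video, quality model, with audio: 40 cents × 8 s × 1 sample
console.log(formatUsd(estimateCostCents({
  model: "veo-3.1-generate-001",
  durationSeconds: 8,
  generateAudio: true,
}))); // prints "$3.20"
```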
📊 Limits & Constraints
| Parameter | Limit |
|---|---|
| Duration | 4, 6, or 8 seconds |
| Reference images | 0-3 images |
| Sample count | 1-4 videos |
| Resolutions | 720p, 1080p |
| Aspect ratios | 16:9, 9:16 |
| Rate limit | ~50 requests/min |
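A client can check these limits locally before spending a request. The following is a rough sketch under the limits in the table above; GenerationRequest and validateRequest are hypothetical names, and the server's own validation may be stricter or word its errors differently.

```typescript
interface GenerationRequest {
  prompt: string;
  durationSeconds?: number;
  aspectRatio?: string;
  resolution?: string;
  sampleCount?: number;
  referenceImages?: unknown[];
}

// Allowed values, taken from the limits table above.
const DURATIONS = [4, 6, 8];
const ASPECT_RATIOS = ["16:9", "9:16"];
const RESOLUTIONS = ["720p", "1080p"];

// Returns human-readable problems; an empty list means the request is within limits.
function validateRequest(req: GenerationRequest): string[] {
  const problems: string[] = [];
  if (req.durationSeconds !== undefined && DURATIONS.indexOf(req.durationSeconds) === -1) {
    problems.push("durationSeconds must be 4, 6, or 8");
  }
  if (req.aspectRatio !== undefined && ASPECT_RATIOS.indexOf(req.aspectRatio) === -1) {
    problems.push("aspectRatio must be 16:9 or 9:16");
  }
  if (req.resolution !== undefined && RESOLUTIONS.indexOf(req.resolution) === -1) {
    problems.push("resolution must be 720p or 1080p");
  }
  if (req.sampleCount !== undefined) {
    const n = req.sampleCount;
    if (Math.floor(n) !== n || n < 1 || n > 4) {
      problems.push("sampleCount must be an integer from 1 to 4");
    }
  }
  if (req.referenceImages !== undefined && req.referenceImages.length > 3) {
    problems.push("at most 3 reference images are allowed");
  }
  return problems;
}
```

Collecting every problem instead of throwing on the first one lets a caller fix a request in a single pass.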
💡 Usage Examples
Simple Text-to-Video
Generate an 8-second video of a peaceful forest scene with morning mist
With Style Reference
Create a video of a tech startup office, using this image for style: C:\ref.jpg
Frame Interpolation
Generate a smooth transition between first.jpg and last.jpg, 8 seconds, cinematic camera movement
Batch Generation
Generate 5 different video variations of a product showcase with different angles
🔍 How Token Efficiency Works
❌ Naive Approach (Base64)
{
"referenceImages": [{
"base64": "iVBORw0KGgo..." // 500KB → ~50,000 tokens!
}]
}
Cost: Massive token usage per call
✅ Token-Efficient (This MCP)
{
"referenceImages": [{
"source": "url",
"url": "https://example.com/ref.jpg" // ~20 tokens
}]
}
What Happens:
- Server downloads image (no tokens)
- Computes SHA-256 hash
- Checks cache (48h validity)
- Uploads to Files API if needed (~1s)
- Uses short files/abc123 URI (~5 tokens)
Savings: 97%+ fewer tokens! 🎉
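The hash-and-cache steps listed above could look roughly like this. It is a sketch only: UploadCache and its methods are hypothetical names, the cache lives in memory, and the download and Files API upload steps are left out. The one real API used is createHash from Node's built-in crypto module.

```typescript
import { createHash } from "node:crypto";

const CACHE_TTL_MS = 48 * 60 * 60 * 1000; // 48h validity, per the steps above

interface CacheEntry {
  fileUri: string;
  uploadedAtMs: number;
}

class UploadCache {
  // Keyed by the SHA-256 hex digest of the image bytes.
  private entries: Record<string, CacheEntry> = {};

  static hash(bytes: Uint8Array): string {
    return createHash("sha256").update(bytes).digest("hex");
  }

  // Returns a cached fileUri if these exact bytes were uploaded less than 48h ago.
  lookup(bytes: Uint8Array, nowMs: number): string | undefined {
    const entry = this.entries[UploadCache.hash(bytes)];
    if (entry === undefined) return undefined;
    return nowMs - entry.uploadedAtMs < CACHE_TTL_MS ? entry.fileUri : undefined;
  }

  // Records a fresh upload so later calls with identical bytes can skip it.
  remember(bytes: Uint8Array, fileUri: string, nowMs: number): void {
    this.entries[UploadCache.hash(bytes)] = { fileUri, uploadedAtMs: nowMs };
  }
}
```

Keying on the content hash rather than the URL or file path means the same image reached through two different sources still resolves to one upload.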
⏱️ Generation Times
| Configuration | Typical Time |
|---|---|
| 4s, 720p, no audio | 30-60 sec |
| 8s, 1080p, no audio | 60-120 sec |
| 8s, 1080p, with audio | 90-150 sec |
| With references | +10-30 sec |
| Frame interpolation | +20-40 sec |
Note: Times vary based on prompt complexity and server load.
🎨 Best Practices
1. Start Small, Scale Up
Step 1: Generate 1 video at 720p
Step 2: If good, regenerate at 1080p
Step 3: Use batch for variations
2. Use Fast Model for Testing
{
"model": "veo-3.1-fast-generate-001", // Testing
"resolution": "720p"
}
Switch to quality model for final:
{
"model": "veo-3.1-generate-001", // Final
"resolution": "1080p"
}
3. Pre-Upload Frequently Used References
// Step 1: Upload once
upload_image {"source": "file_path", "filePath": "brand-style.jpg"}
// Returns: files/xyz123
// Step 2: Reuse many times
{
"referenceImages": [{"source": "file_uri", "fileUri": "files/xyz123"}]
}
4. Leverage Batch for Variations
{
"jobs": [
{"key": "v1", "request": {"prompt": "Scene 1...", "seed": 1}},
{"key": "v2", "request": {"prompt": "Scene 1...", "seed": 2}},
{"key": "v3", "request": {"prompt": "Scene 1...", "seed": 3}}
]
}
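When the only thing that changes between variations is the seed, the jobs payload can be generated instead of written by hand. A small sketch follows; buildSeedVariations is a hypothetical helper, and the v1, v2, ... key scheme simply mirrors the example above.

```typescript
interface BatchJob {
  key: string;
  request: { prompt: string; seed: number };
}

// Builds one job per seed, keyed v1, v2, ... in the order the seeds are given.
function buildSeedVariations(prompt: string, seeds: number[]): { jobs: BatchJob[] } {
  return {
    jobs: seeds.map((seed, i) => ({ key: `v${i + 1}`, request: { prompt, seed } })),
  };
}

// Payload for start_batch_video_generation with three seeds.
console.log(JSON.stringify(buildSeedVariations("Scene 1...", [1, 2, 3]), null, 2));
```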
5. Monitor Costs
Always estimate before large batches:
estimate_veo_cost {
"model": "veo-3.1-fast-generate-001",
"durationSeconds": 8,
"sampleCount": 10
}
// Returns: $8.00 estimate
🎬 Async Operation Flow
Veo uses async long-running operations:
1. start_video_generation
↓ Returns operationName immediately
2. get_video_job (poll every 10-30s)
↓ Returns {done: false, status: "RUNNING"}
3. get_video_job (after 60-120s)
↓ Returns {done: true, videos: [{videoUri: "..."}]}
4. Download video from videoUri
Tip: Don't poll more often than once every 10 seconds.
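The flow above can be wrapped in a polling loop. The sketch below assumes a getVideoJob callback that wraps the get_video_job tool; that callback, waitForVideo, the JobStatus shape, and the 10-minute default timeout are all illustrative assumptions, not part of the server's API.

```typescript
// Shape assumed from the flow above; the real response may carry more fields.
interface JobStatus {
  done: boolean;
  status?: string;
  videos?: { videoUri: string }[];
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForVideo(
  operationName: string,
  getVideoJob: (operationName: string) => Promise<JobStatus>, // hypothetical wrapper
  intervalMs = 15000, // within the suggested 10-30s range
  timeoutMs = 10 * 60 * 1000, // assumed upper bound; tune to your workload
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const job = await getVideoJob(operationName);
    if (job.done) return job;
    // Give up rather than sleep past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out waiting for ${operationName}`);
    }
    await sleep(intervalMs);
  }
}
```

A fixed interval keeps the sketch short; a longer interval for 1080p or audio jobs would cut down on wasted status checks.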
🆘 Troubleshooting
"API not enabled" (403)
- Go to Google Cloud Console
- Enable "Generative Language API"
- Enable billing
- Wait 5-10 minutes for propagation
"Rate limit exceeded"
- Veo allows ~50 requests/min
- Use batch tool with concurrency: 3
- Add delays between requests
"Invalid aspect ratio with references"
- 9:16 may not work with reference images
- Use 16:9 for reference mode
- Check Veo 3.1 docs for updates
"Video extension failed"
- Only Veo-generated videos can be extended
- Cannot extend arbitrary MP4s
- Input must be from previous Veo job
Long generation times
- 1080p takes longer than 720p
- Audio generation adds time
- Reference images add processing
- Frame interpolation is slowest
🎯 Status: Production Ready ✅
- ✅ All 6 tools implemented
- ✅ Token-efficient file handling
- ✅ Async operation support
- ✅ Batch generation with concurrency control
- ✅ Cost estimation
- ✅ Comprehensive validation
- ✅ Error handling
- ✅ Full documentation
Ready to generate amazing videos! 🚀
Built with 🎬 for AI video generation
