MySQL MCP Server
A Model Context Protocol (MCP) server for MySQL databases. This server provides a unified interface for exploring and querying MySQL databases through MCP.
Features
- MySQL Support: Connect to MySQL databases
- Unified Interface: Consistent tools and API for MySQL operations
- Database-Specific Optimizations: Uses MySQL-optimized SQL syntax
- Schema Exploration: List databases, tables, and relationships
- Query Execution: Run SQL queries with proper parameter handling
- Resource Support: MCP resource endpoints for table data
- LangGraph Text-to-SQL Agent: Intelligent agent that converts natural language to SQL queries with automatic schema exploration and error recovery
Installation
- Install dependencies:
pip install -r requirements.txt
The requirements include:
- MySQL connector for database access
- MCP and FastMCP for the server
- LangChain and LangGraph for the text-to-SQL agent
- LangChain OpenAI integration for LLM support
Usage
Connection Strings
MySQL
# Using connection string
python mysql-db-server.py --conn "mysql://user:password@host:port/database"
# Using environment variable
export DATABASE_CONNECTION_STRING="mysql://user:password@host:port/database"
python mysql-db-server.py
Command Line Options
python mysql-db-server.py [OPTIONS]
Options:
--conn TEXT MySQL connection string (format: mysql://user:password@host:port/database)
--transport TEXT Transport protocol: stdio, sse, or streamable-http (default: stdio)
--host TEXT Host to bind for SSE/HTTP transports (default: 127.0.0.1)
--port INTEGER Port to bind for SSE/HTTP transports (default: 8000)
--mount TEXT Optional mount path for SSE transport (e.g., /mcp)
--readonly Enable read-only mode (prevents INSERT, UPDATE, DELETE, etc.)
--help Show this message and exit
Environment Variables
- DATABASE_CONNECTION_STRING: MySQL connection string
- DATABASE_READONLY: Set to "true", "1", or "yes" to enable read-only mode
- DATABASE_STATEMENT_TIMEOUT_MS: Query timeout in milliseconds
- MCP_TRANSPORT: Transport protocol (stdio, sse, streamable-http)
- MCP_HOST: Host for network transports
- MCP_PORT: Port for network transports
- MCP_SSE_MOUNT: Mount path for SSE transport
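As an illustration, the server could be configured entirely through the environment (all values below are placeholders):

```shell
export DATABASE_CONNECTION_STRING="mysql://user:password@localhost:3306/mydatabase"
export DATABASE_READONLY="true"
export DATABASE_STATEMENT_TIMEOUT_MS="30000"
export MCP_TRANSPORT="streamable-http"
export MCP_HOST="127.0.0.1"
export MCP_PORT="8000"
python mysql-db-server.py
```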
Read-Only Mode
Read-only mode prevents any write operations (INSERT, UPDATE, DELETE, DROP, etc.) and only allows SELECT, SHOW, WITH, VALUES, and EXPLAIN queries.
Enable via command-line:
python mysql-db-server.py --conn "mysql://user:password@host:port/database" --readonly
Enable via environment variable:
export DATABASE_READONLY="true"
python mysql-db-server.py --conn "mysql://user:password@host:port/database"
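Conceptually, read-only enforcement amounts to checking each statement's leading keyword against the allowlist above. A minimal sketch (hypothetical; the server's actual check may be more thorough):

```python
# Hypothetical sketch of a read-only statement gate.
READONLY_ALLOWED = {"SELECT", "SHOW", "WITH", "VALUES", "EXPLAIN"}

def is_readonly_safe(sql: str) -> bool:
    """Return True if the statement starts with an allowed read-only keyword."""
    if not sql.strip():
        return False
    first_word = sql.lstrip().split(None, 1)[0].upper()
    return first_word in READONLY_ALLOWED

print(is_readonly_safe("SELECT * FROM users"))  # True
print(is_readonly_safe("DELETE FROM users"))    # False
```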
Check if read-only mode is enabled:
# Get tools from MCP client
tools = await client.get_tools()
server_info_tool = next((t for t in tools if t.name == "server_info"), None)
if server_info_tool:
    result = await server_info_tool.ainvoke({})
    print(result["readonly"])  # True if read-only mode is enabled
LangGraph Text-to-SQL Agent
The project includes a sophisticated LangGraph-based agent that converts natural language questions into SQL queries. The agent automatically explores the database schema, generates SQL queries, executes them, and refines queries if errors occur.
Features
- Intelligent Schema Exploration:
- Automatically identifies relevant tables using LLM analysis (skips LLM call for ≤3 tables)
- Only explores tables needed for the query (much faster!)
- Describes table structures with exact column names
- Fetches foreign key relationships for better JOINs
- Caches schema information across queries (session-based)
- Query Validation:
- Validates that user queries are actually database questions before generating SQL
- Uses fast heuristics (set-based keyword matching) first, then LLM only for ambiguous cases
- Rejects gibberish, greetings, and non-database questions early
- Optimized to avoid LLM calls for 90%+ of cases
- Intelligent SQL Generation:
- Uses LLM with chain-of-thought reasoning for step-by-step query generation
- Handles multi-part questions (e.g., "find X and then find Y") with subqueries
- Window Function Support: Automatically uses window functions (ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), SUM() OVER, etc.) for appropriate scenarios like "top N per group", rankings within groups, comparing rows, running totals, percentiles, moving averages
- Tie Handling: Correctly handles ties in "most/least" queries by returning ALL entities with max/min value
- Validates SQL syntax and auto-fixes simple issues
- Confidence Scoring & Auto-Refinement:
- Calculates confidence scores based on actual query execution results
- Explicitly checks if SQL answers the question (not just syntax correctness)
- Automatically refines queries when confidence is low or errors detected
- Provides detailed analysis and reasoning for every query
- Error Recovery:
- Intelligently parses SQL errors to extract actionable information
- Automatically retries with improved queries when execution fails
- Preserves query structure when fixing simple errors (e.g., column names)
- Performance Optimized:
- Parallel execution of schema exploration operations
- Compiled regex patterns for faster text processing
- Reuses test query results when possible (avoids redundant executions)
- State-based SQL storage (avoids re-extraction)
- Set-based lookups for O(1) table/column checks
- LRU cache management for schema and column caches (prevents unbounded growth)
- Timeout handling for LLM and database calls (prevents hanging)
- Centralized helper functions reduce code duplication and improve maintainability
- All prompts organized in prompts.py for easier updates
- Well-organized code structure with clear sections (Graph Building, Helper Methods, Node Methods, Edge Methods, Public Methods)
- Robust Error Handling:
- Input validation (rejects None, empty, or invalid queries)
- Timeout protection for all LLM and database calls (validates timeout > 0)
- Graceful fallbacks when LLM returns empty/malformed responses
- Type safety for all conversions (float, int) with try/except
- Regex group validation (checks groups are not empty before use)
- Cache structure validation (handles corrupted cache gracefully)
- Safe string parsing (handles malformed table resources, error messages)
- Tool call structure validation (validates dict structure before processing)
- Comprehensive logging for debugging (optional, can be disabled)
- MCP Integration: Seamlessly uses MCP tools for database operations
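As an illustration, the fast heuristic pass used in query validation could look like this set-based sketch (the keyword sets and function name here are hypothetical; the agent's real heuristics are richer):

```python
# Hypothetical sketch of a fast, pre-LLM validation pass.
DB_KEYWORDS = {"how", "many", "count", "show", "list", "find", "average",
               "table", "tables", "rows", "records", "where"}
GREETINGS = {"hi", "hello", "hey", "thanks", "thank"}

def quick_validate(question: str) -> str:
    """Classify a question as 'valid', 'invalid', or 'ambiguous' without an LLM call."""
    words = set(question.lower().split())
    if not words or words & GREETINGS:
        return "invalid"    # gibberish or greeting: reject early
    if words & DB_KEYWORDS:
        return "valid"      # clearly a database question
    return "ambiguous"      # only these fall through to the LLM

print(quick_validate("How many authors are there"))  # valid
print(quick_validate("hello"))                       # invalid
```

Only the "ambiguous" bucket ever reaches the LLM, which is how the agent avoids LLM calls for the vast majority of inputs.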
Setup
- Start the MCP Server (in one terminal):
python mysql-db-server.py --conn "mysql://user:password@host:port/database" --transport streamable-http --port 8000
- Use the Agent (in Python/Jupyter):
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from text_to_sql_agent import TextToSQLAgent
# Connect to MCP server
client = MultiServerMCPClient({
    "mysql-server": {
        "url": "http://localhost:8000/mcp",
        "transport": "streamable_http"
    }
})
# Initialize LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Create agent with optional configuration
agent = TextToSQLAgent(
    mcp_client=client,
    llm=llm,
    max_query_attempts=3,        # Maximum retry attempts
    llm_timeout=60,              # Timeout for LLM calls (seconds)
    query_timeout=30,            # Timeout for database queries (seconds)
    max_schema_cache_size=1000,  # Maximum table descriptions to cache
    max_column_cache_size=500,   # Maximum column name extractions to cache
    enable_logging=True          # Enable logging for debugging
)
# The LLM can also be configured with an explicit API key:
# llm = ChatOpenAI(api_key="your-openai-api-key", model="gpt-4o-mini", temperature=0)
Usage Examples
Basic Query
# Ask a natural language question
result = await agent.query("How many authors are in the database?")
# Get the final answer
answer = agent.get_final_answer(result)
print(answer)
Query with Filtering
# Complex queries with filters
result = await agent.query("Show me all authors born after 1950")
print(agent.get_final_answer(result))
Aggregation Queries
# Statistical queries
result = await agent.query("What is the average birth year of all authors?")
print(agent.get_final_answer(result))
How It Works
The agent follows an optimized workflow:
- Schema Exploration (with caching and parallelization):
- Lists all available tables (cached after first call)
- Intelligently identifies relevant tables using LLM (skips for ≤3 tables)
- Describes table structures for relevant tables only (parallel execution)
- Fetches foreign key relationships for better JOINs (parallel execution)
- Caches all schema information for subsequent queries
- Query Validation (before SQL generation):
- Validates that the user query is a valid database question
- Uses fast heuristics (keyword matching) first
- Falls back to LLM validation only for ambiguous cases
- Rejects invalid queries early to avoid unnecessary processing
- SQL Generation (with confidence scoring):
- Uses LLM with chain-of-thought reasoning for step-by-step generation
- Includes schema, foreign keys, and column names in prompt
- Executes test query (LIMIT 3) to get sample results
- Calculates confidence score and analysis (single LLM call)
- Confidence scoring explicitly checks if SQL answers the question
- Validates SQL syntax and detects critical issues
- Sets refinement flags for edge routing (does not refine directly)
- Stores final SQL in state to avoid re-extraction
- SQL Refinement (if needed):
- Separate node handles refinement when confidence is low or errors detected
- Fetches missing schema if needed
- Refines SQL using analysis and error context
- Re-executes test query and recalculates confidence
- Query Execution (optimized to avoid redundancy):
- Uses stored SQL from state (avoids re-extraction)
- Reuses test query results if they contain all data (avoids redundant execution)
- Respects original LIMIT clause when reusing test results (e.g., LIMIT 1 returns 1 row, not all test results)
- Executes full query only when needed
- Error Recovery (with intelligent parsing):
- Intelligently parses SQL errors (extracts error type, column, table)
- Passes error context to SQL generation (optional refinement)
- Preserves query structure when fixing simple errors
- Automatically retries failed queries (up to max_query_attempts)
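The error-parsing step above can be sketched as regex extraction over common MySQL error messages (the patterns and function name here are illustrative; the agent's real parser handles more cases):

```python
import re

# Illustrative patterns for two common MySQL errors.
UNKNOWN_COLUMN = re.compile(r"Unknown column '([^']+)'")
MISSING_TABLE = re.compile(r"Table '([^']+)' doesn't exist")

def parse_sql_error(message: str) -> dict:
    """Extract an error type and the offending identifier from a MySQL error message."""
    if m := UNKNOWN_COLUMN.search(message):
        return {"type": "unknown_column", "column": m.group(1)}
    if m := MISSING_TABLE.search(message):
        return {"type": "missing_table", "table": m.group(1)}
    return {"type": "unknown", "raw": message}

print(parse_sql_error("1054 (42S22): Unknown column 'birthyear' in 'field list'"))
# → {'type': 'unknown_column', 'column': 'birthyear'}
```

Structured output like this is what lets the refinement step fix just the offending column name while preserving the rest of the query.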
Agent State Graph
The agent uses a LangGraph state machine with the following nodes:
- explore_schema: Discovers and caches database schema (with conditional routing for completion)
- generate_sql: Converts natural language to SQL using LLM, calculates confidence, sets refinement flags
- refine_sql: Refines SQL based on confidence score and analysis (separate node for clarity)
- execute_query: Runs SQL queries via MCP tools (reuses test results when possible)
- refine_query: Improves queries based on error feedback (routes back to generate_sql)
- tools: Handles tool calls for schema exploration
Architecture: The agent follows LangGraph best practices with clear separation:
- Nodes: Process state and return updates (no conditional logic)
- Edges: Make routing decisions based on state flags (all orchestration logic)
For a detailed explanation of the architecture, workflow graph, and how to extend the agent, see AGENT_ARCHITECTURE.md.
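The nodes-return-updates / edges-route separation can be illustrated with plain Python (a schematic sketch with hypothetical names and thresholds, not the agent's actual code):

```python
# Schematic sketch: nodes only compute state updates; a routing function
# inspects state flags and picks the next node.

def generate_sql(state: dict) -> dict:
    """Node: returns state updates only -- no routing decisions here."""
    confidence = 0.4 if "ambiguous" in state["question"] else 0.9
    return {"sql": "SELECT ...", "confidence": confidence,
            "needs_refinement": confidence < 0.7}

def route_after_generation(state: dict) -> str:
    """Edge: all orchestration logic lives in routing functions like this."""
    return "refine_sql" if state.get("needs_refinement") else "execute_query"

state = {"question": "How many authors?"}
state.update(generate_sql(state))
print(route_after_generation(state))  # execute_query
```

Keeping conditional logic out of the nodes makes each node independently testable and keeps the graph's control flow visible in one place.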
Advanced Usage
Custom Configuration
# Run with custom LangGraph config
result = await agent.query(
    "Find all authors with more than 5 books",
    config={"recursion_limit": 50}
)
Access Full State
# Get complete agent state including all messages
result = await agent.query("Show me the database schema")
# Access messages, schema info, and query attempts
messages = result["messages"]
schema_info = result["schema_info"]
attempts = result["query_attempts"]
Helper Function for Clean Results
from langchain_core.messages import ToolMessage
def get_answer(result):
    """Extract the final answer from agent result."""
    messages = result.get("messages", [])
    for msg in reversed(messages):
        if isinstance(msg, ToolMessage) and "successfully" in msg.content.lower():
            return msg.content
    return messages[-1].content if messages else "No answer"
# Use it
result = await agent.query("How many tables are in the database?")
print(get_answer(result))
Tips for Best Results
- Be Specific: Clear, specific questions work best
- ✅ "Show me all authors born after 1950"
- ❌ "authors stuff"
- Use Table Names: If you know table names, mention them
- ✅ "List all books in the authors table"
- ✅ "How many records are in the authors table?"
- Specify Filters: Be explicit about filtering criteria
- ✅ "Find authors where birth_year is greater than 1950"
- ✅ "Show me authors with names starting with 'G'"
- Ask for Aggregations: The agent handles COUNT, SUM, AVG, etc.
- ✅ "What is the average birth year?"
- ✅ "Count the total number of authors"
Limitations
- Currently optimized for SELECT queries (read-only operations)
- Maximum retry attempts are configurable (default: 3)
- Requires OpenAI API key for LLM functionality
- Schema information is cached per agent instance; a new instance must re-explore the schema
Improving the Agent
For a comprehensive guide on enhancing the agent's performance, accuracy, and features, see IMPROVEMENTS.md.
✅ Implemented Features:
- Intelligent Table Selection: Uses LLM to identify only relevant tables (skips for ≤3 tables)
- Foreign Key Relationships: Fetches FK info for better JOIN understanding
- Schema Caching: Caches schema information across queries (session-based)
- Chain-of-Thought Reasoning: Step-by-step query generation for better accuracy
- Confidence Scoring: Calculates confidence based on actual query execution results
- Auto-Refinement: Automatically improves queries when issues detected
- Intelligent Error Parsing: Extracts actionable information from error messages
- Performance Optimizations: Compiled regex, parallel execution, result reuse, state-based storage
- Code Organization:
- Centralized prompts in prompts.py
- Helper functions reduce duplication (~7.7% code reduction, from 2,362 to 2,180 lines)
- Clear code structure with organized sections: Graph Building, Helper Methods, Node Methods, Edge Methods, and Public Methods
Planned Improvements:
- Few-Shot Examples: Add example queries to guide better SQL generation patterns
- Query Explanation: Explain what generated SQL queries do
- Query History and Learning: Learn from past successful queries
See IMPROVEMENTS.md for detailed implementation examples and step-by-step instructions.
MySQL Features
- Database-level organization
- SHOW DATABASES and SHOW TABLES for performance
- DESCRIBE for table structure
- SHOW CREATE TABLE for detailed table information
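For reference, the MySQL statements behind these operations look like the following (mydatabase and users are placeholder names):

```sql
SHOW DATABASES;                      -- list databases
SHOW TABLES FROM mydatabase;         -- list tables in a schema
DESCRIBE mydatabase.users;           -- column names, types, nullability, keys
SHOW CREATE TABLE mydatabase.users;  -- full DDL, including foreign key definitions
```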
Tools
The server provides the following MCP tools:
- server_info: Get server and database information (database type: MySQL, readonly status, MySQL connector version)
- db_identity: Get current database identity details (database type: MySQL, database name, user, host, port, server version)
- run_query: Execute SQL queries with typed input (returns markdown or JSON string)
- run_query_json: Execute SQL queries with typed input (returns JSON list)
- list_table_resources: List tables as MCP resource URIs (table://schema/table) - returns structured list
- read_table_resource: Read table data (rows) via MCP resource protocol - returns JSON list
- list_tables: List tables in a database - returns markdown string (human-readable)
- describe_table: Get table structure (columns, types, constraints) - returns markdown string
- get_foreign_keys: Get foreign key relationships (via SHOW CREATE TABLE)
Calling MCP Tools
MultiServerMCPClient doesn't have a call_tool() method. Instead, you need to:
- Get tools using get_tools(), which returns LangChain StructuredTool objects
- Find the tool by name
- Invoke it using tool.ainvoke() with the appropriate arguments
Efficient Tool Lookup
For better performance (especially with many tools), create a dictionary for O(1) lookups:
# Get all tools
tools = await client.get_tools()
# Create dictionary for fast O(1) lookup (recommended)
tool_dict = {t.name: t for t in tools}
# Now you can call tools efficiently
server_info = await tool_dict["server_info"].ainvoke({})
tables = await tool_dict["list_tables"].ainvoke({"db_schema": None})
Get Server Information
# Get tools
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Get server info and connection details
result = await tool_dict["server_info"].ainvoke({})
# Returns: {
# "name": "MySQL Database Explorer",
# "database_type": "MySQL", # Database type is explicitly included
# "readonly": False,
# "mysql_connector_version": "8.0.33"
# }
db_info = await tool_dict["db_identity"].ainvoke({})
# Returns: {
# "database_type": "MySQL", # Database type is explicitly included
# "database": "mydatabase",
# "user": "root@localhost",
# "host": "localhost",
# "port": 3306,
# "server_version": "8.0.33"
# }
List Tables
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Lists tables in the current database
result = await tool_dict["list_tables"].ainvoke({"db_schema": "mydatabase"})
# Or get tables as MCP resources
resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydatabase"})
Explore Table Structure
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Get table structure
result = await tool_dict["describe_table"].ainvoke({
    "table_name": "users",
    "db_schema": "mydatabase"
})
# Get foreign key relationships
fks = await tool_dict["get_foreign_keys"].ainvoke({
    "table_name": "users",
    "db_schema": "mydatabase"
})
Execute Query
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Execute query with markdown output
result = await tool_dict["run_query"].ainvoke({
    "input": {
        "sql": "SELECT * FROM users LIMIT 10",
        "format": "markdown"
    }
})
# Execute query with JSON output
result = await tool_dict["run_query_json"].ainvoke({
    "input": {
        "sql": "SELECT * FROM users LIMIT 10",
        "row_limit": 100
    }
})
MCP Resource Tools
These tools implement the MCP (Model Context Protocol) resource pattern, which allows tables to be treated as discoverable resources that can be accessed via standardized URIs.
list_table_resources
Lists all tables in a schema and returns them as MCP resource URIs. This enables MCP clients to discover available tables dynamically.
What it does:
- Queries the database to get all table names from the specified schema
- Formats each table as a resource URI: table://schema/table_name
- Returns a list of these URIs
Example:
# Get all tables as resource URIs
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydatabase"})
# Returns: ["table://mydatabase/users", "table://mydatabase/orders", "table://mydatabase/products"]
Use cases:
- Dynamic table discovery in MCP clients
- Building UI that lists available tables
- Integration with MCP resource-aware tools
read_table_resource
Reads data from a specific table using the MCP resource protocol. This provides a standardized way to access table data.
What it does:
- Executes SELECT * FROM schema.table with a row limit
- Returns table rows as a list of dictionaries (JSON format)
- Each dictionary represents one row with column names as keys
Example:
# Read table data via MCP resource protocol
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
data = await tool_dict["read_table_resource"].ainvoke({
    "schema": "mydatabase",
    "table": "users",
    "row_limit": 50  # Limits number of rows returned
})
# Returns: [
# {"id": 1, "name": "Alice", "email": "alice@example.com"},
# {"id": 2, "name": "Bob", "email": "bob@example.com"},
# ...
# ]
Use cases:
- Quick table previews without writing SQL
- MCP resource-aware clients that can fetch table data by URI
- Data exploration and inspection tools
Key differences from run_query:
- read_table_resource: Simple, standardized way to read entire tables (no SQL needed)
- run_query: Flexible, allows any SQL query with custom WHERE clauses, JOINs, etc.
Comparison: Resource Tools vs Regular Tools
list_table_resources vs list_tables
Both list tables, but serve different purposes:
| Feature | list_table_resources | list_tables |
|---|---|---|
| Return Type | List[str] (structured data) | str (markdown text) |
| Format | Resource URIs: ["table://schema/users", ...] | Human-readable markdown table |
| Use Case | Programmatic access, MCP resource protocol | Human viewing, documentation |
| Integration | Works with MCP resource-aware clients | General purpose, readable output |
Example comparison:
# Get tools once
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# list_table_resources - structured for programs
resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydb"})
# Returns: ["table://mydb/users", "table://mydb/orders"]
# list_tables - formatted for humans
tables = await tool_dict["list_tables"].ainvoke({"db_schema": "mydb"})
# Returns: "| Tables_in_mydb |\n|-----------------|\n| users |\n| orders |"
read_table_resource vs describe_table
These serve completely different purposes - they're complementary, not redundant:
| Feature | read_table_resource | describe_table |
|---|---|---|
| What it returns | Table data (rows) | Table structure (schema) |
| Return Type | List[Dict[str, Any]] (JSON) | str (markdown) |
| SQL Command | SELECT * FROM table | DESCRIBE table |
| Use Case | View actual data/rows | View column definitions, types, constraints |
| Example Output | [{"id": 1, "name": "Alice"}, ...] | Column names, types, nullability, keys |
Example comparison:
# Get tools once
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# read_table_resource - get the DATA
data = await tool_dict["read_table_resource"].ainvoke({
    "schema": "mydb", "table": "users", "row_limit": 10
})
# Returns: [{"id": 1, "name": "Alice", "email": "alice@example.com"}, ...]
# describe_table - get the STRUCTURE
structure = await tool_dict["describe_table"].ainvoke({
    "table_name": "users", "db_schema": "mydb"
})
# Returns: "| Field | Type | Null | Key | Default | Extra |\n|-------|------|------|-----|---------|-------|\n| id | int | NO | PRI | NULL | auto_increment |\n| name | varchar(100) | NO | | NULL | |"
Summary:
- Use read_table_resource when you want to see the data in a table
- Use describe_table when you want to see the schema/structure of a table
- They answer different questions: "What's in the table?" vs "How is the table defined?"
Notes
- MySQL connection strings must start with mysql://
- The format is: mysql://user:password@host:port/database
- All database names are treated as schemas for compatibility
Documentation
- AGENT_ARCHITECTURE.md: Detailed architecture explanation, workflow graph, and extension guide
- How the agent works internally
- Visual workflow diagrams
- Step-by-step guides for adding nodes and tools
- Code examples and patterns
- IMPROVEMENTS.md: Comprehensive improvement guide
- Enhancement ideas with priority rankings
- Implementation examples
- Performance optimizations
- Advanced features
Improving the Agent
The improvements guide includes:
- Priority rankings (High/Medium/Low) for each improvement
- Detailed code examples with implementation snippets
- Step-by-step integration instructions
- Workflow graphs showing how the agent evolves with improvements
- Testing strategies and validation approaches
- Benefits analysis for each enhancement
Quick Start: Begin with high-priority items like foreign key relationships, better error parsing, and column name suggestions for immediate impact.
Troubleshooting
Connection Issues
- Ensure the MySQL server is running and accessible
- Check connection string format and credentials
- Verify network connectivity and firewall settings
MySQL-Specific Issues
- Ensure MySQL server supports the connection protocol
- Check user permissions for the specified database
- Verify MySQL connector version compatibility
