MySQL MCP Server
A Model Context Protocol (MCP) server for MySQL databases. This server provides a unified interface for exploring and querying MySQL databases through MCP.
Features
- MySQL Support: Connect to MySQL databases
- Unified Interface: Consistent tools and API for MySQL operations
- Database-Specific Optimizations: Uses MySQL-optimized SQL syntax
- Schema Exploration: List databases, tables, and relationships
- Query Execution: Run SQL queries with proper parameter handling
- Resource Support: MCP resource endpoints for table data
- LangGraph Text-to-SQL Agent: Intelligent agent that converts natural language to SQL queries with automatic schema exploration and error recovery
Installation
- Install dependencies:
pip install -r requirements.txt
The requirements include:
- MySQL connector for database access
- MCP and FastMCP for the server
- LangChain and LangGraph for the text-to-SQL agent
- LangChain OpenAI integration for LLM support
Usage
Connection Strings
MySQL
# Using connection string
python mysql-db-server.py --conn "mysql://user:password@host:port/database"
# Using environment variable
export DATABASE_CONNECTION_STRING="mysql://user:password@host:port/database"
python mysql-db-server.py
Command Line Options
python mysql-db-server.py [OPTIONS]
Options:
--conn TEXT MySQL connection string (format: mysql://user:password@host:port/database)
--transport TEXT Transport protocol: stdio, sse, or streamable-http (default: stdio)
--host TEXT Host to bind for SSE/HTTP transports (default: 127.0.0.1)
--port INTEGER Port to bind for SSE/HTTP transports (default: 8000)
--mount TEXT Optional mount path for SSE transport (e.g., /mcp)
--readonly Enable read-only mode (prevents INSERT, UPDATE, DELETE, etc.)
--help Show this message and exit
Environment Variables
- DATABASE_CONNECTION_STRING: MySQL connection string
- DATABASE_READONLY: Set to "true", "1", or "yes" to enable read-only mode
- DATABASE_STATEMENT_TIMEOUT_MS: Query timeout in milliseconds
- MCP_TRANSPORT: Transport protocol (stdio, sse, streamable-http)
- MCP_HOST: Host for network transports
- MCP_PORT: Port for network transports
- MCP_SSE_MOUNT: Mount path for SSE transport
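As an illustration, the server could be configured entirely through the environment (all values below are placeholders):

```shell
export DATABASE_CONNECTION_STRING="mysql://user:password@localhost:3306/mydatabase"
export DATABASE_READONLY="true"
export DATABASE_STATEMENT_TIMEOUT_MS="30000"
export MCP_TRANSPORT="streamable-http"
export MCP_HOST="127.0.0.1"
export MCP_PORT="8000"
python mysql-db-server.py
```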
Read-Only Mode
Read-only mode prevents any write operations (INSERT, UPDATE, DELETE, DROP, etc.) and only allows SELECT, SHOW, WITH, VALUES, and EXPLAIN queries.
Enable via command-line:
python mysql-db-server.py --conn "mysql://user:password@host:port/database" --readonly
Enable via environment variable:
export DATABASE_READONLY="true"
python mysql-db-server.py --conn "mysql://user:password@host:port/database"
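Conceptually, read-only enforcement amounts to checking each statement's leading keyword against the allowlist above. A minimal sketch (hypothetical; the server's actual check may be more thorough):

```python
# Hypothetical sketch of a read-only statement gate.
READONLY_ALLOWED = {"SELECT", "SHOW", "WITH", "VALUES", "EXPLAIN"}

def is_readonly_safe(sql: str) -> bool:
    """Return True if the statement starts with an allowed read-only keyword."""
    if not sql.strip():
        return False
    first_word = sql.lstrip().split(None, 1)[0].upper()
    return first_word in READONLY_ALLOWED

print(is_readonly_safe("SELECT * FROM users"))  # True
print(is_readonly_safe("DELETE FROM users"))    # False
```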
Check if read-only mode is enabled:
# Get tools from MCP client
tools = await client.get_tools()
server_info_tool = next((t for t in tools if t.name == "server_info"), None)
if server_info_tool:
    result = await server_info_tool.ainvoke({})
    print(result["readonly"])  # True if read-only mode is enabled
LangGraph Text-to-SQL Agent
The project includes a sophisticated LangGraph-based agent that converts natural language questions into SQL queries. The agent automatically explores the database schema, generates SQL queries, executes them, and refines queries if errors occur.
Features
- Intelligent Schema Exploration:
- Automatically identifies relevant tables using LLM analysis (skips LLM call for ≤3 tables)
- Only explores tables needed for the query (much faster!)
- Describes table structures with exact column names
- Fetches foreign key relationships for better JOINs
- Caches schema information across queries (session-based)
- Query Validation:
- Validates that user queries are actually database questions before generating SQL
- Uses fast heuristics (set-based keyword matching) first, then LLM only for ambiguous cases
- Rejects gibberish, greetings, and non-database questions early
- Optimized to avoid LLM calls for 90%+ of cases
- Intelligent SQL Generation:
- Uses LLM with chain-of-thought reasoning for step-by-step query generation
- Handles multi-part questions (e.g., "find X and then find Y") with subqueries
- Window Function Support: Automatically uses window functions (ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), SUM() OVER, etc.) for appropriate scenarios like "top N per group", rankings within groups, comparing rows, running totals, percentiles, moving averages
- Tie Handling: Correctly handles ties in "most/least" queries by returning ALL entities with max/min value
- Validates SQL syntax and auto-fixes simple issues
- Confidence Scoring & Auto-Refinement:
- Calculates confidence scores based on actual query execution results
- Explicitly checks if SQL answers the question (not just syntax correctness)
- Automatically refines queries when confidence is low or errors detected
- Provides detailed analysis and reasoning for every query
- Error Recovery:
- Intelligently parses SQL errors to extract actionable information
- Automatically retries with improved queries when execution fails
- Preserves query structure when fixing simple errors (e.g., column names)
- Performance Optimized:
- Parallel execution of schema exploration operations
- Compiled regex patterns for faster text processing
- Reuses test query results when possible (avoids redundant executions)
- State-based SQL storage (avoids re-extraction)
- Set-based lookups for O(1) table/column checks
- LRU cache management for schema and column caches (prevents unbounded growth)
- Timeout handling for LLM and database calls (prevents hanging)
- Centralized helper functions reduce code duplication and improve maintainability
- All prompts organized in prompts.py for easier updates
- Well-organized code structure with clear sections (Graph Building, Helper Methods, Node Methods, Edge Methods, Public Methods)
- Robust Error Handling:
- Input validation (rejects None, empty, or invalid queries)
- Timeout protection for all LLM and database calls (validates timeout > 0)
- Graceful fallbacks when LLM returns empty/malformed responses
- Type safety for all conversions (float, int) with try/except
- Regex group validation (checks groups are not empty before use)
- Cache structure validation (handles corrupted cache gracefully)
- Safe string parsing (handles malformed table resources, error messages)
- Tool call structure validation (validates dict structure before processing)
- Comprehensive logging for debugging (optional, can be disabled)
- MCP Integration: Seamlessly uses MCP tools for database operations
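As an illustration, the fast heuristic pass used in query validation could look like this set-based sketch (the keyword sets and function name here are hypothetical; the agent's real heuristics are richer):

```python
# Hypothetical sketch of a fast, pre-LLM validation pass.
DB_KEYWORDS = {"how", "many", "count", "show", "list", "find", "average",
               "table", "tables", "rows", "records", "where"}
GREETINGS = {"hi", "hello", "hey", "thanks", "thank"}

def quick_validate(question: str) -> str:
    """Classify a question as 'valid', 'invalid', or 'ambiguous' without an LLM call."""
    words = set(question.lower().split())
    if not words or words & GREETINGS:
        return "invalid"    # gibberish or greeting: reject early
    if words & DB_KEYWORDS:
        return "valid"      # clearly a database question
    return "ambiguous"      # only these fall through to the LLM

print(quick_validate("How many authors are there"))  # valid
print(quick_validate("hello"))                       # invalid
```

Only the "ambiguous" bucket ever reaches the LLM, which is how the agent avoids LLM calls for the vast majority of inputs.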
Setup
- Start the MCP Server (in one terminal):
python mysql-db-server.py --conn "mysql://user:password@host:port/database" --transport streamable-http --port 8000
- Use the Agent (in Python/Jupyter):
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from text_to_sql_agent import TextToSQLAgent
# Connect to MCP server
client = MultiServerMCPClient({
    "mysql-server": {
        "url": "http://localhost:8000/mcp",
        "transport": "streamable_http"
    }
})
# Initialize LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Create agent with optional configuration
agent = TextToSQLAgent(
    mcp_client=client,
    llm=llm,
    max_query_attempts=3,        # Maximum retry attempts
    llm_timeout=60,              # Timeout for LLM calls (seconds)
    query_timeout=30,            # Timeout for database queries (seconds)
    max_schema_cache_size=1000,  # Maximum table descriptions to cache
    max_column_cache_size=500,   # Maximum column name extractions to cache
    enable_logging=True          # Enable logging for debugging
)
# The LLM can also be configured with an explicit API key:
# llm = ChatOpenAI(api_key="your-openai-api-key", model="gpt-4o-mini", temperature=0)
Usage Examples
Basic Query
# Ask a natural language question
result = await agent.query("How many authors are in the database?")
# Get the final answer
answer = agent.get_final_answer(result)
print(answer)
Query with Filtering
# Complex queries with filters
result = await agent.query("Show me all authors born after 1950")
print(agent.get_final_answer(result))
Aggregation Queries
# Statistical queries
result = await agent.query("What is the average birth year of all authors?")
print(agent.get_final_answer(result))
How It Works
The agent follows an optimized workflow:
- Schema Exploration (with caching and parallelization):
- Lists all available tables (cached after first call)
- Intelligently identifies relevant tables using LLM (skips for ≤3 tables)
- Describes table structures for relevant tables only (parallel execution)
- Fetches foreign key relationships for better JOINs (parallel execution)
- Caches all schema information for subsequent queries
- Query Validation (before SQL generation):
- Validates that the user query is a valid database question
- Uses fast heuristics (keyword matching) first
- Falls back to LLM validation only for ambiguous cases
- Rejects invalid queries early to avoid unnecessary processing
- SQL Generation (with confidence scoring):
- Uses LLM with chain-of-thought reasoning for step-by-step generation
- Includes schema, foreign keys, and column names in prompt
- Executes test query (LIMIT 3) to get sample results
- Calculates confidence score and analysis (single LLM call)
- Confidence scoring explicitly checks if SQL answers the question
- Validates SQL syntax and detects critical issues
- Sets refinement flags for edge routing (does not refine directly)
- Stores final SQL in state to avoid re-extraction
- SQL Refinement (if needed):
- Separate node handles refinement when confidence is low or errors detected
- Fetches missing schema if needed
- Refines SQL using analysis and error context
- Re-executes test query and recalculates confidence
- Query Execution (optimized to avoid redundancy):
- Uses stored SQL from state (avoids re-extraction)
- Reuses test query results if they contain all data (avoids redundant execution)
- Respects original LIMIT clause when reusing test results (e.g., LIMIT 1 returns 1 row, not all test results)
- Executes full query only when needed
- Error Recovery (with intelligent parsing):
- Intelligently parses SQL errors (extracts error type, column, table)
- Passes error context to SQL generation (optional refinement)
- Preserves query structure when fixing simple errors
- Automatically retries failed queries (up to max_query_attempts)
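The error-parsing step above can be sketched as regex extraction over common MySQL error messages (the patterns and function name here are illustrative; the agent's real parser handles more cases):

```python
import re

# Illustrative patterns for two common MySQL errors.
UNKNOWN_COLUMN = re.compile(r"Unknown column '([^']+)'")
MISSING_TABLE = re.compile(r"Table '([^']+)' doesn't exist")

def parse_sql_error(message: str) -> dict:
    """Extract an error type and the offending identifier from a MySQL error message."""
    if m := UNKNOWN_COLUMN.search(message):
        return {"type": "unknown_column", "column": m.group(1)}
    if m := MISSING_TABLE.search(message):
        return {"type": "missing_table", "table": m.group(1)}
    return {"type": "unknown", "raw": message}

print(parse_sql_error("1054 (42S22): Unknown column 'birthyear' in 'field list'"))
# → {'type': 'unknown_column', 'column': 'birthyear'}
```

Structured output like this is what lets the refinement step fix just the offending column name while preserving the rest of the query.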
Agent State Graph
The agent uses a LangGraph state machine with the following nodes:
- explore_schema: Discovers and caches database schema (with conditional routing for completion)
- generate_sql: Converts natural language to SQL using LLM, calculates confidence, sets refinement flags
- refine_sql: Refines SQL based on confidence score and analysis (separate node for clarity)
- execute_query: Runs SQL queries via MCP tools (reuses test results when possible)
- refine_query: Improves queries based on error feedback (routes back to generate_sql)
- tools: Handles tool calls for schema exploration
Architecture: The agent follows LangGraph best practices with clear separation:
- Nodes: Process state and return updates (no conditional logic)
- Edges: Make routing decisions based on state flags (all orchestration logic)
For a detailed explanation of the architecture, workflow graph, and how to extend the agent, see AGENT_ARCHITECTURE.md.
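The nodes-return-updates / edges-route separation can be illustrated with plain Python (a schematic sketch with hypothetical names and thresholds, not the agent's actual code):

```python
# Schematic sketch: nodes only compute state updates; a routing function
# inspects state flags and picks the next node.

def generate_sql(state: dict) -> dict:
    """Node: returns state updates only -- no routing decisions here."""
    confidence = 0.4 if "ambiguous" in state["question"] else 0.9
    return {"sql": "SELECT ...", "confidence": confidence,
            "needs_refinement": confidence < 0.7}

def route_after_generation(state: dict) -> str:
    """Edge: all orchestration logic lives in routing functions like this."""
    return "refine_sql" if state.get("needs_refinement") else "execute_query"

state = {"question": "How many authors?"}
state.update(generate_sql(state))
print(route_after_generation(state))  # execute_query
```

Keeping conditional logic out of the nodes makes each node independently testable and keeps the graph's control flow visible in one place.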
Advanced Usage
Custom Configuration
# Run with custom LangGraph config
result = await agent.query(
    "Find all authors with more than 5 books",
    config={"recursion_limit": 50}
)
Access Full State
# Get complete agent state including all messages
result = await agent.query("Show me the database schema")
# Access messages, schema info, and query attempts
messages = result["messages"]
schema_info = result["schema_info"]
attempts = result["query_attempts"]
Helper Function for Clean Results
from langchain_core.messages import ToolMessage
def get_answer(result):
    """Extract the final answer from agent result."""
    messages = result.get("messages", [])
    for msg in reversed(messages):
        if isinstance(msg, ToolMessage) and "successfully" in msg.content.lower():
            return msg.content
    return messages[-1].content if messages else "No answer"
# Use it
result = await agent.query("How many tables are in the database?")
print(get_answer(result))
Tips for Best Results
- Be Specific: Clear, specific questions work best
- ✅ "Show me all authors born after 1950"
- ❌ "authors stuff"
- Use Table Names: If you know table names, mention them
- ✅ "List all books in the authors table"
- ✅ "How many records are in the authors table?"
- Specify Filters: Be explicit about filtering criteria
- ✅ "Find authors where birth_year is greater than 1950"
- ✅ "Show me authors with names starting with 'G'"
- Ask for Aggregations: The agent handles COUNT, SUM, AVG, etc.
- ✅ "What is the average birth year?"
- ✅ "Count the total number of authors"
Limitations
- Currently optimized for SELECT queries (read-only operations)
- Maximum retry attempts are configurable (default: 3)
- Requires OpenAI API key for LLM functionality
- Schema information is cached per agent instance; a new instance must re-explore the schema
Improving the Agent
For a comprehensive guide on enhancing the agent's performance, accuracy, and features, see IMPROVEMENTS.md.
✅ Implemented Features:
- Intelligent Table Selection: Uses LLM to identify only relevant tables (skips for ≤3 tables)
- Foreign Key Relationships: Fetches FK info for better JOIN understanding
- Schema Caching: Caches schema information across queries (session-based)
- Chain-of-Thought Reasoning: Step-by-step query generation for better accuracy
- Confidence Scoring: Calculates confidence based on actual query execution results
- Auto-Refinement: Automatically improves queries when issues detected
- Intelligent Error Parsing: Extracts actionable information from error messages
- Performance Optimizations: Compiled regex, parallel execution, result reuse, state-based storage
- Code Organization:
- Centralized prompts in prompts.py
- Helper functions reduce duplication (~7.7% code reduction, from 2,362 to 2,180 lines)
- Clear code structure with organized sections: Graph Building, Helper Methods, Node Methods, Edge Methods, and Public Methods
Planned Improvements:
- Few-Shot Examples: Add example queries to guide better SQL generation patterns
- Query Explanation: Explain what generated SQL queries do
- Query History and Learning: Learn from past successful queries
See IMPROVEMENTS.md for detailed implementation examples and step-by-step instructions.
MySQL Features
- Database-level organization
- SHOW DATABASES and SHOW TABLES for performance
- DESCRIBE for table structure
- SHOW CREATE TABLE for detailed table information
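For reference, the MySQL statements behind these operations look like the following (mydatabase and users are placeholder names):

```sql
SHOW DATABASES;                      -- list databases
SHOW TABLES FROM mydatabase;         -- list tables in a schema
DESCRIBE mydatabase.users;           -- column names, types, nullability, keys
SHOW CREATE TABLE mydatabase.users;  -- full DDL, including foreign key definitions
```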
Tools
The server provides the following MCP tools:
- server_info: Get server and database information (database type: MySQL, readonly status, MySQL connector version)
- db_identity: Get current database identity details (database type: MySQL, database name, user, host, port, server version)
- run_query: Execute SQL queries with typed input (returns markdown or JSON string)
- run_query_json: Execute SQL queries with typed input (returns JSON list)
- list_table_resources: List tables as MCP resource URIs (table://schema/table) - returns structured list
- read_table_resource: Read table data (rows) via MCP resource protocol - returns JSON list
- list_tables: List tables in a database - returns markdown string (human-readable)
- describe_table: Get table structure (columns, types, constraints) - returns markdown string
- get_foreign_keys: Get foreign key relationships (via SHOW CREATE TABLE)
Calling MCP Tools
MultiServerMCPClient doesn't have a call_tool() method. Instead, you need to:
- Get tools using get_tools(), which returns LangChain StructuredTool objects
- Find the tool by name
- Invoke it using tool.ainvoke() with the appropriate arguments
Efficient Tool Lookup
For better performance (especially with many tools), create a dictionary for O(1) lookups:
# Get all tools
tools = await client.get_tools()
# Create dictionary for fast O(1) lookup (recommended)
tool_dict = {t.name: t for t in tools}
# Now you can call tools efficiently
server_info = await tool_dict["server_info"].ainvoke({})
tables = await tool_dict["list_tables"].ainvoke({"db_schema": None})
Get Server Information
# Get tools
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Get server info and connection details
result = await tool_dict["server_info"].ainvoke({})
# Returns: {
# "name": "MySQL Database Explorer",
# "database_type": "MySQL", # Database type is explicitly included
# "readonly": False,
# "mysql_connector_version": "8.0.33"
# }
db_info = await tool_dict["db_identity"].ainvoke({})
# Returns: {
# "database_type": "MySQL", # Database type is explicitly included
# "database": "mydatabase",
# "user": "root@localhost",
# "host": "localhost",
# "port": 3306,
# "server_version": "8.0.33"
# }
List Tables
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Lists tables in the current database
result = await tool_dict["list_tables"].ainvoke({"db_schema": "mydatabase"})
# Or get tables as MCP resources
resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydatabase"})
Explore Table Structure
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Get table structure
result = await tool_dict["describe_table"].ainvoke({
    "table_name": "users",
    "db_schema": "mydatabase"
})
# Get foreign key relationships
fks = await tool_dict["get_foreign_keys"].ainvoke({
    "table_name": "users",
    "db_schema": "mydatabase"
})
Execute Query
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Execute query with markdown output
result = await tool_dict["run_query"].ainvoke({
    "input": {
        "sql": "SELECT * FROM users LIMIT 10",
        "format": "markdown"
    }
})
# Execute query with JSON output
result = await tool_dict["run_query_json"].ainvoke({
    "input": {
        "sql": "SELECT * FROM users LIMIT 10",
        "row_limit": 100
    }
})
MCP Resource Tools
These tools implement the MCP (Model Context Protocol) resource pattern, which allows tables to be treated as discoverable resources that can be accessed via standardized URIs.
list_table_resources
Lists all tables in a schema and returns them as MCP resource URIs. This enables MCP clients to discover available tables dynamically.
What it does:
- Queries the database to get all table names from the specified schema
- Formats each table as a resource URI: table://schema/table_name
- Returns a list of these URIs
Example:
# Get all tables as resource URIs
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydatabase"})
# Returns: ["table://mydatabase/users", "table://mydatabase/orders", "table://mydatabase/products"]
Use cases:
- Dynamic table discovery in MCP clients
- Building UI that lists available tables
- Integration with MCP resource-aware tools
read_table_resource
Reads data from a specific table using the MCP resource protocol. This provides a standardized way to access table data.
What it does:
- Executes SELECT * FROM schema.table with a row limit
- Returns table rows as a list of dictionaries (JSON format)
- Each dictionary represents one row with column names as keys
Example:
# Read table data via MCP resource protocol
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
data = await tool_dict["read_table_resource"].ainvoke({
    "schema": "mydatabase",
    "table": "users",
    "row_limit": 50  # Limits number of rows returned
})
# Returns: [
# {"id": 1, "name": "Alice", "email": "alice@example.com"},
# {"id": 2, "name": "Bob", "email": "bob@example.com"},
# ...
# ]
Use cases:
- Quick table previews without writing SQL
- MCP resource-aware clients that can fetch table data by URI
- Data exploration and inspection tools
Key differences from run_query:
- read_table_resource: Simple, standardized way to read entire tables (no SQL needed)
- run_query: Flexible, allows any SQL query with custom WHERE clauses, JOINs, etc.
Comparison: Resource Tools vs Regular Tools
list_table_resources vs list_tables
Both list tables, but serve different purposes:
| Feature | list_table_resources | list_tables |
|---|---|---|
| Return Type | List[str] (structured data) | str (markdown text) |
| Format | Resource URIs: ["table://schema/users", ...] | Human-readable markdown table |
| Use Case | Programmatic access, MCP resource protocol | Human viewing, documentation |
| Integration | Works with MCP resource-aware clients | General purpose, readable output |
Example comparison:
# Get tools once
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# list_table_resources - structured for programs
resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydb"})
# Returns: ["table://mydb/users", "table://mydb/orders"]
# list_tables - formatted for humans
tables = await tool_dict["list_tables"].ainvoke({"db_schema": "mydb"})
# Returns: "| Tables_in_mydb |\n|-----------------|\n| users |\n| orders |"
read_table_resource vs describe_table
These serve completely different purposes - they're complementary, not redundant:
| Feature | read_table_resource | describe_table |
|---|---|---|
| What it returns | Table data (rows) | Table structure (schema) |
| Return Type | List[Dict[str, Any]] (JSON) | str (markdown) |
| SQL Command | SELECT * FROM table | DESCRIBE table |
| Use Case | View actual data/rows | View column definitions, types, constraints |
| Example Output | [{"id": 1, "name": "Alice"}, ...] | Column names, types, nullability, keys |
Example comparison:
# Get tools once
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# read_table_resource - get the DATA
data = await tool_dict["read_table_resource"].ainvoke({
    "schema": "mydb", "table": "users", "row_limit": 10
})
# Returns: [{"id": 1, "name": "Alice", "email": "alice@example.com"}, ...]
# describe_table - get the STRUCTURE
structure = await tool_dict["describe_table"].ainvoke({
    "table_name": "users", "db_schema": "mydb"
})
# Returns: "| Field | Type | Null | Key | Default | Extra |\n|-------|------|------|-----|---------|-------|\n| id | int | NO | PRI | NULL | auto_increment |\n| name | varchar(100) | NO | | NULL | |"
Summary:
- Use read_table_resource when you want to see the data in a table
- Use describe_table when you want to see the schema/structure of a table
- They answer different questions: "What's in the table?" vs "How is the table defined?"
Notes
- MySQL connection strings must start with mysql://
- The format is: mysql://user:password@host:port/database
- All database names are treated as schemas for compatibility
Documentation
- AGENT_ARCHITECTURE.md: Detailed architecture explanation, workflow graph, and extension guide
- How the agent works internally
- Visual workflow diagrams
- Step-by-step guides for adding nodes and tools
- Code examples and patterns
- IMPROVEMENTS.md: Comprehensive improvement guide
- Enhancement ideas with priority rankings
- Implementation examples
- Performance optimizations
- Advanced features
Improving the Agent
The improvements guide includes:
- Priority rankings (High/Medium/Low) for each improvement
- Detailed code examples with implementation snippets
- Step-by-step integration instructions
- Workflow graphs showing how the agent evolves with improvements
- Testing strategies and validation approaches
- Benefits analysis for each enhancement
Quick Start: Begin with high-priority items like foreign key relationships, better error parsing, and column name suggestions for immediate impact.
Troubleshooting
Connection Issues
- Ensure the MySQL server is running and accessible
- Check connection string format and credentials
- Verify network connectivity and firewall settings
MySQL-Specific Issues
- Ensure MySQL server supports the connection protocol
- Check user permissions for the specified database
- Verify MySQL connector version compatibility
