PDF Redaction MCP Server
A Model Context Protocol (MCP) server for PDF redaction using PyMuPDF (fitz). This server provides tools for loading PDFs, identifying and redacting sensitive text, and saving redacted documents.
Features
- 📄 Load and read PDF files - Extract text content from PDFs for review
- 🔍 Batch text redaction - Search and redact multiple text strings at once for maximum efficiency
- 📋 Redaction tracking - Keep track of what's been redacted to prevent duplicate work
- 🔎 List applied redactions - Audit trail showing which texts have been marked for redaction
- 📐 Area-based redaction - Redact specific rectangular regions by coordinates
- 💾 Save redacted PDFs - Apply redactions and save with automatic naming
- 🎨 Customizable redaction appearance - Choose redaction fill colors
- 🔒 Error handling - Comprehensive error messages via MCP protocol
Installation
This project uses uv for package management. To install:
# Clone the repository
git clone <your-repo-url>
cd redact_mcp
# Install with uv
uv pip install -e .
Usage
Running the Server
You can run the server using either the Python script directly or the FastMCP CLI:
Option 1: Direct Python execution (stdio transport)
python -m redact_mcp.server
Option 2: Using FastMCP CLI
# Stdio transport (default)
fastmcp run redact_mcp.server:mcp
# HTTP transport for remote access
fastmcp run redact_mcp.server:mcp --transport http --port 8000
Installing in MCP Clients
Claude Desktop
Add to your Claude Desktop configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"pdf-redaction": {
"command": "uv",
"args": [
"--directory",
"/path/to/redact_mcp",
"run",
"fastmcp",
"run",
"redact_mcp.server:mcp"
]
}
}
}
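If the configuration file already lists other servers, the entry can be merged in rather than pasted by hand. The following is a minimal sketch, assuming the file is plain JSON with a top-level mcpServers object. The helper name add_server_entry is hypothetical, and the project directory is the same placeholder used above.

```python
import json
from pathlib import Path


def add_server_entry(config_path: Path, project_dir: str) -> dict:
    """Merge the pdf-redaction entry into a Claude Desktop config file.

    Hypothetical helper: existing entries under "mcpServers" are preserved.
    """
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))

    # Create the "mcpServers" object if it is missing, then add or replace our entry.
    servers = config.setdefault("mcpServers", {})
    servers["pdf-redaction"] = {
        "command": "uv",
        "args": ["--directory", project_dir, "run", "fastmcp", "run",
                 "redact_mcp.server:mcp"],
    }

    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Restart Claude Desktop after editing the file so the new server entry is picked up.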
Other MCP Clients
Use the FastMCP CLI to generate configuration for other clients:
# For Cursor
fastmcp install cursor redact_mcp.server:mcp
# For Gemini CLI
fastmcp install gemini-cli redact_mcp.server:mcp
# Generate generic MCP JSON configuration
fastmcp install mcp-json redact_mcp.server:mcp
Available Tools
1. load_pdf
Load a PDF file and extract its text content.
Parameters:
- pdf_path (string): Path to the PDF file to load
Returns: The full text content of the PDF, organized by pages
Example:
Load the PDF at /path/to/document.pdf
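To illustrate what "organized by pages" could look like, here is a small sketch that joins per-page text under numbered headers. The function name format_pages and the header format are assumptions, not necessarily what the server emits.

```python
def format_pages(page_texts):
    """Join per-page text under 'Page N' headers (the header format is an assumption)."""
    sections = []
    for number, text in enumerate(page_texts, start=1):
        sections.append(f"--- Page {number} ---\n{text.strip()}")
    return "\n\n".join(sections)
```

With PyMuPDF, page_texts could be built as [page.get_text() for page in doc], where doc is an opened document.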
2. redact_text
Redact every occurrence of one or more text strings in a loaded PDF. The tool accepts a list of texts, so several strings can be redacted in a single call, and it tracks which texts have already been redacted to prevent duplicate work.
Parameters:
- pdf_path (string): Path to the loaded PDF file
- texts_to_redact (list of strings): List of text strings to search for and redact
- fill_color (tuple, optional): RGB color (0-1 range) for redaction box. Default: (0, 0, 0) - black
Returns: Summary of redaction operations, including which texts were newly redacted and which were skipped (already redacted)
Examples:
# Single text
Redact ["confidential"] in /path/to/document.pdf
# Multiple texts at once (recommended for efficiency)
Redact ["John Doe", "123-45-6789", "john.doe@email.com"] in /path/to/document.pdf
Note: The tool tracks which texts have been redacted and will skip any texts that were already processed, preventing duplicate redactions.
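The tracking behavior described in the note can be sketched as a per-PDF set of texts that have already been processed. This is an illustration under assumptions: the names redacted_texts and partition_texts are hypothetical, and the server's actual bookkeeping may differ.

```python
from collections import defaultdict

# Hypothetical in-memory tracker: PDF path -> texts already marked for redaction.
redacted_texts = defaultdict(set)


def partition_texts(pdf_path, texts_to_redact):
    """Split a batch into texts to process now and texts to skip."""
    seen = redacted_texts[pdf_path]
    new, skipped = [], []
    for text in texts_to_redact:
        if text in seen:
            skipped.append(text)   # already handled in an earlier call (or earlier in this batch)
        else:
            seen.add(text)
            new.append(text)
    return new, skipped
```

Because the tracker is keyed by PDF path, redacting a text in one document does not cause it to be skipped in another.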
3. redact_area
Redact a specific rectangular area on a PDF page.
Parameters:
- pdf_path (string): Path to the loaded PDF file
- page_number (int): Page number (1-indexed)
- x0 (float): Left x coordinate
- y0 (float): Top y coordinate
- x1 (float): Right x coordinate
- y1 (float): Bottom y coordinate
- fill_color (tuple, optional): RGB color (0-1 range) for redaction box. Default: (0, 0, 0) - black
Returns: Confirmation message
Example:
Redact the area from (100, 100) to (300, 150) on page 1 of /path/to/document.pdf
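Since page_number is 1-indexed while PyMuPDF indexes pages from 0, the inputs need a small translation step before a rectangle is drawn. The sketch below shows one way to validate and normalize them. The function name normalize_area and the specific checks are assumptions, not the server's confirmed behavior.

```python
def normalize_area(page_number, page_count, x0, y0, x1, y1):
    """Return a 0-based page index and a rectangle with x0 <= x1 and y0 <= y1."""
    if not 1 <= page_number <= page_count:
        raise ValueError(f"page_number must be between 1 and {page_count}")
    # Accept corners in any order and sort them into left/top/right/bottom.
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    if left == right or top == bottom:
        raise ValueError("redaction area must have non-zero width and height")
    return page_number - 1, (left, top, right, bottom)
```

For the example above, normalize_area(1, page_count, 100, 100, 300, 150) yields page index 0 and the rectangle (100, 100, 300, 150).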
4. save_redacted_pdf
Apply all pending redactions and save the PDF.
Parameters:
- pdf_path (string): Path to the loaded PDF file
- output_path (string, optional): Custom output path. If not provided, appends "_redacted" to original filename
Returns: Path to the saved redacted PDF
Example:
Save the redacted version of /path/to/document.pdf
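The automatic naming rule (append "_redacted" to the original filename) can be expressed with pathlib. This is a sketch of the rule as described. The helper name default_output_path is hypothetical.

```python
from pathlib import Path


def default_output_path(pdf_path):
    """Append "_redacted" to the file name, keeping the directory and extension."""
    source = Path(pdf_path)
    return str(source.with_name(f"{source.stem}_redacted{source.suffix}"))
```

For /path/to/document.pdf this produces /path/to/document_redacted.pdf, so the original file is left untouched.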
5. list_loaded_pdfs
List all currently loaded PDF files.
Parameters: None
Returns: List of loaded PDF paths with page counts
6. list_applied_redactions
List the texts that have been marked for redaction in the loaded PDF(s). Use it to track redaction progress and avoid duplicate work.
Parameters:
- pdf_path (string, optional): Path to a specific PDF. If not provided, lists redactions for all loaded PDFs
Returns: List of texts that have been marked for redaction in each PDF
Examples:
# List redactions for a specific PDF
List applied redactions for /path/to/document.pdf
# List redactions for all loaded PDFs
List all applied redactions
Use Cases:
- Check what has already been redacted before adding more redactions
- Verify redaction progress during a multi-step process
- Avoid duplicate redaction attempts
- Generate a report of what was redacted
7. close_pdf
Close a loaded PDF and free its resources. This also clears the redaction tracking for that PDF.
Parameters:
- pdf_path (string): Path to the PDF file to close
Returns: Confirmation message
Workflow Example
Here's a typical workflow using this MCP server:
1. Load a PDF
   Load the PDF at /Users/me/documents/sensitive.pdf
2. Review the content
   The tool will return the full text content, which you can review to identify sensitive information.
3. Redact sensitive text (batch mode - recommended)
   Redact ["Social Security Number", "123-45-6789", "John Doe", "jane.smith@email.com"] in /Users/me/documents/sensitive.pdf
   Pro tip: Redacting multiple texts at once is much faster than calling the tool multiple times.
4. Check what has been redacted (optional)
   List applied redactions for /Users/me/documents/sensitive.pdf
   This shows you which texts have already been marked for redaction.
5. Add more redactions if needed
   Redact ["Additional Text", "Another Secret"] in /Users/me/documents/sensitive.pdf
   The tool will skip any texts that were already redacted in step 3.
6. Redact specific areas (optional)
   Redact the area from (50, 100) to (200, 120) on page 2 of /Users/me/documents/sensitive.pdf
7. Save the redacted PDF
   Save the redacted version of /Users/me/documents/sensitive.pdf
   This will create /Users/me/documents/sensitive_redacted.pdf
8. Close the PDF (optional)
   Close /Users/me/documents/sensitive.pdf
Technical Details
Performance Tips
Batch Redaction is Faster:
# ❌ Slower: Multiple individual calls
Redact ["John Doe"] in document.pdf
Redact ["123-45-6789"] in document.pdf
Redact ["jane@email.com"] in document.pdf
# ✅ Faster: Single batch call
Redact ["John Doe", "123-45-6789", "jane@email.com"] in document.pdf
Why batch redaction is better:
- Reduces tool invocation overhead
- Scans the PDF only once
- Applies all redactions in a single pass
- Automatically prevents duplicate redactions
- Provides a single summary of all operations
Best Practice: Collect all texts to redact first, then make one batch call.
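One way to follow this practice is to gather candidate texts from several sources and merge them into a single ordered, duplicate-free list before calling redact_text. The helper below is a sketch. The name build_batch is hypothetical and not part of the server.

```python
def build_batch(*groups):
    """Merge several lists of texts into one ordered, duplicate-free batch."""
    merged = (text.strip() for group in groups for text in group)
    # dict.fromkeys keeps first-seen order while dropping repeats;
    # the final filter discards entries that were empty or whitespace only.
    return [text for text in dict.fromkeys(merged) if text]
```

The resulting list can be passed as texts_to_redact in one call.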
Dependencies
- FastMCP (>=2.12.0): Python framework for building MCP servers
- PyMuPDF (>=1.24.0): PDF manipulation library (imported as fitz)
Architecture
- In-memory storage: Loaded PDFs are kept in memory for fast access during redaction operations
- Redaction tracking: The server tracks which texts have been redacted to prevent duplicate work
- Batch processing: Multiple texts can be redacted in a single tool call for improved performance
- Lazy application: Redaction annotations are added but not applied until save_redacted_pdf is called
- Error handling: Uses FastMCP's ToolError for proper error propagation to MCP clients
- Context logging: All operations log to the MCP context for transparency
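The lazy-application and batch-processing points above can be sketched as two separate steps: marking and applying. This is an illustration, not the server's actual code. The function names mark_redactions and apply_and_save are hypothetical, and doc is assumed to be a PyMuPDF document whose pages provide search_for, add_redact_annot, and apply_redactions.

```python
def mark_redactions(doc, texts, fill=(0, 0, 0)):
    """Add redaction annotations for every match; page content is not changed yet."""
    counts = {text: 0 for text in texts}
    for page in doc:  # each page is visited once for the whole batch
        for text in texts:
            for rect in page.search_for(text):
                page.add_redact_annot(rect, fill=fill)
                counts[text] += 1
    return counts


def apply_and_save(doc, output_path):
    """Apply all pending redaction annotations, then write the file."""
    for page in doc:
        page.apply_redactions()  # removes the content under each annotation
    doc.save(output_path)
```

With PyMuPDF installed, doc would come from fitz.open(pdf_path). Keeping the two steps apart means further redactions can be added at any point before the file is saved.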
Limitations (Current Version)
- Text-only redaction: This version focuses on text redaction. Image redaction is not yet implemented.
- Memory usage: PDFs are kept in memory while loaded. Very large PDFs may consume significant memory.
- Single session: The in-memory store is not persistent across server restarts.
Development
Running Tests
# Install development dependencies
uv pip install -e ".[dev]"
# Run tests (when implemented)
pytest
Code Structure
redact_mcp/
├── src/
│ └── redact_mcp/
│ ├── __init__.py # Package initialization
│ └── server.py # Main MCP server implementation
├── pyproject.toml # Package configuration
└── README.md # This file
License
Apache-2.0
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
