mcp-pdf-reader
by Migueel0
Overview
Extracts text from PDF files, optionally performing OCR on embedded images, and returns the content.
Installation
python main.py
Environment Variables
- TESSERACT_CMD
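This listing does not say how the server consumes TESSERACT_CMD. The following is a minimal sketch, assuming the variable holds the path to the Tesseract executable and that falling back to whatever `tesseract` is found on PATH is acceptable when it is unset. The helper name `resolve_tesseract_cmd` is hypothetical and not taken from the project.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_tesseract_cmd(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the Tesseract executable to use, or None if none can be found.

    Assumption: TESSERACT_CMD, when set and non-empty, is the executable path.
    """
    configured = env.get("TESSERACT_CMD")
    if configured:
        return configured
    # Fall back to a PATH lookup; returns None when tesseract is not installed.
    return shutil.which("tesseract")


if __name__ == "__main__":
    cmd = resolve_tesseract_cmd()
    print(cmd if cmd else "Tesseract not found; OCR would be unavailable")
```

With pytesseract, the resolved value would typically be assigned to `pytesseract.pytesseract.tesseract_cmd` before any OCR call. Whether this server does so is an assumption.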
Security Notes
The `read_pdf` tool accepts a `file_path` string and uses it directly to open the PDF. If the MCP server is exposed to untrusted input, this allows path traversal: an attacker can supply a path to any file the server process can read. Large or maliciously crafted PDFs can also exhaust CPU and memory during `pypdf` parsing and `pytesseract` OCR. The broad exception handling in image processing (`except Exception: continue`) silently skips failures, which can mask underlying problems. Finally, OCR depends on an external Tesseract executable, so the server inherits that binary's security posture.
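One way to reduce the path traversal and resource exhaustion risks described above is to validate `file_path` before opening it. The following is a minimal sketch, not the project's code: the allowed root `/srv/pdfs`, the 50 MiB size cap, and the helper name `resolve_pdf_path` are all assumptions chosen for illustration.

```python
from pathlib import Path

# Hypothetical values; neither comes from the original project.
ALLOWED_ROOT = Path("/srv/pdfs")
MAX_BYTES = 50 * 1024 * 1024


def resolve_pdf_path(
    file_path: str,
    root: Path = ALLOWED_ROOT,
    max_bytes: int = MAX_BYTES,
) -> Path:
    """Return a validated absolute path inside `root`, or raise."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # containment check below sees the real target. An absolute
    # file_path replaces root in the join and is rejected the same way.
    candidate = (root / file_path).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError("file_path escapes the allowed directory") from None
    if candidate.suffix.lower() != ".pdf":
        raise ValueError("file_path must point to a .pdf file")
    if not candidate.is_file():
        raise FileNotFoundError(str(candidate))
    # Cap the input size before handing the file to the parser or OCR.
    if candidate.stat().st_size > max_bytes:
        raise ValueError("PDF exceeds the configured size limit")
    return candidate
```

A tool handler could call `resolve_pdf_path(file_path)` first and only pass the returned path on to the PDF parser. A size cap limits only one source of resource exhaustion; a small file can still be expensive to parse or OCR, so a page limit or timeout may also be worth considering.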
Similar Servers
kreuzberg
Extracts text, tables, images, and metadata from 56 file formats including PDF, Office documents, and images. Supports multiple OCR backends, extensible plugins, and is designed for data preprocessing in AI/ML workflows.
pdf-reader-mcp
Provides a robust server for AI agents to extract text, images, and metadata from PDF documents, preserving content order for better comprehension.
mcp-pdf-reader
Exposes local PDFs for reading, semantic search, chunking, and table extraction to MCP-compatible agents or via a CLI.
pdflens-mcp
This MCP server provides tools for reading and extracting information from PDF files, including text and images, designed for AI clients.