readPDF_mcp_server
by rexfelix
Overview
This server reads PDF documents, extracts text, images, and tables, and provides them to an AI agent in Markdown format.
Installation
uv run src/server/main.py
Security Notes
The `read_pdf_resource` tool reads arbitrary files from the server's filesystem when given an absolute path (e.g., `pdf:///etc/passwd`), which is a critical information disclosure vulnerability. The `read_pdf` tool processes untrusted PDFs from local files or URLs, which exposes the system to vulnerabilities in the underlying libraries (PyMuPDF, pdfplumber, pytesseract); URL-based sources can additionally enable server-side request forgery (SSRF). OCR with Tesseract runs an external command, which could become an attack vector if its inputs are not properly sanitized. The server does not explicitly implement input sanitization or sandboxing for untrusted PDF content or file paths.
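One common mitigation for the path issue described above is to confine every requested path to a single allowed directory before opening it. The following is a minimal sketch of that idea, not the server's actual code: the names `resolve_pdf_path`, `PathNotAllowed`, and `allowed_root` are hypothetical, and it assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


class PathNotAllowed(ValueError):
    """Raised when a requested path falls outside the allowed directory."""


def resolve_pdf_path(requested: str, allowed_root: Path) -> Path:
    """Return a resolved path inside allowed_root, or raise PathNotAllowed.

    Hypothetical helper for illustration; not part of readPDF_mcp_server.
    """
    root = allowed_root.resolve()
    # Joining with an absolute `requested` discards `root`, so containment
    # must be checked after resolving, not assumed from the join.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PathNotAllowed(f"{requested!r} is outside {root}")
    if candidate.suffix.lower() != ".pdf":
        raise PathNotAllowed(f"{requested!r} is not a .pdf file")
    return candidate
```

With a check like this in front of the file read, a request for `/etc/passwd` or `../secrets.pdf` would be rejected before any file is opened. It does not address the SSRF or library-level risks, which would need separate controls such as URL allowlisting and sandboxing.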
Similar Servers
kreuzberg
High-performance document intelligence for extracting text, metadata, and structured information from diverse document formats like PDFs, Office files, images, and structured data, powered by a Rust core with multi-language bindings and advanced OCR capabilities.
pdf-reader-mcp
Extracts text, images, and metadata from PDF files for AI agent consumption, supporting local files and URLs with parallel processing and content ordering.
pdflens-mcp
Provides an MCP server for AI agents to programmatically read and extract information (text, page count, images) from PDF documents within user-defined workspaces.
markitdown-mcp
Converts various file formats to Markdown for use in AI workflows and clients such as Claude Desktop.