readPDF_mcp_server
by rexfelix
Overview
This server reads PDF documents, extracts text, images, and tables, and provides them to an AI agent in Markdown format.
Installation
uv run src/server/main.py
Security Notes
The `read_pdf_resource` tool can read arbitrary files from the server's filesystem when given an absolute path (e.g., `pdf:///etc/passwd`), which is a critical information disclosure vulnerability. Processing untrusted PDFs from local files or URLs (via `read_pdf`) exposes the system to vulnerabilities in the underlying libraries (PyMuPDF, pdfplumber, pytesseract), and URL-based sources can potentially lead to SSRF. OCR with Tesseract invokes an external command, which could become an attack vector if its inputs are not properly sanitized. The server does not explicitly implement input sanitization or sandboxing for untrusted PDF content or file paths.
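One common mitigation for the path disclosure issue is to confine resource paths to a single allowed directory. The sketch below is not taken from this server's code: `ALLOWED_ROOT` and `resolve_pdf_path` are hypothetical names, and the `pdf://` prefix handling is an assumption based on the example URI above.

```python
from pathlib import Path

# Hypothetical directory that PDF resources are allowed to live in.
ALLOWED_ROOT = Path("/srv/pdfs")


def resolve_pdf_path(uri: str, root: Path = ALLOWED_ROOT) -> Path:
    """Map a pdf:// URI to a path inside `root`, or raise an error."""
    prefix = "pdf://"
    if not uri.startswith(prefix):
        raise ValueError("unsupported URI scheme")

    raw = uri[len(prefix):]
    root_resolved = root.resolve()

    # Joining with an absolute `raw` discards `root`, so an absolute path
    # such as /etc/passwd is checked as-is. resolve() collapses ".."
    # segments and follows symlinks before the containment check.
    candidate = (root_resolved / raw).resolve()

    if not candidate.is_relative_to(root_resolved):  # Python 3.9+
        raise PermissionError("path escapes the allowed directory")
    if candidate.suffix.lower() != ".pdf":
        raise ValueError("only .pdf files may be read")
    return candidate
```

With this check in place, `pdf:///etc/passwd` and `pdf://../secret.pdf` both raise `PermissionError`, while `pdf://report.pdf` resolves to a file under the allowed root. This addresses only path traversal; it does not protect against malicious PDF content or SSRF through URL sources.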
Similar Servers
kreuzberg
Extracts text, tables, images, and metadata from 56 file formats including PDF, Office documents, and images. Supports multiple OCR backends, extensible plugins, and is designed for data preprocessing in AI/ML workflows.
pdf-reader-mcp
Provides a robust server for AI agents to extract text, images, and metadata from PDF documents, preserving content order for better comprehension.
pdflens-mcp
This MCP server provides tools for reading and extracting information from PDF files, including text and images, designed for AI clients.
lyra-tool-discovery
This MCP server is designed to fetch, parse, and organize documentation from websites implementing the llms.txt standard. It transforms raw documentation into structured, agent-ready formats, exposing tools for AI agents, LLMs, and automation workflows to consume documentation programmatically.