MCP
Verified Safe
by stevenpto
Overview
Extracts text from PDF documents, including support for OCR on scanned pages, and summarizes the extracted content using context-aware guidance.
Installation
python -m src.server
Security Notes
The server processes local PDF files. The main risks are vulnerabilities in the PyMuPDF or Tesseract libraries when they parse malformed or malicious PDFs, and a `file_path` parameter that the calling agent does not constrain, which could expose unintended local files. The server code itself contains no `eval`, obfuscation, or hardcoded secrets. NLTK data downloads run quietly.
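The `file_path` risk noted above can be reduced on the caller's side by checking the path before it reaches the server. The following is a minimal sketch, not the server's own code: the helper name `resolve_pdf_path` and the `allowed_root` parameter are hypothetical, and it assumes the agent is only meant to read PDFs under one known directory.

```python
from pathlib import Path


def resolve_pdf_path(file_path: str, allowed_root: str) -> Path:
    """Return an absolute path for file_path, or raise ValueError.

    Hypothetical helper: rejects paths that escape allowed_root
    (via "..", absolute paths, or symlinks) and non-PDF extensions.
    """
    root = Path(allowed_root).resolve()
    # Joining with an absolute file_path discards root, so the
    # containment check below still catches that case.
    candidate = (root / file_path).resolve()

    try:
        # relative_to raises ValueError when candidate is outside root.
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"path escapes allowed root: {file_path!r}") from None

    if candidate.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF file: {file_path!r}")

    return candidate
```

A caller would pass the returned path as `file_path` only when no `ValueError` is raised. This check narrows which files can be read; it does not protect against parser vulnerabilities triggered by a malicious PDF inside the allowed directory.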
Similar Servers
kreuzberg
High-performance document intelligence for extracting text, metadata, and structured information from a wide range of document formats including PDFs, Office documents, images, and HTML. It supports advanced features like OCR, table extraction, chunking, language detection, and embedding generation, powered by a Rust core for native performance.
kreuzberg
High-performance document intelligence platform for extracting text, metadata, and structured information (tables, images, chunks) from over 50 document formats (PDFs, Office documents, images, HTML, etc.). It offers advanced OCR capabilities, multilingual support, and features like chunking, embeddings, and keyword extraction. Functionality is exposed via multiple language bindings and a Model Context Protocol (MCP) server for flexible integration.
pdf-reader-mcp
Provides production-ready PDF processing capabilities for AI agents, including extraction of text, images, and metadata from local files or URLs.
pageindex-mcp
Provides vectorless, reasoning-based RAG capabilities for LLMs to navigate and retrieve information from hierarchical document structures, primarily for long PDFs.