mcp-semantic-pdf-reader
Verified Safe
by tomo8812
Overview
Provides semantic PDF reading capabilities by converting PDF content to structured Markdown and extracting metadata, optimized for LLM consumption.
Installation
docker run -i --rm -v /path/to/your/pdfs:/data mcp-semantic-pdf-reader
Security Notes
The `read_pdf_structure` and `get_pdf_metadata` tools take a `path` argument, allowing the server to access files on the host filesystem (or within its Docker container). While the recommended Docker setup mitigates this risk by explicitly mounting a PDF-containing directory to `/data`, running the server directly via `pip` or `uv` without proper sandboxing could allow a malicious client to trigger arbitrary file reads by the server process. There are no direct `eval` calls, hardcoded secrets, or obvious malicious network patterns. The `HF_HUB_DISABLE_SYMLINKS` environment variable is explicitly set as a security measure.
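When the server runs outside Docker, one way to reduce the arbitrary-read risk described above is to confine every `path` argument to a single allowed directory before opening the file. The following is a minimal sketch of that idea, not the server's actual code: the helper name `resolve_pdf_path` and the `ALLOWED_ROOT` constant are hypothetical, and `/data` is assumed only because it matches the Docker mount point shown in the installation command.

```python
from pathlib import Path

# Assumption: PDFs live under one directory, mirroring the /data mount
# used in the Docker command. This is not taken from the server's source.
ALLOWED_ROOT = Path("/data")


def resolve_pdf_path(user_path: str, root: Path = ALLOWED_ROOT) -> Path:
    """Return an absolute path inside `root`, or raise if it escapes.

    Hypothetical helper: resolves symlinks and `..` segments first, then
    checks that the result is still located under the allowed root.
    """
    root_resolved = root.resolve()
    # Joining with an absolute user_path discards the root, so the
    # containment check below also rejects inputs such as "/etc/passwd".
    candidate = (root_resolved / user_path).resolve()

    # Path.is_relative_to requires Python 3.9 or newer.
    if not candidate.is_relative_to(root_resolved):
        raise PermissionError(f"path escapes allowed directory: {user_path!r}")

    if candidate.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF file: {user_path!r}")

    return candidate
```

A tool handler could call `resolve_pdf_path(path)` first and open only the returned path. This narrows what a client can request, but it is not a substitute for running the process in a container or under a restricted user account.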
Similar Servers
kreuzberg
Extracts text, tables, images, and metadata from a wide range of document formats (PDF, Office, images, HTML, etc.), with support for multiple OCR backends and an extensible plugin system. Can be run as a Model Context Protocol (MCP) server.
kreuzberg
Extracts text, tables, images, and metadata from 56 file formats including PDF, Office documents, and images. Supports multiple OCR backends, extensible plugins, and is designed for data preprocessing in AI/ML workflows.
pdf-reader-mcp
Provides a robust server for AI agents to extract text, images, and metadata from PDF documents, preserving content order for better comprehension.
pageindex-mcp
This MCP server acts as a bridge, enabling LLM-native, reasoning-based RAG on documents (local or online PDFs) for MCP-compatible agents like Claude and Cursor, without requiring a vector database locally.