document-parser-mcp
Verified Safe
by kgand
Overview
Intelligent document parsing and conversion to clean Markdown for AI processing and RAG pipelines.
Installation
python -m document_parser
Environment Variables
- DOCUMENT_PARSER_CONFIG
Security Notes
The server uses `httpx` for downloading remote files and incorporates measures like URL scheme validation (`allowed_schemes`), maximum file size limits (`max_file_size_mb`), and filename sanitization (`sanitize_filename`) to mitigate risks like SSRF or path traversal when handling external content. Communication with MCP clients occurs over standard I/O (stdio_server), which generally limits direct network attack surface. Reliance on external libraries like `docling` (and optional `easyocr`, `mlx`) introduces dependency-related security considerations that would require auditing of those upstream projects. No 'eval' or obvious hardcoded secrets were found.
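The three mitigations named above can be illustrated with a minimal sketch. This is not the server's actual code: only the names `allowed_schemes`, `max_file_size_mb`, and `sanitize_filename` come from the notes, while the default values, the helper names `is_allowed_url` and `within_size_limit`, and the sanitization rules are assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical defaults; the real configuration values are not shown in this listing.
ALLOWED_SCHEMES = {"http", "https"}
MAX_FILE_SIZE_MB = 100


def is_allowed_url(url: str, allowed_schemes=ALLOWED_SCHEMES) -> bool:
    """Accept only URLs whose scheme is allowlisted and that name a host."""
    parsed = urlparse(url)
    return parsed.scheme.lower() in allowed_schemes and bool(parsed.netloc)


def within_size_limit(num_bytes: int, max_file_size_mb: int = MAX_FILE_SIZE_MB) -> bool:
    """Check a byte count against a limit expressed in megabytes."""
    return 0 <= num_bytes <= max_file_size_mb * 1024 * 1024


def sanitize_filename(name: str) -> str:
    """Drop directory components and unsafe characters (assumed rules)."""
    # Treat backslashes as separators too, then keep only the last path component.
    base = PurePosixPath(name.replace("\\", "/")).name
    # Replace anything outside a conservative character set.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    # Strip leading dots so the result is never "." or ".." or a hidden file.
    cleaned = cleaned.lstrip(".")
    return cleaned or "download"
```

Under these assumptions, `is_allowed_url("file:///etc/passwd")` is rejected and `sanitize_filename("../../etc/passwd")` reduces to `passwd`. Scheme checks alone do not stop requests to internal hosts over `http`, so a real deployment would likely need host or IP filtering as well.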
Similar Servers
kreuzberg
Extracts text, tables, images, and metadata from 56 file formats including PDF, Office documents, and images. Supports multiple OCR backends, extensible plugins, and is designed for data preprocessing in AI/ML workflows.
pageindex-mcp
This MCP server acts as a bridge, enabling LLM-native, reasoning-based RAG on documents (local or online PDFs) for MCP-compatible agents like Claude and Cursor, without requiring a vector database locally.
lyra-tool-discovery
This MCP server is designed to fetch, parse, and organize documentation from websites implementing the llms.txt standard. It transforms raw documentation into structured, agent-ready formats, exposing tools for AI agents, LLMs, and automation workflows to consume documentation programmatically.
md-server
Converts various documents, webpages, and media files into markdown format, serving as an HTTP API or an MCP server for AI assistants to read and process content.