document-reader-mcp
Verified Safe
by ggmenu
Overview
Extracts text from various document formats (PDF, DOCX, XLSX, CSV, TXT, JSON, Markdown) and converts them to Markdown.
Installation
python -m server.main
Environment Variables
- DOC_READER_RATE_LIMIT_PER_MINUTE
- DOC_READER_MAX_OUTPUT_CHARS
- DOC_READER_DEFAULT_MAX_ROWS
- DOC_READER_DEFAULT_MAX_PAGES
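The listing names these variables but does not give their defaults or exact semantics. The sketch below shows one way such integer limits could be read and applied. It assumes each variable holds a positive integer and that `DOC_READER_MAX_OUTPUT_CHARS` caps the length of returned text. The helper names `read_int_env` and `truncate_output`, the fallback value, and the truncation marker are all hypothetical and are not taken from the project.

```python
import os


def read_int_env(name: str, fallback: int) -> int:
    """Return a positive integer from the environment, or the fallback.

    The fallback is supplied by the caller; the project's real defaults
    are not stated in this listing.
    """
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return fallback
    try:
        value = int(raw)
    except ValueError:
        return fallback
    return value if value > 0 else fallback


def truncate_output(text: str, max_chars: int) -> str:
    """Cut text to max_chars and append a marker (hypothetical format)."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n[truncated]"


if __name__ == "__main__":
    # 20000 is an illustrative fallback, not the server's documented default.
    limit = read_int_env("DOC_READER_MAX_OUTPUT_CHARS", 20000)
    print(len(truncate_output("x" * (limit + 5), limit)))
```

Falling back on unparsable or non-positive values keeps a misconfigured environment from disabling the limit entirely, which is one reasonable design choice among several.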
Security Notes
The project states that the server is designed for local, trusted environments, and it has no built-in authentication. It expands client-supplied paths with `os.path.expanduser` and reads them from the local filesystem, so an untrusted client sending malicious paths could read any file the server process has permission to access. Documents are parsed with third-party libraries (pdfminer.six, openpyxl, python-docx, markitdown, PyMuPDF) that run without internal sandboxing, so malformed or malicious documents carry the usual parser risks.
On the mitigating side, the project provides comprehensive security documentation (`SECURITY.md`), enforces a 100MB file size limit, implements rate limiting, and truncates output to protect the AI's context. Note that the `convert_to_markdown` tool writes the *entire* converted document to a file, bypassing the truncation applied to the AI's preview, so large documents could consume significant local resources.
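To illustrate the path risk described above, here is a minimal sketch of how a caller or wrapper could confine requested paths to one directory and check file size before parsing. This is not the server's implementation. The function names `resolve_inside` and `check_size` and the idea of a single allowed base directory are assumptions made for illustration; only the 100MB figure and the use of `os.path.expanduser` come from the notes above.

```python
import os
from pathlib import Path

# 100MB limit as mentioned in the security notes; the exact byte count
# used by the server is an assumption.
MAX_FILE_BYTES = 100 * 1024 * 1024


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path and refuse anything outside base_dir.

    Hypothetical helper: relative paths are interpreted against base_dir,
    and symlinks and ".." segments are resolved before the check.
    """
    base = Path(base_dir).expanduser().resolve()
    candidate = Path(os.path.expanduser(user_path))
    if not candidate.is_absolute():
        candidate = base / candidate
    resolved = candidate.resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not resolved.is_relative_to(base):
        raise PermissionError(f"{user_path!r} is outside the allowed directory")
    return resolved


def check_size(path: Path, limit: int = MAX_FILE_BYTES) -> None:
    """Raise ValueError if the file is larger than the limit."""
    size = path.stat().st_size
    if size > limit:
        raise ValueError(f"{path} is {size} bytes, above the {limit}-byte limit")
```

Resolving the path before comparing it to the base directory matters: a check on the raw string would miss `..` segments and symlinks that point elsewhere.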
Similar Servers
kreuzberg
Extracts text, tables, images, and metadata from 56 file formats including PDF, Office documents, and images. Supports multiple OCR backends, extensible plugins, and is designed for data preprocessing in AI/ML workflows.
html-to-markdown-mcp
Converts HTML content from web pages or raw strings into Markdown format, with options for including metadata, truncating content, and saving to files.
defuddle-fetch-mcp-server
This server allows LLMs to fetch web content, automatically cleaning HTML into markdown, extracting key metadata like title and author, and supporting chunked reading.
md-server
Converts various documents, webpages, and media files into markdown format, serving as an HTTP API or an MCP server for AI assistants to read and process content.