Document-handler
Verified Safe
by sendsta
Overview
The MCP server provides document parsing, OCR, and advanced analytical tools to extract key information (requirements, deadlines, contacts, evaluation criteria) from tender documents for the tri-tender system.
Installation
docker run -i --rm tender-docs-mcp:latest
Environment Variables
- TENDER_UPLOAD_DIR
- TENDER_PROCESSED_DIR
- TENDER_CACHE_DIR
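The listing names these three variables but not their defaults or how the server reads them. A minimal Python sketch of one way to resolve them into `Path` objects follows; the `load_tender_dirs` helper and the fallback paths are hypothetical and not taken from the server's code.

```python
import os
from pathlib import Path

# Variable names come from the listing above.
# The fallback paths are hypothetical placeholders, not documented defaults.
_FALLBACKS = {
    "TENDER_UPLOAD_DIR": "/tmp/tender/uploads",
    "TENDER_PROCESSED_DIR": "/tmp/tender/processed",
    "TENDER_CACHE_DIR": "/tmp/tender/cache",
}


def load_tender_dirs(env=None):
    """Map each variable name to a Path, using the fallback when it is unset."""
    env = os.environ if env is None else env
    return {name: Path(env.get(name, fallback)) for name, fallback in _FALLBACKS.items()}


if __name__ == "__main__":
    for name, path in load_tender_dirs().items():
        print(f"{name} -> {path}")
```

When the server runs in Docker, values like these would typically be supplied with `-e NAME=value` on the `docker run` command line, with the matching host directories mounted into the container.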
Security Notes
The server relies on external libraries and tools (pdfplumber, python-docx, pytesseract, pandoc) for core document processing, so its security depends in part on theirs. The `subprocess.run` calls to `pandoc` and `soffice` pass `str(file_path)` directly, which is generally safe against shell injection, but a malicious file could still exploit a vulnerability in the external tool that processes it. The `import_document` tool in `server.py` accepts only base64-encoded content, which mitigates the URL-based SSRF risk present in an older implementation (`tender_docs_mcp.py`). No direct `eval()` calls or hardcoded secrets were found. File handling uses system temporary directories and explicit `Path` objects, which is good practice.
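The two patterns described in these notes can be illustrated with a short Python sketch: decoding a base64 upload into a known directory, and invoking `pandoc` with an argument list rather than a shell string. This is not the code in `server.py`; the function names, the filename sanitising step, and the timeout are assumptions made for illustration.

```python
import base64
import binascii
import subprocess
from pathlib import Path


def decode_base64_document(content_b64: str, filename: str, dest_dir: Path) -> Path:
    """Decode a base64 payload and write it inside dest_dir (hypothetical helper)."""
    try:
        data = base64.b64decode(content_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("content is not valid base64") from exc
    # Keep only the final path component so "../" segments cannot escape dest_dir.
    safe_name = Path(filename).name
    if safe_name in ("", ".", ".."):
        raise ValueError("filename is not usable")
    target = dest_dir / safe_name
    target.write_bytes(data)
    return target


def build_pandoc_command(file_path: Path, output_format: str = "plain") -> list[str]:
    """Build an argument list; no shell ever parses the file name."""
    return ["pandoc", str(file_path.resolve()), "--to", output_format]


def convert_with_pandoc(file_path: Path, timeout: int = 60) -> str:
    """Run pandoc without shell=True and return the text it writes to stdout."""
    result = subprocess.run(
        build_pandoc_command(file_path),
        capture_output=True,
        text=True,
        check=True,        # raise CalledProcessError on a non-zero exit code
        timeout=timeout,   # assumed limit, so a hostile file cannot hang the call
    )
    return result.stdout
```

Because the command is a list, a file name such as `a; rm -rf x.docx` reaches `pandoc` as a single argument. That addresses shell injection only; it does nothing about parser bugs in `pandoc` or `soffice` themselves, which is the residual risk the notes point out.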
Similar Servers
kreuzberg
Extracts text, tables, images, and metadata from a wide range of document formats (PDF, Office, images, HTML, etc.), with support for multiple OCR backends and an extensible plugin system. Can be run as a Model Context Protocol (MCP) server.
kreuzberg
Extracts text, tables, images, and metadata from 56 file formats including PDF, Office documents, and images. Supports multiple OCR backends, extensible plugins, and is designed for data preprocessing in AI/ML workflows.
DevDocs
DevDocs is a web crawling and content extraction platform designed to accelerate software development by converting documentation into LLM-ready formats for intelligent data querying and fine-tuning.
Matryoshka
Processes large documents beyond LLM context windows using a Recursive Language Model (RLM) that executes symbolic commands for iterative document analysis.