PDFlow
Verified Safe
by Traves-Theberge
Overview
Transform PDF documents into structured data (Markdown, JSON, XML, etc.) using multimodal AI, with a web UI, a CLI, and AI agent integration.
Installation
export GEMINI_API_KEY="your-api-key-here" && USER_ID=$(id -u) GROUP_ID=$(id -g) docker-compose up -d
Environment Variables
- GEMINI_API_KEY
- PDFLOW_BASE_URL
Security Notes
The project applies several concrete safeguards. API routes use explicit path validation functions (`isValidPathComponent`, `getSecureFilePath`) to prevent directory traversal. Shell scripts are executed with `child_process.spawn` rather than `exec`, with arguments passed as an array, to prevent command injection. API keys are not hardcoded; they are kept client-side in `sessionStorage` or supplied through environment variables. The Docker configuration runs as a non-root user (`nextjs:nodejs`), sets `read_only: true` on the filesystem (with exceptions for directories that must be writable), enables `no-new-privileges:true`, and applies resource limits. The MCP server also validates file paths against a configurable list of `ALLOWED_DIRECTORIES`.
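The path validation and array-argument `spawn` pattern described in these notes might look roughly like the TypeScript sketch below. Only the names `isValidPathComponent`, `getSecureFilePath`, and `child_process.spawn` come from the notes. The function bodies, the signatures, and the `runScript` helper are assumptions made for illustration, and the project's actual implementation may differ.

```typescript
import * as path from "node:path";
import { spawn } from "node:child_process";

// Assumed behavior: accept a single file or directory name only.
// Rejects empty names, dot segments, path separators, and NUL bytes.
function isValidPathComponent(component: string): boolean {
  if (component.length === 0 || component === "." || component === "..") {
    return false;
  }
  return !(
    component.includes("/") ||
    component.includes("\\") ||
    component.includes("\0")
  );
}

// Assumed behavior: join validated components under baseDir and return
// null if any component is invalid or the result escapes baseDir.
function getSecureFilePath(
  baseDir: string,
  ...components: string[]
): string | null {
  if (!components.every(isValidPathComponent)) {
    return null;
  }
  const base = path.resolve(baseDir);
  const resolved = path.resolve(base, ...components);
  // Defense in depth: the resolved path must sit strictly inside base.
  return resolved.startsWith(base + path.sep) ? resolved : null;
}

// Hypothetical helper: arguments are passed as an array and no shell is
// involved, so metacharacters such as ";" or "$(...)" in an argument are
// handed to the script as literal text instead of being interpreted.
function runScript(script: string, args: string[]) {
  return spawn(script, args, { shell: false, stdio: "inherit" });
}
```

With this sketch, `getSecureFilePath("/tmp/uploads", "../etc/passwd")` yields `null` because the component contains a separator, while a call such as `runScript("./convert.sh", [userSuppliedName])` (a hypothetical script name) passes `userSuppliedName` through as one literal argument.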
Similar Servers
golf
A Python framework for building conversational AI servers (MCP servers) by defining tools, resources, and prompts as modular Python files, with integrated authentication, telemetry, and LLM interaction utilities.
mcp-server-infranodus
Integrates InfraNodus knowledge graph and text network analysis capabilities into LLM workflows and AI assistants for generating knowledge graphs, detecting content gaps, identifying topics, and performing SEO analysis.
meds-mcp
A Medical Context Protocol (MCP) server for retrieving and analyzing de-identified patient EHR data, facilitating LLM-powered chat interaction and evidence review with medical ontologies and faceted search.
terry-form-mcp
Enables AI assistants to securely execute Terraform commands and leverage LSP-driven code intelligence for infrastructure-as-code management.