pdf4vllm-mcp
Verified Safe by PyJudge
Overview
PDF content extraction and search, optimized for messy documents and vision language models (VLMs), with features for text corruption detection, reading order preservation, and token management.
Installation
uvx pdf4vllm-mcp
Environment Variables
- PDF_MAX_PAGES_PER_REQUEST
- PDF_MAX_IMAGES_PER_REQUEST
- PDF_MAX_RECURSION_DEPTH
- PDF_MAX_SUGGESTED_RANGES
- PDF_MAX_IMAGE_DIMENSION
- PDF_MIN_IMAGE_DIMENSION
- PDF_MAX_ASPECT_RATIO
- PDF_PAGE_IMAGE_DPI
- PDF_JPEG_QUALITY
- PDF_HEADER_FOOTER_RATIO
- PDF_FOOTER_START_RATIO
- PDF_CORRUPTION_THRESHOLD
- PDF_DEFAULT_EXTRACTION_MODE
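The variables above are set in the environment of the server process. As a minimal sketch, assuming the server is launched from Python, the helper below builds such an environment. The function name `build_server_env` and the example values are hypothetical; this listing does not state the accepted values or defaults, so check the project's own documentation before relying on any of them.

```python
import os


def build_server_env(overrides, base=None):
    """Return a copy of the environment with PDF_* overrides applied.

    `overrides` maps variable names to values; every value is converted
    to a string because environment variables are always strings.
    """
    env = dict(os.environ if base is None else base)
    for name, value in overrides.items():
        # Guard against typos: every variable in the list starts with PDF_.
        if not name.startswith("PDF_"):
            raise ValueError(f"unexpected variable name: {name}")
        env[name] = str(value)
    return env


# Placeholder values for illustration only, not documented defaults.
env = build_server_env(
    {"PDF_MAX_PAGES_PER_REQUEST": 10, "PDF_PAGE_IMAGE_DPI": 150},
    base={},
)
```

The resulting dictionary can be passed as the `env` argument of `subprocess.Popen(["uvx", "pdf4vllm-mcp"], env=...)`, provided it is built on top of the real environment (the default when `base` is omitted) so that `PATH` is preserved.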
Security Notes
The server uses `subprocess.run` to execute the external `pdfgrep` utility for the `grep_pdf` tool. While arguments are passed as a list to prevent shell injection, the `pattern` argument is interpreted as a regex by `pdfgrep` by default, which could theoretically be exploited for a ReDoS (Regular Expression Denial of Service) attack. This risk is partially mitigated by a 60-second timeout and a `max_count` limit for matches. Path traversal risks for file access are explicitly addressed and mitigated by the `validate_path_security` function, ensuring all file operations are constrained to the current working directory.
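The mitigations described above can be sketched roughly as follows. This is not the server's actual code: `validate_path`, `build_pdfgrep_command`, and `grep_pdf` are hypothetical stand-ins (the real `validate_path_security` function may behave differently), and the sketch assumes `pdfgrep` is on `PATH` and accepts `--page-number` and `--max-count`, along with the conventional `--` end-of-options marker.

```python
import subprocess
from pathlib import Path


def validate_path(candidate, root=None):
    """Resolve `candidate` under `root` and reject anything that escapes it."""
    root = (Path.cwd() if root is None else Path(root)).resolve()
    resolved = (root / candidate).resolve()
    try:
        # relative_to raises ValueError when `resolved` is outside `root`.
        resolved.relative_to(root)
    except ValueError:
        raise ValueError(f"path escapes the working directory: {candidate}")
    return resolved


def build_pdfgrep_command(pattern, pdf_path, max_count=100):
    """Build an argument list; no shell is involved, so the pattern is
    passed to pdfgrep as a single argument and never parsed by a shell."""
    return [
        "pdfgrep",
        "--page-number",
        "--max-count", str(max_count),
        "--",  # assumed: standard end-of-options marker
        pattern,
        str(pdf_path),
    ]


def grep_pdf(pattern, pdf_path, max_count=100, timeout=60):
    """Run pdfgrep with a bounded runtime and a bounded number of matches."""
    safe_path = validate_path(pdf_path)
    cmd = build_pdfgrep_command(pattern, safe_path, max_count)
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # A pathological regex is cut off here rather than running forever.
        raise TimeoutError(f"pdfgrep exceeded {timeout} seconds")
    return result.stdout.splitlines()
```

The timeout bounds how long a pathological pattern can run but does not stop it from consuming CPU until then, which is why the mitigation is described as partial.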
Similar Servers
kreuzberg
Extracts text, tables, images, and metadata from a wide range of document formats (PDF, Office, images, HTML, etc.), with support for multiple OCR backends and an extensible plugin system. Can be run as a Model Context Protocol (MCP) server.
kreuzberg
Extracts text, tables, images, and metadata from 56 file formats including PDF, Office documents, and images. Supports multiple OCR backends, extensible plugins, and is designed for data preprocessing in AI/ML workflows.
pdf-reader-mcp
Provides a robust server for AI agents to extract text, images, and metadata from PDF documents, preserving content order for better comprehension.
pageindex-mcp
This MCP server acts as a bridge, enabling LLM-native, reasoning-based RAG on documents (local or online PDFs) for MCP-compatible agents like Claude and Cursor, without requiring a vector database locally.