pdf4vllm-mcp

Name: pdf4vllm-mcp
Author: PyJudge

Verified Safe

Overview

PDF content extraction and search, optimized for messy documents and vision language models (VLMs), with features for text corruption detection, reading-order preservation, and token management.

Installation

Run Command

uvx pdf4vllm-mcp

Environment Variables

PDF_MAX_PAGES_PER_REQUEST
PDF_MAX_IMAGES_PER_REQUEST
PDF_MAX_RECURSION_DEPTH
PDF_MAX_SUGGESTED_RANGES
PDF_MAX_IMAGE_DIMENSION
PDF_MIN_IMAGE_DIMENSION
PDF_MAX_ASPECT_RATIO
PDF_PAGE_IMAGE_DPI
PDF_JPEG_QUALITY
PDF_HEADER_FOOTER_RATIO
PDF_FOOTER_START_RATIO
PDF_CORRUPTION_THRESHOLD
PDF_DEFAULT_EXTRACTION_MODE
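The page lists the variable names but not their meanings, defaults, or accepted values. Below is a minimal sketch of launching the server with overrides. It assumes the server reads these variables from its process environment at startup, which is the usual pattern but is not confirmed here. The values are illustrative placeholders, not the server's defaults, and `build_server_env` and `launch_server` are hypothetical helpers. `PDF_DEFAULT_EXTRACTION_MODE` is deliberately left unset because its valid values are not documented.

```python
import os
import subprocess
from typing import Mapping, Optional

# Variable names exactly as listed on this page. Meanings, defaults and valid
# ranges are not documented here, so any values passed in are assumptions.
KNOWN_VARS = frozenset({
    "PDF_MAX_PAGES_PER_REQUEST", "PDF_MAX_IMAGES_PER_REQUEST",
    "PDF_MAX_RECURSION_DEPTH", "PDF_MAX_SUGGESTED_RANGES",
    "PDF_MAX_IMAGE_DIMENSION", "PDF_MIN_IMAGE_DIMENSION",
    "PDF_MAX_ASPECT_RATIO", "PDF_PAGE_IMAGE_DPI", "PDF_JPEG_QUALITY",
    "PDF_HEADER_FOOTER_RATIO", "PDF_FOOTER_START_RATIO",
    "PDF_CORRUPTION_THRESHOLD", "PDF_DEFAULT_EXTRACTION_MODE",
})


def build_server_env(overrides: Mapping[str, object],
                     base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the base environment (default: os.environ) and apply overrides."""
    unknown = set(overrides) - KNOWN_VARS
    if unknown:
        # Reject misspelled names instead of silently ignoring them.
        raise ValueError(f"not a listed variable: {sorted(unknown)}")
    env = dict(os.environ if base is None else base)
    # Environment values must be strings.
    env.update({name: str(value) for name, value in overrides.items()})
    return env


def launch_server(overrides: Mapping[str, object]) -> subprocess.Popen:
    """Start the server with the documented run command (needs uv on PATH).

    MCP servers started this way typically talk JSON-RPC over stdin/stdout,
    so both pipes are handed to the caller.
    """
    return subprocess.Popen(
        ["uvx", "pdf4vllm-mcp"],
        env=build_server_env(overrides),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

Rejecting names outside the listed set catches typos such as `PDF_MAX_PAGE_PER_REQUEST`, which the server would otherwise ignore without any warning.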

Security Notes

The server uses `subprocess.run` to execute the external `pdfgrep` utility for the `grep_pdf` tool. While arguments are passed as a list to prevent shell injection, the `pattern` argument is interpreted as a regex by `pdfgrep` by default, which could theoretically be exploited for a ReDoS (Regular Expression Denial of Service) attack. This risk is partially mitigated by a 60-second timeout and a `max_count` limit for matches. Path traversal risks for file access are explicitly addressed and mitigated by the `validate_path_security` function, ensuring all file operations are constrained to the current working directory.
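The mitigations described above can be sketched as follows. This is not the server's code: `confine_to_base` only illustrates what a check like `validate_path_security` might do, and `build_pdfgrep_command`, `grep_pdf_sketch`, and the default `max_count=50` are hypothetical. The sketch uses pdfgrep's real `-n` (page number) and `--max-count` options. It passes arguments as a list with no shell and applies the 60-second timeout. Note that the pattern is still a regex, so ReDoS remains possible, as the note says.

```python
import subprocess
from pathlib import Path
from typing import Optional, Union

PDFGREP_TIMEOUT_S = 60  # the 60-second limit described above


def confine_to_base(user_path: Union[str, Path], base: Optional[Path] = None) -> Path:
    """Resolve user_path against base (default: cwd); reject escapes.

    resolve() collapses '..' segments and follows existing symlinks, and an
    absolute user_path replaces the root when joined, so all three cases
    end up outside root and are rejected.
    """
    root = (base if base is not None else Path.cwd()).resolve()
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"path escapes {root}: {user_path}")
    return candidate


def build_pdfgrep_command(pattern: str, pdf_path: Path, max_count: int) -> list:
    """Build the argument list; with no shell, metacharacters stay inert."""
    if max_count < 1:
        raise ValueError("max_count must be >= 1")
    # "--" ends option parsing so a pattern starting with "-" is not a flag.
    return ["pdfgrep", "-n", "--max-count", str(max_count), "--", pattern, str(pdf_path)]


def grep_pdf_sketch(pattern: str, user_path: str, max_count: int = 50) -> str:
    pdf_path = confine_to_base(user_path)
    try:
        result = subprocess.run(
            build_pdfgrep_command(pattern, pdf_path, max_count),
            capture_output=True, text=True, timeout=PDFGREP_TIMEOUT_S,
        )
    except subprocess.TimeoutExpired:
        return "search timed out"
    # grep convention, which pdfgrep follows: 0 = match, 1 = no match, 2 = error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

The timeout bounds how long a catastrophic-backtracking pattern can run. It does not prevent one, so offering a fixed-string search mode would be a stronger mitigation.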

Similar Servers

kreuzberg

5420

Extracts text, tables, images, and metadata from a wide range of document formats (PDF, Office, images, HTML, etc.), with support for multiple OCR backends and an extensible plugin system. Can be run as a Model Context Protocol (MCP) server.

Other

$Medium

kreuzberg

5412

Extracts text, tables, images, and metadata from 56 file formats including PDF, Office documents, and images. Supports multiple OCR backends, extensible plugins, and is designed for data preprocessing in AI/ML workflows.

Other

$Medium

pdf-reader-mcp

433

Provides a robust server for AI agents to extract text, images, and metadata from PDF documents, preserving content order for better comprehension.

Other

$High

pageindex-mcp

112

This MCP server acts as a bridge, enabling LLM-native, reasoning-based RAG on documents (local or online PDFs) for MCP-compatible agents like Claude and Cursor, without requiring a vector database locally.

Other

$Low

Stats

Interest Score: 30

Security Score: 8

Cost Class: High

Avg Tokens: 25000

Stars: 1

Forks: 0

Last Update: 2026-01-12
