pdf4vllm-mcp

Verified Safe

by PyJudge

Overview

PDF content extraction and search, optimized for messy documents and vision language models (VLMs), with text corruption detection, reading order preservation, and token management.

Installation

Run Command
uvx pdf4vllm-mcp

Environment Variables

  • PDF_MAX_PAGES_PER_REQUEST
  • PDF_MAX_IMAGES_PER_REQUEST
  • PDF_MAX_RECURSION_DEPTH
  • PDF_MAX_SUGGESTED_RANGES
  • PDF_MAX_IMAGE_DIMENSION
  • PDF_MIN_IMAGE_DIMENSION
  • PDF_MAX_ASPECT_RATIO
  • PDF_PAGE_IMAGE_DPI
  • PDF_JPEG_QUALITY
  • PDF_HEADER_FOOTER_RATIO
  • PDF_FOOTER_START_RATIO
  • PDF_CORRUPTION_THRESHOLD
  • PDF_DEFAULT_EXTRACTION_MODE

Security Notes

The server uses `subprocess.run` to execute the external `pdfgrep` utility for the `grep_pdf` tool. While arguments are passed as a list to prevent shell injection, the `pattern` argument is interpreted as a regex by `pdfgrep` by default, which could theoretically be exploited for a ReDoS (Regular Expression Denial of Service) attack. This risk is partially mitigated by a 60-second timeout and a `max_count` limit for matches. Path traversal risks for file access are explicitly addressed and mitigated by the `validate_path_security` function, ensuring all file operations are constrained to the current working directory.
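
The mitigations described above can be sketched as follows. This is a minimal illustration, not the server's actual code: only the names `grep_pdf`, `validate_path_security`, `pattern`, `max_count`, and the 60-second timeout come from the notes above. The function bodies, the default `max_count` of 50, and the helper `build_grep_command` are assumptions made for the example.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def validate_path_security(path: str, root: Path | None = None) -> Path:
    """Resolve `path` and reject it if it lands outside `root` (default: cwd)."""
    base = (root or Path.cwd()).resolve()
    # resolve() collapses ".." segments and symlinks before the containment check.
    resolved = (base / path).resolve()
    if not resolved.is_relative_to(base):  # requires Python 3.9+
        raise ValueError(f"path escapes the working directory: {path}")
    return resolved


def build_grep_command(pattern: str, pdf: Path, max_count: int = 50) -> list[str]:
    """Hypothetical helper: build the pdfgrep argument list."""
    # A list of arguments means no shell ever parses the pattern.
    # "--" ends option parsing, so a pattern starting with "-" is not read as a flag.
    return ["pdfgrep", "--max-count", str(max_count), "--", pattern, str(pdf)]


def grep_pdf(pattern: str, path: str, max_count: int = 50, timeout: float = 60.0) -> str:
    """Search one PDF, bounding both the match count and the wall-clock time."""
    pdf = validate_path_security(path)
    try:
        result = subprocess.run(
            build_grep_command(pattern, pdf, max_count),
            capture_output=True,
            text=True,
            timeout=timeout,  # a runaway regex is killed after `timeout` seconds
        )
    except subprocess.TimeoutExpired:
        return "search timed out"
    return result.stdout
```

The timeout bounds the cost of a pathological pattern but does not prevent it. If regex search is not needed, passing `--fixed-strings` to `pdfgrep` would treat the pattern as a literal string and avoid the ReDoS concern altogether.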

Stats

Interest Score: 30
Security Score: 8
Cost Class: High
Avg Tokens: 25000
Stars: 1
Forks: 0
Last Update: 2026-01-12

Tags

PDF Processing, LLM Tools, Document Analysis, Content Extraction, Vision Models