mcp-pdf-reader

Name: mcp-pdf-reader
Author: Migueel0



Overview

Extracts text from PDF files, optionally performing OCR on embedded images, and returns the content.
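The server's own source is not reproduced on this page. The following is only a minimal sketch of the described flow: it reads each page's text layer and, optionally, runs OCR on each embedded image. It assumes pypdf's `PdfReader` and `page.images` API, plus pytesseract with Pillow. The names `collect_text`, `tesseract_ocr`, and `read_pdf_text` are hypothetical. Unlike the `except Exception: continue` pattern mentioned in the Security Notes, this sketch logs each OCR failure so that the failure stays visible.

```python
import io
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)


def collect_text(pages: Iterable, ocr_image: Optional[Callable[[bytes], str]] = None) -> str:
    """Join each page's text layer and, if ocr_image is given, OCR text from its images.

    Pages are assumed to behave like pypdf page objects: extract_text() returns
    a string (or None), and `images` yields items whose `.data` holds image bytes.
    """
    parts = []
    for number, page in enumerate(pages, start=1):
        parts.append(page.extract_text() or "")
        if ocr_image is None:
            continue
        for image in page.images:
            try:
                parts.append(ocr_image(image.data))
            except Exception:
                # Record the failure instead of silently skipping the image.
                logger.warning("OCR failed for an image on page %d", number, exc_info=True)
    # Drop empty fragments so blank pages do not add stray newlines.
    return "\n".join(part for part in parts if part)


def tesseract_ocr(data: bytes) -> str:
    """Hypothetical OCR callback; needs pytesseract, Pillow and a Tesseract binary."""
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(io.BytesIO(data)))


def read_pdf_text(path: str, use_ocr: bool = False) -> str:
    """Hypothetical entry point; needs `pip install pypdf`."""
    from pypdf import PdfReader

    reader = PdfReader(path)
    return collect_text(reader.pages, tesseract_ocr if use_ocr else None)
```

`collect_text` takes the OCR step as a callback, so you can exercise it with stand-in page objects, without a PDF or a Tesseract install.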

Installation

Run Command

python main.py

Environment Variables

TESSERACT_CMD
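The listing does not say what `TESSERACT_CMD` holds. One plausible reading, given the OCR dependency, is a path to the Tesseract executable. The sketch below resolves such a path under that assumption. The fallback to a `tesseract` binary on `PATH` and the helper name `resolve_tesseract_cmd` are both hypothetical, not taken from the server.

```python
import os
import shutil
from typing import Optional


def resolve_tesseract_cmd(env_var: str = "TESSERACT_CMD") -> Optional[str]:
    """Return the configured Tesseract path, else whatever `tesseract` is on PATH (or None)."""
    configured = os.environ.get(env_var)
    if configured:
        return configured
    # Assumed fallback: look up the binary on PATH.
    return shutil.which("tesseract")


# With pytesseract installed, the resolved path could be applied like this:
#   import pytesseract
#   pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()
```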

Security Notes

The 'read_pdf' tool accepts a 'file_path' string and uses it directly to open the PDF. If the MCP server is exposed to untrusted input, this design is highly susceptible to path traversal: an attacker could read arbitrary files from the server's file system. Large or maliciously crafted PDF files can also exhaust CPU and memory, both during `pypdf` parsing and during `pytesseract` OCR. The broad exception handling in image processing (`except Exception: continue`) can hide underlying errors. Finally, because the server relies on an external Tesseract executable, its security also depends on that binary.
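One way to narrow the path-traversal and resource-exhaustion risks is to validate 'file_path' before opening it. The sketch below does not come from the server. `base_dir`, `MAX_PDF_BYTES`, and `resolve_pdf_path` are all hypothetical. It confines reads to one directory, accepts only `.pdf` files, and caps the file size. It requires Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path

MAX_PDF_BYTES = 20 * 1024 * 1024  # hypothetical cap: 20 MiB


def resolve_pdf_path(base_dir: str, file_path: str, max_bytes: int = MAX_PDF_BYTES) -> Path:
    """Resolve a user-supplied path and reject anything outside base_dir or over max_bytes."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." and follows symlinks. Joining an absolute
    # file_path replaces base entirely, so it also ends up outside base.
    candidate = (base / file_path).resolve()
    if not candidate.is_relative_to(base):
        raise PermissionError(f"{file_path!r} is outside the allowed directory")
    if candidate.suffix.lower() != ".pdf":
        raise ValueError("only .pdf files are accepted")
    if not candidate.is_file():
        raise FileNotFoundError(str(candidate))
    if candidate.stat().st_size > max_bytes:
        raise ValueError("file exceeds the configured size limit")
    return candidate
```

A size cap bounds input size but not parsing cost. A page-count limit or a timeout around OCR would still be needed to bound CPU use on crafted files.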

Similar Servers

kreuzberg

5412

Extracts text, tables, images, and metadata from 56 file formats including PDF, Office documents, and images. Supports multiple OCR backends, extensible plugins, and is designed for data preprocessing in AI/ML workflows.

Other

$Medium