mcp-pdf-reader

by Migueel0

Overview

Extracts text from PDF files, optionally performing OCR on embedded images, and returns the content.

Installation

Run Command
python main.py

Environment Variables

  • TESSERACT_CMD
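
The listing names `TESSERACT_CMD` but does not show how `main.py` reads it. As a rough sketch, assuming the variable holds the path to the Tesseract executable, a server could resolve it as below. The helper name `find_tesseract` is hypothetical and not part of mcp-pdf-reader.

```python
import os
import shutil
from typing import Mapping, Optional


def find_tesseract(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the Tesseract path from TESSERACT_CMD, else search PATH.

    Hypothetical helper: the real server may resolve the binary differently.
    """
    env = os.environ if env is None else env
    configured = env.get("TESSERACT_CMD")
    if configured:
        # An explicit setting wins over whatever is on PATH.
        return configured
    # Fall back to the first "tesseract" executable on PATH (None if absent).
    return shutil.which("tesseract")


# With pytesseract installed, the result would typically be assigned to
# pytesseract.pytesseract.tesseract_cmd before any OCR call is made.
```

Setting the variable explicitly is mainly useful on systems where Tesseract is installed outside `PATH`.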

Security Notes

The `read_pdf` tool accepts a `file_path` string and uses it directly to open the PDF. If the MCP server is exposed to untrusted input, this design is open to path traversal: an attacker can supply any path and read files from anywhere on the server's file system that the process can access. Large or maliciously crafted PDFs can also exhaust CPU and memory during `pypdf` parsing and `pytesseract` OCR. The broad exception handler in image processing (`except Exception: continue`) silently skips failed images, which can mask underlying problems. Finally, the server depends on an external Tesseract executable, so its security posture is tied to that binary.
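
One common mitigation for the path traversal and resource concerns above is to confine `file_path` to a single directory and cap the file size before parsing. The sketch below is illustrative only: `ALLOWED_ROOT`, `MAX_PDF_BYTES`, and `resolve_pdf_path` are hypothetical names, not part of mcp-pdf-reader, and the limits are arbitrary.

```python
from pathlib import Path

# Hypothetical settings; neither exists in mcp-pdf-reader itself.
ALLOWED_ROOT = Path("/srv/pdfs")
MAX_PDF_BYTES = 20 * 1024 * 1024  # 20 MiB, an arbitrary example limit


def resolve_pdf_path(file_path: str, root: Path = ALLOWED_ROOT,
                     max_bytes: int = MAX_PDF_BYTES) -> Path:
    """Return a validated absolute path inside root, or raise ValueError."""
    root = root.resolve()
    # Resolve ".." segments and symlinks before comparing against the root.
    # An absolute file_path replaces root in the join and fails the check below.
    candidate = (root / file_path).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError("file_path escapes the allowed directory")
    if candidate.suffix.lower() != ".pdf":
        raise ValueError("only .pdf files are accepted")
    if not candidate.is_file():
        raise ValueError("file does not exist")
    if candidate.stat().st_size > max_bytes:
        raise ValueError("file exceeds the size limit")
    return candidate
```

A size cap only bounds the input, not the work a small but hostile PDF can trigger, so a page limit or a timeout around the OCR step would still be worth considering.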

Stats

Interest Score: 0
Security Score: 5
Cost Class: Medium
Avg Tokens: 7500
Stars: 0
Forks: 0
Last Update: 2025-12-06

Tags

PDF processing, OCR, Text extraction, Python, MCP