langextract
Verified Safe
by haritha8503
Overview
This server provides a RESTful API to extract the content of single files or an entire repository from GitHub, intended for programmatic code analysis or processing.
Installation
uvicorn main:app --host 0.0.0.0 --port 8000
Environment Variables
- GITHUB_TOKEN
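As an illustration of how a server like this could pick up the token, here is a minimal Python sketch. The helper name `github_headers` is hypothetical and not taken from the project; the `Accept` and `Authorization: Bearer` headers are the ones the GitHub REST API documents. The token is optional, so the header is only added when the variable is set.

```python
import os


def github_headers(env=None):
    """Build GitHub API request headers, adding auth only when a token is set.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests get a much higher GitHub rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Before starting the server, the variable would typically be exported in the shell, for example `export GITHUB_TOKEN=<your token>`.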
Security Notes
The server fetches content from the public GitHub API and does not execute any user-provided code locally. It reads the GITHUB_TOKEN from an environment variable, and all external requests go to the GitHub API. Path parameters are incorporated into GitHub API URLs, so path validation relies on GitHub's own checks. Recursion depth for repository listing is limited (max_depth=5), and the in-memory cache has a defined size limit. No 'eval' or other directly malicious patterns were found. The primary consideration is GitHub API rate limiting: if GITHUB_TOKEN is not provided, requests are unauthenticated and subject to a much lower rate limit, which can lead to throttling and service unavailability.
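The two safeguards mentioned above, a recursion depth limit and a size-bounded in-memory cache, can be sketched as follows. This is an illustrative sketch and not the project's actual code: `BoundedCache`, `list_files`, and the `fetch_dir` callable are hypothetical names, and the cache size of 128 is an assumed value (the notes only say the cache is bounded). `fetch_dir` stands in for a call to the GitHub contents API, whose directory entries carry `path` and `type` fields.

```python
from collections import OrderedDict

MAX_DEPTH = 5     # mirrors the max_depth=5 limit described in the notes
CACHE_SIZE = 128  # assumed value; the source only says the cache is bounded


class BoundedCache:
    """Small LRU cache that evicts the least recently used entry when full."""

    def __init__(self, max_size=CACHE_SIZE):
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the oldest entry


def list_files(fetch_dir, path="", depth=0, max_depth=MAX_DEPTH):
    """Recursively collect file paths, refusing to descend past max_depth.

    fetch_dir(path) is a hypothetical callable returning a list of
    {"path": str, "type": "file" | "dir"} entries for that directory.
    """
    if depth > max_depth:
        return []
    files = []
    for entry in fetch_dir(path):
        if entry["type"] == "file":
            files.append(entry["path"])
        elif entry["type"] == "dir":
            files.extend(list_files(fetch_dir, entry["path"], depth + 1, max_depth))
    return files
```

Capping the depth keeps a deeply nested repository from triggering an unbounded number of API calls, and capping the cache keeps memory use predictable on a long-running server.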
Similar Servers
kreuzberg
Extract text, tables, images, and metadata from various file formats including PDF, Office documents, and images, with advanced features like OCR, chunking, and embeddings.
kreuzberg
An MCP (Model Context Protocol) server for high-performance extraction of text, tables, images, and metadata from various document formats (PDF, Office, HTML, etc.), supporting OCR and advanced text processing.
DevDocs
DevDocs is a web crawling and content extraction platform designed to accelerate software development by converting documentation into LLM-ready formats for intelligent data querying and fine-tuning.
pdf-reader-mcp
Provides a robust server for AI agents to extract text, images, and metadata from PDF documents, preserving content order for better comprehension.