mineru-tianshu
Verified Safe
by magicyuan876
Overview
An enterprise-grade AI data preprocessing platform that converts unstructured multi-modal data (documents, images, audio, video, bioinformatics formats) into structured Markdown and JSON. It uses GPU acceleration and a task management system with user authentication and MCP integration.
Installation
make setup
Environment Variables
- JWT_SECRET_KEY
- RUSTFS_ACCESS_KEY
- RUSTFS_SECRET_KEY
- RUSTFS_PUBLIC_URL
- REDIS_QUEUE_ENABLED
- REDIS_HOST
- REDIS_PORT
- REDIS_DB
- REDIS_PASSWORD
- MODEL_DOWNLOAD_SOURCE
- HF_ENDPOINT
- HF_TOKEN
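Before starting the services, it can help to see which of the variables listed above are unset. The following is a minimal sketch of such a check, not part of the project itself. The helper name `missing_vars` is hypothetical, and because this listing does not say which variables are mandatory, the script only reports what is unset rather than failing.

```python
import os

# Names copied from the Environment Variables list above. Whether each one
# is strictly required is an assumption, so this check is informational.
EXPECTED_VARS = [
    "JWT_SECRET_KEY",
    "RUSTFS_ACCESS_KEY",
    "RUSTFS_SECRET_KEY",
    "RUSTFS_PUBLIC_URL",
    "REDIS_QUEUE_ENABLED",
    "REDIS_HOST",
    "REDIS_PORT",
    "REDIS_DB",
    "REDIS_PASSWORD",
    "MODEL_DOWNLOAD_SOURCE",
    "HF_ENDPOINT",
    "HF_TOKEN",
]


def missing_vars(env=None):
    """Return the expected variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]


if __name__ == "__main__":
    # Print one line per variable that is not set in the current shell.
    for name in missing_vars():
        print(f"not set: {name}")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to run against values parsed from a `.env` file.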
Security Notes
The project implements JWT and API Key authentication with role-based access control. `JWT_SECRET_KEY` is read from the environment rather than hardcoded, which suits production use. However, `RUSTFS_ACCESS_KEY` and `RUSTFS_SECRET_KEY` have insecure default values (`rustfsadmin`) in `docker-compose.yml`; these must be explicitly overridden in the `.env` file for production deployments. File uploads via the MCP server (Base64 or URL) are first saved to temporary files before being submitted to the internal API, which is a generally safe practice. Command execution through `subprocess.run` (e.g., `ffmpeg`) appears to use fixed commands with internal file paths, which mitigates command injection risks. Restrict `ALLOWED_ORIGINS` to trusted origins in production.
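The two configuration risks noted above (default RustFS credentials and an unrestricted `ALLOWED_ORIGINS`) can be caught with a small pre-deployment check. This is a sketch under stated assumptions, not code from the project: the function name `find_config_problems` is hypothetical, an unset RustFS key is assumed to fall back to the `rustfsadmin` default, and `ALLOWED_ORIGINS` is assumed to be a comma-separated list in which `*` means any origin.

```python
import os

# Default value reported for both RustFS keys in docker-compose.yml.
INSECURE_DEFAULT = "rustfsadmin"
RUSTFS_KEYS = ("RUSTFS_ACCESS_KEY", "RUSTFS_SECRET_KEY")


def find_config_problems(env=None):
    """Return a list of human-readable warnings for risky settings in env."""
    env = os.environ if env is None else env
    problems = []

    for name in RUSTFS_KEYS:
        value = env.get(name, "")
        # Assumption: an unset key falls back to the compose default,
        # so it is treated the same as the default value itself.
        if value in ("", INSECURE_DEFAULT):
            problems.append(f"{name} is unset or still the default value")

    # Assumption: ALLOWED_ORIGINS is comma-separated and "*" allows any origin.
    origins = [o.strip() for o in env.get("ALLOWED_ORIGINS", "").split(",")]
    if "*" in origins:
        problems.append("ALLOWED_ORIGINS allows any origin")

    return problems


if __name__ == "__main__":
    for problem in find_config_problems():
        print(f"warning: {problem}")
```

The check only flags an explicit wildcard for `ALLOWED_ORIGINS`, since the behavior when that variable is unset is not described here.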
Similar Servers
kreuzberg
Extracts text, tables, images, and metadata from 56 file formats including PDF, Office documents, and images. Supports multiple OCR backends, extensible plugins, and is designed for data preprocessing in AI/ML workflows.
mesh
An open-source control plane for Model Context Protocol (MCP) traffic, providing unified authentication, routing, observability, and tool management for AI agents and integrations across various services.
mcp-omnisearch
Provides a unified interface for various search, AI response, content processing, and enhancement tools via Model Context Protocol (MCP).
tmcp
A server implementation for the Model Context Protocol (MCP) to enable LLMs to access external context and tools.