mineru-tianshu
by magicyuan876
Overview
An enterprise-grade AI data preprocessing platform that converts unstructured data (documents, images, audio, video, bioinformatics formats) into AI-ready structured Markdown and JSON formats.
Installation
make setup
Environment Variables
- API_PORT
- WORKER_PORT
- MCP_PORT
- RUSTFS_PORT
- FRONTEND_PORT
- JWT_SECRET_KEY
- JWT_ALGORITHM
- ACCESS_TOKEN_EXPIRE_MINUTES
- ALLOWED_ORIGINS
- CUDA_VISIBLE_DEVICES
- WORKER_URL
- MAX_FILE_SIZE
- MODEL_PATH
- OUTPUT_PATH
- DATABASE_PATH
- MODEL_DOWNLOAD_SOURCE
- HF_ENDPOINT
- WORKER_GPUS
- MAX_BATCH_SIZE
- WORKER_TIMEOUT
- RUSTFS_ENDPOINT
- RUSTFS_ACCESS_KEY
- RUSTFS_SECRET_KEY
- RUSTFS_BUCKET
- RUSTFS_SECURE
- RUSTFS_PUBLIC_URL
- TZ
- VITE_API_BASE_URL
- NGINX_CLIENT_MAX_BODY_SIZE
- MINERU_VIRTUAL_VRAM_SIZE
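Checking these variables before starting the stack catches configuration mistakes early. The sketch below is a hypothetical Python helper, not part of the project. It assumes every variable above is expected to be set, although the project may supply defaults for some of them, and values such as `CUDA_VISIBLE_DEVICES` can legitimately be empty. It also flags the development credentials named under Security Notes.

```python
import os

# Variable names copied from the Environment Variables list.
# Assumption: all of them are expected; the project may default some.
EXPECTED_VARS = [
    "API_PORT", "WORKER_PORT", "MCP_PORT", "RUSTFS_PORT", "FRONTEND_PORT",
    "JWT_SECRET_KEY", "JWT_ALGORITHM", "ACCESS_TOKEN_EXPIRE_MINUTES",
    "ALLOWED_ORIGINS", "CUDA_VISIBLE_DEVICES", "WORKER_URL", "MAX_FILE_SIZE",
    "MODEL_PATH", "OUTPUT_PATH", "DATABASE_PATH", "MODEL_DOWNLOAD_SOURCE",
    "HF_ENDPOINT", "WORKER_GPUS", "MAX_BATCH_SIZE", "WORKER_TIMEOUT",
    "RUSTFS_ENDPOINT", "RUSTFS_ACCESS_KEY", "RUSTFS_SECRET_KEY", "RUSTFS_BUCKET",
    "RUSTFS_SECURE", "RUSTFS_PUBLIC_URL", "TZ", "VITE_API_BASE_URL",
    "NGINX_CLIENT_MAX_BODY_SIZE", "MINERU_VIRTUAL_VRAM_SIZE",
]

# Development defaults named in the Security Notes; unsafe in production.
DEV_DEFAULTS = {
    "JWT_SECRET_KEY": "dev-secret-key-change-in-production",
    "RUSTFS_ACCESS_KEY": "rustfsadmin",
    "RUSTFS_SECRET_KEY": "rustfsadmin",
}


def check_env(env=None):
    """Return (missing, insecure) name lists for a mapping (default: os.environ)."""
    if env is None:
        env = os.environ
    # Unset and empty-string values are both reported as missing.
    missing = [name for name in EXPECTED_VARS if not env.get(name)]
    # Variables whose value still equals a known development default.
    insecure = [name for name, value in DEV_DEFAULTS.items() if env.get(name) == value]
    return missing, insecure


if __name__ == "__main__":
    missing, insecure = check_env()
    for name in missing:
        print(f"unset or empty: {name}")
    for name in insecure:
        print(f"still using development default: {name}")
```

Run it in the same shell or container that launches the services. Treating empty strings as unset is a deliberate choice for this sketch and may be too strict for optional variables.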
Security Notes
The project uses JWT for authentication and API keys for external integrations, which is a sound baseline. Development settings such as `JWT_SECRET_KEY=dev-secret-key-change-in-production` in `docker-compose.dev.yml` and the default `RUSTFS_ACCESS_KEY`/`RUSTFS_SECRET_KEY` value (`rustfsadmin`) are present in the configuration, but the project explicitly warns against using them in production. The `docker-setup.sh` script attempts to generate a secure `JWT_SECRET_KEY` when it creates a new `.env` file.

The `/v1/files/output/{file_path:path}` endpoint serves processed files, so it needs careful path handling to prevent directory traversal attacks. The code calls `quote` and `unquote`, which indicates an attempt at sanitization. However, URL encoding and decoding by themselves do not reject `..` segments, so the endpoint's safety depends on whether the resolved path is also checked against the output directory.
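A common way to harden an endpoint like `/v1/files/output/{file_path:path}` is to resolve the requested path and verify that it stays inside the output directory. The sketch below is a hypothetical helper: `resolve_output_path` is an illustrative name, not a function from the project. It assumes a Python backend and Python 3.9 or later for `Path.is_relative_to`. It decodes percent-escapes, resolves `..` segments and symlinks against the output root, and rejects any result outside that root.

```python
from pathlib import Path
from urllib.parse import unquote


def resolve_output_path(output_root: str, file_path: str) -> Path:
    """Map a requested path onto a file under output_root, or raise ValueError.

    Hypothetical helper for illustration; not the project's implementation.
    """
    root = Path(output_root).resolve()
    # Decode percent-escapes first so "%2e%2e" is checked as "..".
    decoded = unquote(file_path)
    # Joining an absolute path replaces root entirely; resolve() then
    # normalizes ".." segments and follows any symlinks.
    candidate = (root / decoded).resolve()
    # Containment check: the final path must still sit under root.
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes output directory: {file_path!r}")
    return candidate
```

Checking the resolved path, rather than scanning the raw string for `..`, also catches percent-encoded segments, absolute paths, and symlinks that point outside the output directory.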
Similar Servers
kreuzberg
High-performance document intelligence platform for extracting text, metadata, and structured information (tables, images, chunks) from over 50 document formats (PDFs, Office, images, HTML, etc.). It offers advanced OCR, multilingual support, and features such as chunking, embeddings, and keyword extraction. Functionality is exposed through multiple language bindings and a Model Context Protocol (MCP) server for flexible integration.
mcphub
A hub for managing, orchestrating, and providing a unified API for various Model Context Protocol (MCP) servers and their tools, including user management, OAuth services, and discovery of external servers.
mcp-omnisearch
Provides a unified interface for LLMs to access multiple web search, AI response, content processing, and enhancement tools from various providers through the Model Context Protocol (MCP).
tmcp
Build Model Context Protocol (MCP) servers for AI agents, providing schema-agnostic tools, resources, and prompts, with optional OAuth 2.1 authentication and distributed session management.