mineru-tianshu

Name: mineru-tianshu
Author: magicyuan876

Verified Safe


Overview

Enterprise-grade AI data preprocessing platform that converts unstructured multi-modal data (documents, images, audio, video, and bioinformatics formats) into structured Markdown and JSON. It uses GPU acceleration and a task management system, and provides user authentication and Model Context Protocol (MCP) integration.

Installation

Run Command

make setup

Environment Variables

JWT_SECRET_KEY
RUSTFS_ACCESS_KEY
RUSTFS_SECRET_KEY
RUSTFS_PUBLIC_URL
REDIS_QUEUE_ENABLED
REDIS_HOST
REDIS_PORT
REDIS_DB
REDIS_PASSWORD
MODEL_DOWNLOAD_SOURCE
HF_ENDPOINT
HF_TOKEN
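
The listing does not say which of these variables are required and which are optional, so a quick pre-flight check can at least show what is unset before running `make setup`. The sketch below is a hypothetical helper, not part of the project; `LISTED_VARS` and `missing_vars` are names introduced here for illustration, and some variables it reports (for example `REDIS_PASSWORD` or `HF_TOKEN`) may be legitimately empty in a given deployment.

```python
import os
from typing import Mapping, Sequence

# Names copied from the Environment Variables list above.
LISTED_VARS = (
    "JWT_SECRET_KEY",
    "RUSTFS_ACCESS_KEY",
    "RUSTFS_SECRET_KEY",
    "RUSTFS_PUBLIC_URL",
    "REDIS_QUEUE_ENABLED",
    "REDIS_HOST",
    "REDIS_PORT",
    "REDIS_DB",
    "REDIS_PASSWORD",
    "MODEL_DOWNLOAD_SOURCE",
    "HF_ENDPOINT",
    "HF_TOKEN",
)


def missing_vars(
    env: Mapping[str, str], names: Sequence[str] = LISTED_VARS
) -> list[str]:
    """Return the names that are unset or blank in env, in list order."""
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Report only; whether a missing variable matters is deployment-specific.
    for name in missing_vars(os.environ):
        print(f"not set: {name}")
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the check easy to run against a parsed `.env` file as well.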

Security Notes

The project implements JWT and API Key authentication with role-based access control. `JWT_SECRET_KEY` is parameterized rather than hardcoded, so it can be set per deployment. However, `RUSTFS_ACCESS_KEY` and `RUSTFS_SECRET_KEY` have insecure default values (`rustfsadmin`) in `docker-compose.yml` and must be explicitly overridden in the `.env` file for production deployments. File uploads via the MCP server (Base64 or URL) are saved to temporary files before being submitted to the internal API, which is generally a safe practice. Commands run through `subprocess.run` (e.g., `ffmpeg`) appear to use fixed commands with internal file paths, which mitigates command injection risk. In production, restrict `ALLOWED_ORIGINS`.
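
The checks described above could be automated before a production deployment. The following is a minimal sketch under stated assumptions, not the project's own tooling: `insecure_settings` is a hypothetical helper, an unset `RUSTFS_*` variable is assumed to fall back to the `rustfsadmin` default from `docker-compose.yml`, and a missing or `*` value for `ALLOWED_ORIGINS` is assumed to mean unrestricted origins (the listing does not document that variable's format).

```python
import os
from typing import Mapping

# Default value named in the Security Notes for both RUSTFS_* keys.
INSECURE_DEFAULT = "rustfsadmin"


def insecure_settings(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems found in env (empty list if none)."""
    problems = []

    for name in ("RUSTFS_ACCESS_KEY", "RUSTFS_SECRET_KEY"):
        value = env.get(name, "")
        # Assumption: an unset variable falls back to the compose default.
        if value in ("", INSECURE_DEFAULT):
            problems.append(f"{name} is unset or still the default")

    if not env.get("JWT_SECRET_KEY", ""):
        problems.append("JWT_SECRET_KEY is not set")

    # Assumption: "*" or an unset value means every origin is allowed.
    if env.get("ALLOWED_ORIGINS", "*").strip() == "*":
        problems.append("ALLOWED_ORIGINS is unrestricted")

    return problems


if __name__ == "__main__":
    found = insecure_settings(os.environ)
    for problem in found:
        print(f"WARNING: {problem}")
    raise SystemExit(1 if found else 0)
```

Run as a script, it exits non-zero when any problem is found, so it could gate a deployment step.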

Similar Servers

kreuzberg

5412

Extracts text, tables, images, and metadata from 56 file formats including PDF, Office documents, and images. Supports multiple OCR backends, extensible plugins, and is designed for data preprocessing in AI/ML workflows.

Other

$Medium

mesh

315

An open-source control plane for Model Context Protocol (MCP) traffic, providing unified authentication, routing, observability, and tool management for AI agents and integrations across various services.

Other

$Low