kreuzberg

Name: kreuzberg
Author: Goldziher

Verified Safe

Overview

Extracts text, tables, images, and metadata from a wide range of document formats (PDF, Office documents, images, HTML, and others), with support for multiple OCR backends and an extensible plugin system. It can also run as a Model Context Protocol (MCP) server.

Installation

Run Command

kreuzberg mcp

Environment Variables

KREUZBERG_CONFIG_PATH
KREUZBERG_OCR_BACKEND
KREUZBERG_OCR_LANGUAGE
KREUZBERG_CHUNK_MAX_CHARS
KREUZBERG_CHUNK_MAX_OVERLAP
KREUZBERG_USE_CACHE
KREUZBERG_TOKEN_REDUCTION_MODE
KREUZBERG_ENCODING_CACHE_MAX_ENTRIES
KREUZBERG_ENCODING_CACHE_MAX_BYTES
KREUZBERG_API_HOST
KREUZBERG_API_PORT
KREUZBERG_API_CORS_ORIGINS
KREUZBERG_API_MAX_REQUEST_BODY_BYTES
KREUZBERG_API_MAX_MULTIPART_FIELD_BYTES
KREUZBERG_API_MAX_UPLOAD_MB
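
As a rough illustration of how these variables might be combined with the run command, the Python sketch below builds an environment and starts `kreuzberg mcp` as a subprocess. The variable names come from the list above; every value shown (`tesseract`, `eng`, `2000`) is an assumed placeholder rather than a documented default, and the helper names `build_env` and `launch_mcp_server` are hypothetical.

```python
import os
import shutil
import subprocess

# Variable names come from the list above; every value is an assumed
# placeholder for illustration, not a documented default.
OVERRIDES = {
    "KREUZBERG_OCR_BACKEND": "tesseract",
    "KREUZBERG_OCR_LANGUAGE": "eng",
    "KREUZBERG_CHUNK_MAX_CHARS": "2000",
}


def build_env(overrides, base=None):
    """Return a copy of the base environment with the overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def launch_mcp_server(overrides):
    """Start `kreuzberg mcp` with the overrides, if the CLI is on PATH."""
    if shutil.which("kreuzberg") is None:
        raise FileNotFoundError("kreuzberg executable not found on PATH")
    return subprocess.Popen(["kreuzberg", "mcp"], env=build_env(overrides))


if __name__ == "__main__":
    # Blocks until the server process exits.
    launch_mcp_server(OVERRIDES).wait()
```

Copying the environment before applying overrides keeps the parent process untouched, so the same script can start several differently configured servers.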

Security Notes

The server processes untrusted external input (documents) and relies on FFI bindings to a Rust core, as well as external tools such as LibreOffice and Tesseract. The codebase shows strong awareness of these risks: it includes explicit validators for zip bombs, XML entity expansion, and unbounded string growth, and it validates input before crossing FFI boundaries. As with any system that handles arbitrary external data and exposes APIs (HTTP and MCP), overall security still depends on proper deployment, network hardening, and any additional access control the operator puts in place. Some test files contain debug logging; this is not production code, but it is noted here.
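
To make the zip bomb concern concrete, here is a minimal Python sketch of the kind of check such a validator might perform. It is not Kreuzberg's implementation: the function name `looks_like_zip_bomb` and both thresholds are assumptions chosen for illustration. It relies only on the sizes declared in the archive's headers, which a hostile file can forge, so a real validator would also cap the bytes actually written during extraction.

```python
import io
import zipfile


def looks_like_zip_bomb(
    data: bytes,
    max_total_bytes: int = 100 * 1024 * 1024,  # assumed limit: 100 MiB
    max_ratio: float = 100.0,                  # assumed limit: 100:1
) -> bool:
    """Flag archives whose declared uncompressed size or ratio is too large."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = archive.infolist()

    total_uncompressed = sum(info.file_size for info in infos)
    total_compressed = sum(info.compress_size for info in infos)

    if total_uncompressed > max_total_bytes:
        return True
    # Guard against division by zero for empty archives.
    if total_compressed > 0 and total_uncompressed / total_compressed > max_ratio:
        return True
    return False
```

A caller could run this on an uploaded archive before passing it to any extractor and reject the request when it returns `True`.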

Similar Servers

kreuzberg
Stars: 5412
Extracts text, tables, images, and metadata from 56 file formats including PDF, Office documents, and images. Supports multiple OCR backends, extensible plugins, and is designed for data preprocessing in AI/ML workflows.
Other
Cost class: Medium

pdf-reader-mcp
Stars: 433
Provides a robust server for AI agents to extract text, images, and metadata from PDF documents, preserving content order for better comprehension.
Other
Cost class: High

mineru-tianshu
Stars: 367
Enterprise-grade AI data preprocessing platform for converting diverse unstructured multi-modal data (documents, images, audio, video, bioinformatics formats) into structured Markdown and JSON formats, leveraging GPU acceleration and a robust task management system with user authentication and MCP protocol integration.
Other
Cost class: High

flexible-graphrag
The Flexible GraphRAG MCP Server integrates document processing, knowledge graph building, hybrid search, and AI query capabilities via the Model Context Protocol (MCP) for clients like Claude Desktop and MCP Inspector.
Other
Cost class: High

Stats

Interest Score: 100
Security Score: 8
Cost Class: Medium
Avg Tokens: 5000
Stars: 5420
Forks: 216
Last Update: 2026-01-19
