
kreuzberg

Verified Safe

by Goldziher

Overview

Extracts text, tables, images, and metadata from a wide range of document formats (PDF, Office, images, HTML, etc.), with support for multiple OCR backends and an extensible plugin system. It can also run as a Model Context Protocol (MCP) server.

Installation

Run Command
kreuzberg mcp
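Assuming this command uses the stdio transport (the usual setup for MCP servers started as a local command), a client talks to it with newline-delimited JSON-RPC 2.0 over stdin/stdout. The sketch below shows the generic MCP handshake a client performs before calling tools: `initialize`, the `notifications/initialized` notification, then `tools/list`. It is not taken from Kreuzberg; the protocol revision string, client name, and helper names (`request`, `handshake_messages`, `list_tools`) are assumptions, and this listing does not say which tools Kreuzberg advertises.

```python
from __future__ import annotations

import json
import subprocess

# Assumed MCP revision; a real server may negotiate a different one.
PROTOCOL_VERSION = "2024-11-05"


def request(msg_id: int, method: str, params: dict | None = None) -> dict:
    """Build a JSON-RPC 2.0 request (the server replies with the same id)."""
    msg = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def notification(method: str) -> dict:
    """Build a JSON-RPC 2.0 notification (no id, so no reply is expected)."""
    return {"jsonrpc": "2.0", "method": method}


def encode(msg: dict) -> bytes:
    """stdio transport: one compact JSON object per line."""
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")


def handshake_messages(client_name: str = "example-client") -> list[dict]:
    """initialize, then notifications/initialized, then tools/list."""
    return [
        request(1, "initialize", {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.1.0"},
        }),
        notification("notifications/initialized"),
        request(2, "tools/list"),
    ]


def list_tools(command: tuple[str, ...] = ("kreuzberg", "mcp")) -> list[dict]:
    """Launch the server over stdio and return the tools it advertises."""
    init, initialized, tools = handshake_messages()
    with subprocess.Popen(list(command), stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE) as proc:

        def send(msg: dict) -> None:
            proc.stdin.write(encode(msg))
            proc.stdin.flush()

        def recv(expected_id: int) -> dict:
            # Skip blank lines and server-initiated messages until our reply arrives.
            for line in proc.stdout:
                if not line.strip():
                    continue
                msg = json.loads(line)
                if msg.get("id") == expected_id and "method" not in msg:
                    if "error" in msg:
                        raise RuntimeError(msg["error"])
                    return msg["result"]
            raise RuntimeError("server exited before replying")

        send(init)
        recv(init["id"])  # server capabilities; ignored in this sketch
        send(initialized)
        send(tools)
        result = recv(tools["id"])
        proc.terminate()
    return result.get("tools", [])
```

Only `list_tools()` actually starts a process, and it requires `kreuzberg` to be installed and on `PATH`; the message builders can be inspected without it.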

Environment Variables

  • KREUZBERG_CONFIG_PATH
  • KREUZBERG_OCR_BACKEND
  • KREUZBERG_OCR_LANGUAGE
  • KREUZBERG_CHUNK_MAX_CHARS
  • KREUZBERG_CHUNK_MAX_OVERLAP
  • KREUZBERG_USE_CACHE
  • KREUZBERG_TOKEN_REDUCTION_MODE
  • KREUZBERG_ENCODING_CACHE_MAX_ENTRIES
  • KREUZBERG_ENCODING_CACHE_MAX_BYTES
  • KREUZBERG_API_HOST
  • KREUZBERG_API_PORT
  • KREUZBERG_API_CORS_ORIGINS
  • KREUZBERG_API_MAX_REQUEST_BODY_BYTES
  • KREUZBERG_API_MAX_MULTIPART_FIELD_BYTES
  • KREUZBERG_API_MAX_UPLOAD_MB
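MCP clients usually start a server from a JSON config entry that lists the command, its arguments, and environment variables; the `mcpServers` layout used below is the one Claude Desktop and several other clients read. This sketch builds such an entry for `kreuzberg mcp` and rejects variable names that are not in the list above. The listing does not document what each variable accepts, so the example values (`tesseract`, `eng`, `true`) are assumptions, and `mcp_client_config` is a hypothetical helper.

```python
import json

# Variable names exactly as listed above; accepted values are not documented here.
KNOWN_VARS = frozenset({
    "KREUZBERG_CONFIG_PATH", "KREUZBERG_OCR_BACKEND", "KREUZBERG_OCR_LANGUAGE",
    "KREUZBERG_CHUNK_MAX_CHARS", "KREUZBERG_CHUNK_MAX_OVERLAP", "KREUZBERG_USE_CACHE",
    "KREUZBERG_TOKEN_REDUCTION_MODE", "KREUZBERG_ENCODING_CACHE_MAX_ENTRIES",
    "KREUZBERG_ENCODING_CACHE_MAX_BYTES", "KREUZBERG_API_HOST", "KREUZBERG_API_PORT",
    "KREUZBERG_API_CORS_ORIGINS", "KREUZBERG_API_MAX_REQUEST_BODY_BYTES",
    "KREUZBERG_API_MAX_MULTIPART_FIELD_BYTES", "KREUZBERG_API_MAX_UPLOAD_MB",
})


def mcp_client_config(env: dict, server_name: str = "kreuzberg") -> dict:
    """Build an mcpServers entry that runs `kreuzberg mcp` with the given env."""
    unknown = sorted(set(env) - KNOWN_VARS)
    if unknown:
        raise ValueError(f"unrecognised variables: {unknown}")
    # Environment values are strings; refuse ints/bools rather than guess a format.
    bad = sorted(k for k, v in env.items() if not isinstance(v, str))
    if bad:
        raise TypeError(f"values must be strings: {bad}")
    return {
        "mcpServers": {
            server_name: {"command": "kreuzberg", "args": ["mcp"], "env": dict(env)}
        }
    }


example = mcp_client_config({
    # Assumed values, for illustration only.
    "KREUZBERG_OCR_BACKEND": "tesseract",
    "KREUZBERG_OCR_LANGUAGE": "eng",
    "KREUZBERG_USE_CACHE": "true",
})
print(json.dumps(example, indent=2))
```

Validating names up front catches typos that would otherwise be silently ignored by the server.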

Security Notes

The server processes untrusted external input (documents) and relies on FFI bindings to a Rust core, as well as on external tools such as LibreOffice and Tesseract. The codebase shows strong awareness of security concerns, including explicit validators for common attacks such as zip bombs, XML entity expansion, and excessive string growth. Input is validated before it crosses FFI boundaries. However, as with any system that handles arbitrary external data and exposes APIs (HTTP and MCP), overall security depends on proper deployment, network hardening, and possibly additional access-control layers added by the operator. Some test files contain debug logging; this is not production code but is noted.
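The zip-bomb validation mentioned above can be illustrated with Python's standard `zipfile` module. This is a generic sketch, not Kreuzberg's validator: the limits, `ZipBombError`, `check_zip`, and `read_member_capped` are hypothetical. It first checks the entry count, declared total size, and per-entry compression ratio without decompressing anything, then also bounds the bytes actually read, because sizes recorded in an archive can be forged.

```python
import io
import zipfile

# Illustrative limits only; the real thresholds are not stated in this listing.
DEFAULT_MAX_ENTRIES = 10_000
DEFAULT_MAX_TOTAL_BYTES = 512 * 1024 * 1024  # declared uncompressed bytes
DEFAULT_MAX_RATIO = 100.0  # declared uncompressed size / compressed size


class ZipBombError(ValueError):
    """Raised when an archive looks like a decompression bomb."""


def check_zip(data: bytes, max_entries: int = DEFAULT_MAX_ENTRIES,
              max_total_bytes: int = DEFAULT_MAX_TOTAL_BYTES,
              max_ratio: float = DEFAULT_MAX_RATIO) -> None:
    """Reject archives whose recorded metadata looks hostile (no decompression)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        if len(infos) > max_entries:
            raise ZipBombError(f"too many entries: {len(infos)}")
        total = 0
        for info in infos:
            total += info.file_size
            if total > max_total_bytes:
                raise ZipBombError("declared uncompressed size exceeds limit")
            if info.compress_size and info.file_size / info.compress_size > max_ratio:
                raise ZipBombError(f"suspicious compression ratio in {info.filename!r}")


def read_member_capped(data: bytes, name: str, cap: int) -> bytes:
    """Decompress one member, reading at most cap + 1 decompressed bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf, zf.open(name) as member:
        payload = member.read(cap + 1)
    if len(payload) > cap:
        raise ZipBombError(f"{name!r} decompresses to more than {cap} bytes")
    return payload
```

A compression-ratio limit is a heuristic: highly repetitive but legitimate files can trip it, which is why the ratio and size caps are parameters rather than constants.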

Stats

Interest Score: 100
Security Score: 8
Cost Class: Medium
Avg Tokens: 5000
Stars: 5420
Forks: 216
Last Update: 2026-01-19

Tags

document extraction, OCR, PDF processing, Office documents, metadata extraction, text processing, data parsing, multilingual, embeddings, plugins