kreuzberg
Verified Safe
by Goldziher
Overview
High-performance document intelligence for extracting text, metadata, and structured information from a wide range of document formats including PDFs, Office documents, images, and HTML. It supports advanced features like OCR, table extraction, chunking, language detection, and embedding generation, powered by a Rust core for native performance.
Installation
kreuzberg mcp
Environment Variables
- KREUZBERG_BENCHMARK_DEBUG
- KREUZBERG_DEBUG_GUTEN
- KREUZBERG_ENCODING_CACHE_MAX_ENTRIES
- KREUZBERG_ENCODING_CACHE_MAX_BYTES
- TESSDATA_PREFIX
- LD_LIBRARY_PATH
- DYLD_LIBRARY_PATH
- PATH
- NODE_PATH
- PYTHONPATH
- RUBYLIB
- KREUZBERG_FFI_DIR
- KREUZBERG_GMFT_ISOLATED
- DOTPRODUCT
- SCROLLVIEW_PATH
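Before starting the server it can help to see which of the variables listed above are set in the current environment. The following is a minimal sketch using only the Python standard library. The helper name `report_variables` is hypothetical and not part of kreuzberg, and the sketch makes no assumption about what each variable does.

```python
import os

# Names copied verbatim from the list above; their semantics are not
# documented on this page, so this script only reports presence.
LISTED_VARIABLES = [
    "KREUZBERG_BENCHMARK_DEBUG",
    "KREUZBERG_DEBUG_GUTEN",
    "KREUZBERG_ENCODING_CACHE_MAX_ENTRIES",
    "KREUZBERG_ENCODING_CACHE_MAX_BYTES",
    "TESSDATA_PREFIX",
    "LD_LIBRARY_PATH",
    "DYLD_LIBRARY_PATH",
    "PATH",
    "NODE_PATH",
    "PYTHONPATH",
    "RUBYLIB",
    "KREUZBERG_FFI_DIR",
    "KREUZBERG_GMFT_ISOLATED",
    "DOTPRODUCT",
    "SCROLLVIEW_PATH",
]


def report_variables(environ=None):
    """Map each listed variable to its value, or None if it is unset.

    `environ` defaults to the real process environment; passing a plain
    dict makes the function easy to test.
    """
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in LISTED_VARIABLES}


if __name__ == "__main__":
    for name, value in report_variables().items():
        status = "unset" if value is None else f"set to {value!r}"
        print(f"{name}: {status}")
```

Run as a script, it prints one line per variable. An unset variable is not necessarily a problem, since the page does not say which of these are required.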
Security Notes
The project includes an API and an MCP (Model Context Protocol) server, which lets AI clients call its document-processing tools. While suitable for local and internal use, exposing the API or MCP interface publicly without additional security measures (e.g., authentication, access control, network segmentation) could introduce vulnerabilities. The `biome.json` linter configuration contains some exceptions for `noExplicitAny`, so type strictness is relaxed in places; `pnpm` with a lockfile is used for dependency management and `oxlint` for code quality. No obvious malicious patterns or hardcoded critical secrets were found in the provided snippets.
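One simple guard against accidental public exposure is to check that the address a service will bind to is loopback-only before starting it. The sketch below illustrates that idea with Python's standard `ipaddress` module. The function `is_loopback_only` is a hypothetical helper, not part of kreuzberg, and how a bind address is actually configured for the server is not stated here.

```python
import ipaddress


def is_loopback_only(host):
    """Return True if `host` is a loopback address or the name 'localhost'.

    Anything else, including the wildcard 0.0.0.0 and arbitrary DNS names,
    is treated as potentially reachable from other machines.
    """
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): assume it may be public.
        return False


if __name__ == "__main__":
    for candidate in ["127.0.0.1", "::1", "0.0.0.0", "example.com"]:
        print(candidate, is_loopback_only(candidate))
```

A check like this complements authentication and network segmentation rather than replacing them. It only catches the case where a service is bound to a non-local interface by mistake.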
Similar Servers
kreuzberg
High-performance document intelligence platform for extracting text, metadata, and structured information (tables, images, chunks) from over 50 document formats (PDFs, Office, images, HTML, etc.). It offers advanced OCR capabilities, multilingual support, and features like chunking, embeddings, and keyword extraction. Functionality is exposed via multiple language bindings and a Model Context Protocol (MCP) server for flexible integration.
pdf-reader-mcp
Provides production-ready PDF processing capabilities for AI agents, including extraction of text, images, and metadata from local files or URLs.
mcp-documentation-server
A local-first MCP server for document management, semantic search, and AI-powered document intelligence.
mineru-tianshu
An enterprise-grade AI data preprocessing platform that converts unstructured data (documents, images, audio, video, bioinformatics formats) into AI-ready structured Markdown and JSON formats.