kreuzberg

Name: kreuzberg
Author: kreuzberg-dev

Verified Safe

Overview

Extracts text, tables, images, and metadata from 56 file formats, including PDF, Office documents, and images. It supports multiple OCR backends and extensible plugins, and is designed for data preprocessing in AI/ML workflows.

Installation

Run Command

python -c "
import asyncio
from kreuzberg import extract_file, ExtractionConfig

async def main():
    # Enable caching and quality post-processing for the extraction
    config = ExtractionConfig(use_cache=True, enable_quality_processing=True)
    result = await extract_file('document.pdf', config=config)
    print(result.content)

asyncio.run(main())
"

Environment Variables

KREUZBERG_LOG_LEVEL
KREUZBERG_CONFIG_PATH
TESSDATA_PREFIX
KREUZBERG_ENCODING_CACHE_MAX_ENTRIES
KREUZBERG_ENCODING_CACHE_MAX_BYTES
CORS_ALLOW_ORIGINS
LISTEN_ADDR
LISTEN_PORT
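
The listing names these variables without describing their meaning or defaults, so the following is only a minimal sketch for checking which of them are set before starting the server. The helper name `configured_vars` is hypothetical and not part of kreuzberg.

```python
import os

# Variable names exactly as listed above; their semantics and defaults
# are not documented in this listing, so none are assumed here.
KREUZBERG_ENV_VARS = [
    "KREUZBERG_LOG_LEVEL",
    "KREUZBERG_CONFIG_PATH",
    "TESSDATA_PREFIX",
    "KREUZBERG_ENCODING_CACHE_MAX_ENTRIES",
    "KREUZBERG_ENCODING_CACHE_MAX_BYTES",
    "CORS_ALLOW_ORIGINS",
    "LISTEN_ADDR",
    "LISTEN_PORT",
]


def configured_vars(environ=None):
    """Return {name: value} for each listed variable present in the environment.

    Hypothetical helper for illustration; pass a dict to inspect something
    other than the current process environment.
    """
    env = os.environ if environ is None else environ
    return {name: env[name] for name in KREUZBERG_ENV_VARS if name in env}
```

Calling `configured_vars()` with no arguments inspects the current process environment; unrelated variables are ignored.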

Security Notes

The core Rust library implements explicit input validation (for example, protection against zip bombs and XML entity expansion) and handles panics gracefully at FFI boundaries. External process execution (for example, LibreOffice for older Office formats) carries inherent risk, though the project appears to be aware of this and implements safeguards. No direct 'eval' calls or obvious hardcoded credentials were found in the provided code snippets. Overall, the project shows a strong focus on securely processing potentially untrusted inputs, but the risks of native code execution and external dependencies should still be considered.
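
To illustrate what a zip bomb check can look like in general, here is a minimal Python sketch. It is not kreuzberg's implementation (the Rust code is not shown in this listing), and the function name and both limits are assumptions chosen for illustration.

```python
import io
import zipfile

# Hypothetical limits for illustration; not kreuzberg's actual thresholds.
MAX_TOTAL_UNCOMPRESSED = 100 * 1024 * 1024  # 100 MiB across all entries
MAX_COMPRESSION_RATIO = 100                 # per-entry uncompressed/compressed


def looks_like_zip_bomb(data: bytes) -> bool:
    """Return True if the archive's declared sizes exceed the limits above."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        total = 0
        for info in archive.infolist():
            total += info.file_size
            if total > MAX_TOTAL_UNCOMPRESSED:
                return True
            # compress_size is 0 for empty entries; skip the ratio check then.
            if info.compress_size and info.file_size / info.compress_size > MAX_COMPRESSION_RATIO:
                return True
    return False
```

This check reads only the sizes declared in the archive's directory, which can be forged, so a stricter validator would also cap the number of bytes actually produced while decompressing.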

Similar Servers

kreuzberg

5420

Extracts text, tables, images, and metadata from a wide range of document formats (PDF, Office, images, HTML, etc.), with support for multiple OCR backends and an extensible plugin system. Can be run as a Model Context Protocol (MCP) server.

Other

Cost Class: Medium

pdf-reader-mcp

433

Provides a robust server for AI agents to extract text, images, and metadata from PDF documents, preserving content order for better comprehension.

Other

Cost Class: High

pdflens-mcp

This MCP server provides tools for reading and extracting information from PDF files, including text and images, designed for AI clients.

Other

Cost Class: High

lyra-tool-discovery

This MCP server is designed to fetch, parse, and organize documentation from websites implementing the llms.txt standard. It transforms raw documentation into structured, agent-ready formats, exposing tools for AI agents, LLMs, and automation workflows to consume documentation programmatically.

Other

Cost Class: Low

Stats

Interest Score: 100

Security Score: 8

Cost Class: Medium

Avg Tokens: 500

Stars: 5412

Forks: 216

Last Update: 2026-01-18
