kreuzberg

Verified Safe

by kreuzberg-dev

Overview

Extracts text, tables, images, and metadata from 56 file formats, including PDF, Office documents, and images. It supports multiple OCR backends and an extensible plugin system, and is designed for data preprocessing in AI/ML workflows.

Installation

Run Command
python -c "import asyncio; from kreuzberg import extract_file, ExtractionConfig; config = ExtractionConfig(use_cache=True, enable_quality_processing=True); result = asyncio.run(extract_file('document.pdf', config=config)); print(result.content)"

Environment Variables

  • KREUZBERG_LOG_LEVEL
  • KREUZBERG_CONFIG_PATH
  • TESSDATA_PREFIX
  • KREUZBERG_ENCODING_CACHE_MAX_ENTRIES
  • KREUZBERG_ENCODING_CACHE_MAX_BYTES
  • CORS_ALLOW_ORIGINS
  • LISTEN_ADDR
  • LISTEN_PORT
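The page lists the variable names only, not their formats or defaults. The following is a hedged sketch of how a deployment script might collect and validate these variables before starting the service. `read_kreuzberg_env` and `KreuzbergEnv` are hypothetical names and are not part of kreuzberg. Several parts are assumptions rather than documented behavior:

- the cache limits and port are parsed as integers;
- `CORS_ALLOW_ORIGINS` is treated as a comma-separated list.

`TESSDATA_PREFIX` is the standard Tesseract variable used to locate its language data.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class KreuzbergEnv:
    """Hypothetical container for the listed variables (None when unset)."""
    log_level: Optional[str] = None
    config_path: Optional[str] = None
    tessdata_prefix: Optional[str] = None
    encoding_cache_max_entries: Optional[int] = None
    encoding_cache_max_bytes: Optional[int] = None
    cors_allow_origins: List[str] = field(default_factory=list)
    listen_addr: Optional[str] = None
    listen_port: Optional[int] = None


def _optional_int(env: Mapping[str, str], name: str) -> Optional[int]:
    """Parse an integer variable; empty or missing means None, malformed raises."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return None
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


def read_kreuzberg_env(env: Mapping[str, str] = os.environ) -> KreuzbergEnv:
    """Collect the listed variables from an environment mapping (hypothetical helper)."""
    port = _optional_int(env, "LISTEN_PORT")
    if port is not None and not 0 < port < 65536:
        raise ValueError(f"LISTEN_PORT out of range: {port}")
    # Assumption: CORS_ALLOW_ORIGINS holds a comma-separated list of origins.
    origins = [o.strip() for o in env.get("CORS_ALLOW_ORIGINS", "").split(",") if o.strip()]
    return KreuzbergEnv(
        log_level=env.get("KREUZBERG_LOG_LEVEL"),
        config_path=env.get("KREUZBERG_CONFIG_PATH"),
        tessdata_prefix=env.get("TESSDATA_PREFIX"),
        encoding_cache_max_entries=_optional_int(env, "KREUZBERG_ENCODING_CACHE_MAX_ENTRIES"),
        encoding_cache_max_bytes=_optional_int(env, "KREUZBERG_ENCODING_CACHE_MAX_BYTES"),
        cors_allow_origins=origins,
        listen_addr=env.get("LISTEN_ADDR"),
        listen_port=port,
    )
```

Passing an explicit mapping (for example `read_kreuzberg_env({"LISTEN_PORT": "8000"})`) keeps the helper easy to test. Calling it with no argument reads the live process environment.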

Security Notes

The core Rust library implements several input-hardening measures. These include explicit validation against zip bombs and XML entity expansion, and graceful handling of panics at FFI boundaries. Running external processes (e.g., LibreOffice to convert older Office formats) carries inherent risk, but the project appears aware of this and implements safeguards. No direct 'eval' calls or obvious hardcoded credentials were found in the code snippets reviewed. Overall, the project shows a strong focus on safely processing potentially untrusted input. Even so, the risks of native code execution and external dependencies should still be considered.
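The zip bomb and XML entity expansion checks mentioned above are general techniques. The sketch below illustrates them using only the Python standard library. It is not kreuzberg's implementation, which lives in the Rust core, and the limits, `UnsafeInputError`, and the helper names are hypothetical.

- **ZIP check:** reads declared sizes from the archive's central directory before anything is extracted.
- **XML parser:** refuses any document that declares entities, which blocks "billion laughs" style expansion.

```python
import io
import xml.parsers.expat
import zipfile
from typing import List

# Illustrative limits only; these are NOT kreuzberg's actual thresholds.
MAX_TOTAL_UNCOMPRESSED = 100 * 1024 * 1024  # 100 MiB summed over all members
MAX_COMPRESSION_RATIO = 100                  # uncompressed / compressed, per member
MAX_MEMBERS = 10_000


class UnsafeInputError(ValueError):
    """Raised when input looks like a decompression or entity-expansion bomb."""


def check_zip_archive(data: bytes) -> None:
    """Reject archives whose declared sizes suggest a zip bomb."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        members = zf.infolist()
        if len(members) > MAX_MEMBERS:
            raise UnsafeInputError(f"too many members: {len(members)}")
        total = 0
        for info in members:
            total += info.file_size
            if total > MAX_TOTAL_UNCOMPRESSED:
                raise UnsafeInputError("declared uncompressed size exceeds limit")
            # max(..., 1) avoids division by zero for empty or forged entries.
            if info.file_size / max(info.compress_size, 1) > MAX_COMPRESSION_RATIO:
                raise UnsafeInputError(f"suspicious compression ratio in {info.filename}")


def parse_xml_rejecting_entities(data: bytes) -> List[str]:
    """Parse XML with expat, refusing any document that declares entities."""
    parser = xml.parsers.expat.ParserCreate()
    elements: List[str] = []

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Raising inside an expat handler aborts parsing and propagates out of Parse().
        raise UnsafeInputError(f"entity declaration not allowed: {name}")

    parser.EntityDeclHandler = on_entity_decl
    parser.StartElementHandler = lambda name, attrs: elements.append(name)
    parser.Parse(data, True)
    return elements
```

Declared sizes in a ZIP directory can be forged, so a production check would also cap the number of bytes actually read while decompressing. Rejecting every entity declaration is stricter than some documents need, but it is the simplest safe default.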

Stats

Interest Score: 100
Security Score: 8
Cost Class: Medium
Avg Tokens: 500
Stars: 5412
Forks: 216
Last Update: 2026-01-18

Tags

  • Document Processing
  • OCR
  • PDF Extraction
  • Text Analysis
  • Multilingual