data-extractor

Verified Safe

by ThreeFish-AI

Overview

A commercial-grade MCP server that extracts content from web pages and PDFs, converts it to Markdown, and is built for long-term deployment in enterprise environments.

Installation

Run Command
uv run data-extractor

Environment Variables

  • DATA_EXTRACTOR_USE_PROXY
  • DATA_EXTRACTOR_PROXY_URL
  • DATA_EXTRACTOR_ENABLE_JAVASCRIPT
  • DATA_EXTRACTOR_BROWSER_HEADLESS
  • DATA_EXTRACTOR_TRANSPORT_MODE
  • DATA_EXTRACTOR_HTTP_HOST
  • DATA_EXTRACTOR_HTTP_PORT
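
The listing names these variables but does not document their accepted values, so the following is only a minimal sketch of how they could be passed to the run command from Python. The helper names `build_env` and `launch` are hypothetical and not part of data-extractor; only the variable names and the `uv run data-extractor` command come from this page.

```python
import os
import subprocess

# Variable names exactly as listed on this page.
# Their accepted values are NOT documented here; check the project README.
KNOWN_VARS = (
    "DATA_EXTRACTOR_USE_PROXY",
    "DATA_EXTRACTOR_PROXY_URL",
    "DATA_EXTRACTOR_ENABLE_JAVASCRIPT",
    "DATA_EXTRACTOR_BROWSER_HEADLESS",
    "DATA_EXTRACTOR_TRANSPORT_MODE",
    "DATA_EXTRACTOR_HTTP_HOST",
    "DATA_EXTRACTOR_HTTP_PORT",
)


def build_env(overrides, base=None):
    """Return a copy of `base` (default: the current environment) with overrides applied.

    Hypothetical helper: rejects names that are not in the list above,
    which catches typos before the server starts.
    """
    unknown = sorted(set(overrides) - set(KNOWN_VARS))
    if unknown:
        raise KeyError(f"not a listed variable: {', '.join(unknown)}")
    env = dict(os.environ if base is None else base)
    # Environment values must be strings.
    env.update({name: str(value) for name, value in overrides.items()})
    return env


def launch(overrides):
    """Start the server with the run command shown on this page (hypothetical wrapper)."""
    return subprocess.Popen(["uv", "run", "data-extractor"], env=build_env(overrides))
```

A call such as `launch({"DATA_EXTRACTOR_HTTP_PORT": 8000})` would start the server with that variable set; the value `8000` is a placeholder, not a documented default.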

Security Notes

The project handles network requests for web scraping and PDF downloading, which inherently carries risks if not used responsibly. The documentation explicitly advises users to comply with `robots.txt` and website terms of use. Features like stealth scraping and form submission can be powerful but require careful usage to avoid ethical or legal issues. There are no obvious hardcoded secrets or 'eval' usage, and configuration uses environment variables for sensitive data. Proper proxy configuration and responsible usage are key for security.
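
As one way to act on the `robots.txt` advice above, a caller could check a URL against a site's rules before requesting extraction. This is a sketch using Python's standard `urllib.robotparser`, not a description of what data-extractor does internally; the function name `is_allowed`, the sample rules, and the `my-bot` user agent are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules only; a real check would use the target site's robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(is_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/private/report.pdf"))
    print(is_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/docs/index.html"))
```

Passing the rules as text keeps the check separate from fetching, so the same function works with a cached or previously downloaded `robots.txt`.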

Stats

  • Interest Score: 32
  • Security Score: 8
  • Cost Class: High
  • Avg Tokens: 5000
  • Stars: 2
  • Forks: 2
  • Last Update: 2025-12-02

Tags

  • web-scraper
  • pdf-converter
  • markdown
  • data-extraction
  • enterprise
  • fastmcp