scraper-mcp
by jessaminesimple608
Overview
A web scraping MCP server that efficiently extracts content (HTML, Markdown, text, links) from web pages for downstream processing, particularly to reduce LLM token usage.
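The listing does not show how scraper-mcp performs its extraction. As a rough illustration of why reducing a page to text and links saves tokens, here is a sketch that uses only Python's standard-library html.parser. It drops tags, scripts, and styles, keeps the visible text, and resolves link targets to absolute URLs. The names TextAndLinkExtractor and extract are hypothetical, and this is not the server's actual implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TextAndLinkExtractor(HTMLParser):
    """Collects visible text and absolute link URLs from an HTML document."""

    # Elements whose contents are never rendered as readable text.
    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.text_parts: list[str] = []
        self.links: list[str] = []
        self._skip_depth = 0  # > 0 while inside a SKIP_TAGS element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative hrefs against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside script/style/noscript.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html: str, base_url: str) -> dict:
    """Return the page's visible text (space-joined) and its absolute links."""
    parser = TextAndLinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.text_parts), "links": parser.links}


if __name__ == "__main__":
    page = ('<html><head><style>p{}</style></head><body><h1>Docs</h1>'
            '<p>See <a href="/guide">the guide</a>.</p>'
            '<script>var x=1;</script></body></html>')
    print(extract(page, "https://example.com/docs/"))
```

The markup, CSS, and script never reach the output, so only the handful of words a reader would actually see, plus the link list, is passed on to the model.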
Installation
docker-compose up -d
Environment Variables
- TRANSPORT
- HOST
- PORT
- CACHE_DIR
- HTTP_PROXY
- HTTPS_PROXY
- NO_PROXY
- SCRAPEOPS_API_KEY
- SCRAPEOPS_RENDER_JS
- SCRAPEOPS_RESIDENTIAL
- SCRAPEOPS_COUNTRY
- SCRAPEOPS_KEEP_HEADERS
- SCRAPEOPS_DEVICE
- ENABLE_CACHE_TOOLS
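The listing names these variables but does not document their meanings or defaults. A small pre-flight check can still be useful before running docker-compose: it reports which of the listed variables are set in the current shell. Whether the compose file actually forwards them into the container depends on that file, which is not shown here. The script only reports presence. It masks the API key and the proxy URLs, on the assumption that those may contain credentials. The function name describe_env is hypothetical.

```python
import os

# Variable names exactly as listed above; meanings and defaults are not
# documented in the listing, so this only reports whether each is set.
ENV_VARS = [
    "TRANSPORT", "HOST", "PORT", "CACHE_DIR",
    "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
    "SCRAPEOPS_API_KEY", "SCRAPEOPS_RENDER_JS", "SCRAPEOPS_RESIDENTIAL",
    "SCRAPEOPS_COUNTRY", "SCRAPEOPS_KEEP_HEADERS", "SCRAPEOPS_DEVICE",
    "ENABLE_CACHE_TOOLS",
]

# Assumed sensitive: the API key, and proxy URLs that may embed user:password.
SECRET_VARS = {"SCRAPEOPS_API_KEY", "HTTP_PROXY", "HTTPS_PROXY"}


def describe_env(env=None):
    """Map each known variable to its value, '<set>' if masked, or None if unset."""
    env = os.environ if env is None else env
    report = {}
    for name in ENV_VARS:
        value = env.get(name)
        if value is not None and name in SECRET_VARS:
            value = "<set>"  # never echo secrets
        report[name] = value
    return report


if __name__ == "__main__":
    for name, value in describe_env().items():
        print(f"{name:24} {value if value is not None else '(unset)'}")
```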
Security Notes
The server's administrative API endpoints (e.g., /api/config, /api/cache/clear, /api/stats) have no built-in authentication or authorization. This is a critical vulnerability: any unauthenticated client that can reach the server can modify the runtime configuration (for example, enable proxies, disable SSL verification, or change concurrency limits), clear the cache, or retrieve server statistics. The code does not use 'eval' or contain hardcoded secrets, but the exposed admin interface is a significant risk if the server is deployed publicly without external security measures.
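One common external measure is to put a layer that checks a shared bearer token in front of the admin paths. Below is a framework-agnostic sketch of that check in Python. The function names, the token scheme, and the idea of calling it from a proxy or wrapper you control are all assumptions; none of this is part of scraper-mcp. The path list comes only from the examples above and would need extending. The sketch is also not hardened: it expects the path without a query string and does not normalize tricks such as duplicate slashes or percent-encoding.

```python
import hmac

# Admin paths named in the security note; the real set may be larger.
ADMIN_PREFIXES = ("/api/config", "/api/cache/clear", "/api/stats")


def is_admin_path(path: str) -> bool:
    """True if the (query-free) request path targets a listed admin endpoint."""
    return any(path == p or path.startswith(p + "/") for p in ADMIN_PREFIXES)


def authorize(path: str, headers: dict, expected_token: str) -> bool:
    """Allow non-admin paths; admin paths need 'Authorization: Bearer <token>'."""
    if not is_admin_path(path):
        return True
    # HTTP header names are case-insensitive, so match the key that way.
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(token.encode(), expected_token.encode())
```

Non-admin traffic passes through unchanged, so the MCP endpoint keeps working while configuration, cache, and stats routes require the token.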
Similar Servers
DevDocs
DevDocs is a web crawling and content extraction platform designed to accelerate software development by converting documentation into LLM-ready formats for intelligent data querying and fine-tuning.
mcp-omnisearch
Provides a unified interface for various search, AI response, content processing, and enhancement tools via Model Context Protocol (MCP).
scrapegraph-mcp
Provides AI-powered web scraping, structured data extraction, multi-page crawling, and agentic automation capabilities for language models.
webscraping-ai-mcp-server
Integrates with WebScraping.AI to provide LLM-powered web data extraction, including question answering, structured data extraction, and HTML/text retrieval, with advanced features like JavaScript rendering and proxy management.