crawl-mcp
Verified Safe
by walksoda
Overview
A comprehensive Model Context Protocol (MCP) server that wraps the crawl4ai library for advanced web crawling, content extraction, and AI-powered summarization from various sources including web pages, PDFs, Office documents, and YouTube videos.
Installation
uvx --from git+https://github.com/walksoda/crawl-mcp crawl-mcp
Environment Variables
- FASTMCP_LOG_LEVEL
- PYTHONUNBUFFERED
- CRAWL4AI_BROWSER_TYPE
- CRAWL4AI_HEADLESS
- PLAYWRIGHT_BROWSERS_PATH
- DISPLAY
- CHROME_FLAGS
- CRAWL4AI_LANG
- OPENAI_API_KEY
- ANTHROPIC_API_KEY
- AZURE_OPENAI_API_KEY
- AZURE_OPENAI_ENDPOINT
- MCP_TRANSPORT
- MCP_HOST
- MCP_PORT
- CRAWL4AI_VERBOSE
- PLAYWRIGHT_SKIP_BROWSER_GC
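Before launching the server it can help to check which of these variables are set in the current environment. The following is a minimal sketch, not part of crawl-mcp: the helper name `describe_env` is hypothetical, the variable names are copied from the list above, and no assumption is made about their accepted values. Treating every name that contains `API_KEY` as a secret is also an assumption of this sketch.

```python
import os

# Variable names copied from the list above. Their meanings and accepted
# values are not documented in this listing, so none are assumed here.
CRAWL_MCP_ENV_VARS = [
    "FASTMCP_LOG_LEVEL",
    "PYTHONUNBUFFERED",
    "CRAWL4AI_BROWSER_TYPE",
    "CRAWL4AI_HEADLESS",
    "PLAYWRIGHT_BROWSERS_PATH",
    "DISPLAY",
    "CHROME_FLAGS",
    "CRAWL4AI_LANG",
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "MCP_TRANSPORT",
    "MCP_HOST",
    "MCP_PORT",
    "CRAWL4AI_VERBOSE",
    "PLAYWRIGHT_SKIP_BROWSER_GC",
]


def describe_env(environ=None):
    """Map each listed variable to its value, "<unset>", or a masked marker.

    Hypothetical helper: names containing "API_KEY" are treated as secrets
    and never echoed back.
    """
    environ = os.environ if environ is None else environ
    report = {}
    for name in CRAWL_MCP_ENV_VARS:
        value = environ.get(name)
        if value is None:
            report[name] = "<unset>"
        elif "API_KEY" in name:
            report[name] = "<set, hidden>"
        else:
            report[name] = value
    return report


if __name__ == "__main__":
    for name, state in describe_env().items():
        print(f"{name}={state}")
```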
Security Notes
The project includes several security safeguards: `_safe_regex_findall` guards against ReDoS attacks with process-level timeouts, session and cache data are written with restrictive file permissions (`0600`), and sensitive values such as API keys are read from environment variables. The `execute_js` parameter of the crawling tools is powerful: a client that misuses it can run arbitrary JavaScript in the browser context, although execution stays contained by the Playwright/Chromium sandbox. The Docker Compose setup passes `--no-sandbox`, a common practice for running Playwright in containers, which means browser isolation there relies on Docker rather than on the Chromium sandbox. Session data is stored locally in plaintext, albeit with restricted file permissions, which poses a minor risk if the host system is compromised.
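Two of the safeguards mentioned above, a process-level regex timeout and `0600` file permissions, can be illustrated with a short sketch. This is not the project's `_safe_regex_findall` or its session-storage code, whose implementations are not shown here. The helper names `findall_with_timeout` and `write_private` are hypothetical, and running the regex in a child Python interpreter is just one way to get a timeout that can interrupt a runaway match.

```python
import json
import os
import subprocess
import sys

# Code executed in the child interpreter: read a JSON request from stdin,
# run re.findall, and write the matches to stdout as JSON.
_CHILD_CODE = (
    "import json, re, sys\n"
    "req = json.load(sys.stdin)\n"
    "json.dump(re.findall(req['pattern'], req['text']), sys.stdout)\n"
)


def findall_with_timeout(pattern, text, timeout=2.0):
    """Return the matches of re.findall, or None if the timeout expires.

    The match runs in a separate process, so a catastrophically
    backtracking pattern can be killed instead of blocking the caller.
    An invalid pattern raises subprocess.CalledProcessError.
    Patterns with several groups come back as lists rather than tuples,
    because the result passes through JSON.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE],
            input=json.dumps({"pattern": pattern, "text": text}),
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when this expires
            check=True,
        )
    except subprocess.TimeoutExpired:
        return None
    return json.loads(done.stdout)


def write_private(path, data):
    """Write text to path so that only the owner can read or write it."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # Mode 0o600 applies when the file is created by this call.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(data)
    # Also tighten permissions if the file already existed with a looser mode.
    os.chmod(path, 0o600)
```

Starting an interpreter for every call is slow, so a real implementation would probably reuse a worker process. The sketch only shows why the timeout has to sit at the process level: a regex match running in the same thread cannot be interrupted from Python code.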
Similar Servers
DevDocs
DevDocs is a web crawling and content extraction platform designed to accelerate software development by converting documentation into LLM-ready formats for intelligent data querying and fine-tuning.
mcp-omnisearch
Provides a unified interface for various search, AI response, content processing, and enhancement tools via Model Context Protocol (MCP).
mcp-server
A Model Context Protocol (MCP) server that integrates with SerpApi to provide comprehensive search engine results and data extraction to an LLM.
webscraping-ai-mcp-server
Integrates with WebScraping.AI to provide LLM-powered web data extraction, including question answering, structured data extraction, and HTML/text retrieval, with advanced features like JavaScript rendering and proxy management.