crawl-mcp
by walksoda
Overview
A comprehensive Model Context Protocol (MCP) server that wraps the crawl4ai library to extract and analyze content from diverse sources like web pages, PDFs, Office documents, and YouTube videos, featuring AI-powered summarization.
Installation
uvx --from git+https://github.com/walksoda/crawl-mcp crawl-mcp
Environment Variables
- FASTMCP_LOG_LEVEL
- PYTHONUNBUFFERED
- CRAWL4AI_BROWSER_TYPE
- CRAWL4AI_HEADLESS
- PLAYWRIGHT_BROWSERS_PATH
- DISPLAY
- CHROME_FLAGS
- CRAWL4AI_LANG
- PYTHONPATH
- PYTHONWARNINGS
- CRAWL4AI_VERBOSE
- PLAYWRIGHT_SKIP_BROWSER_GC
- MCP_TRANSPORT
- MCP_HOST
- MCP_PORT
- OPENAI_API_KEY
- ANTHROPIC_API_KEY
- AZURE_OPENAI_API_KEY
- AZURE_OPENAI_ENDPOINT
- GOOGLE_API_KEY
- DEBUG
- PYTEST_CURRENT_TEST
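The variables listed here can be supplied when launching the server from a script. The following is a minimal sketch, not the project's documented setup: `build_launch` is a hypothetical helper, and the values shown (`INFO`, `true`, `1`) are assumptions, since the listing names the variables but not their accepted values.

```python
import os

# Install command as given in the Installation section.
COMMAND = ["uvx", "--from", "git+https://github.com/walksoda/crawl-mcp", "crawl-mcp"]


def build_launch(overrides, base_env=None):
    """Return (command, env) for starting the server with extra variables.

    Hypothetical helper: it only assembles the arguments and does not
    start any process itself.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Process environments hold strings only, so coerce every value.
    env.update({name: str(value) for name, value in overrides.items()})
    return list(COMMAND), env


# Assumed values; check the project's own documentation for the accepted ones.
command, env = build_launch({
    "FASTMCP_LOG_LEVEL": "INFO",
    "CRAWL4AI_HEADLESS": "true",
    "PYTHONUNBUFFERED": "1",
})
# To actually start the server: subprocess.run(command, env=env, check=True)
```

Keeping API keys such as `OPENAI_API_KEY` in the inherited environment, rather than in the script, avoids committing secrets to source control.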
Security Notes
The `execute_js` parameter in the crawling tools allows arbitrary JavaScript execution, which creates a significant risk of XSS or data exfiltration if the server is exposed to untrusted inputs or URLs. The `SessionManager` stores session data (including cookies) in plaintext on disk, so that data is readable by anyone who compromises the host machine. The `pure_streamable_http_server.py` sends `Access-Control-Allow-Origin: *`, which is too permissive a CORS policy for a production environment. LLM-based tools may also be susceptible to prompt injection.
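One common mitigation for the wildcard CORS header is to echo the request's origin only when it appears on an explicit allow-list. The sketch below illustrates that general technique; it is not code from `pure_streamable_http_server.py`, and `cors_headers`, `ALLOWED_ORIGINS`, and the example origin are hypothetical.

```python
# Hypothetical allow-list; replace with the origins that should reach the server.
ALLOWED_ORIGINS = frozenset({"https://app.example.com"})


def cors_headers(request_origin, allowed_origins=ALLOWED_ORIGINS):
    """Return CORS response headers for one request.

    An empty dict means the origin is not allowed, so the browser
    will block cross-origin reads of the response.
    """
    if request_origin in allowed_origins:
        return {
            "Access-Control-Allow-Origin": request_origin,
            # The response now depends on the Origin header, so caches must key on it.
            "Vary": "Origin",
        }
    return {}
```

A server handler would merge the returned dict into its response headers in place of the fixed `*` value.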
Similar Servers
DevDocs
DevDocs is a web crawling and content extraction platform designed to accelerate software development by converting documentation into LLM-ready formats for intelligent data querying and fine-tuning.
mcp-omnisearch
Provides a unified interface for various search, AI response, content processing, and enhancement tools via Model Context Protocol (MCP).
mcp-server
A Model Context Protocol (MCP) server that integrates with SerpApi to provide comprehensive search engine results and data extraction to an LLM.
webscraping-ai-mcp-server
Integrates with WebScraping.AI to provide LLM-powered web data extraction, including question answering, structured data extraction, and HTML/text retrieval, with advanced features like JavaScript rendering and proxy management.