crawl-mcp

Verified Safe

by walksoda

Overview

A comprehensive Model Context Protocol (MCP) server that wraps the crawl4ai library for advanced web crawling, content extraction, and AI-powered summarization from various sources including web pages, PDFs, Office documents, and YouTube videos.

Installation

Run Command
uvx --from git+https://github.com/walksoda/crawl-mcp crawl-mcp

Environment Variables

  • FASTMCP_LOG_LEVEL
  • PYTHONUNBUFFERED
  • CRAWL4AI_BROWSER_TYPE
  • CRAWL4AI_HEADLESS
  • PLAYWRIGHT_BROWSERS_PATH
  • DISPLAY
  • CHROME_FLAGS
  • CRAWL4AI_LANG
  • OPENAI_API_KEY
  • ANTHROPIC_API_KEY
  • AZURE_OPENAI_API_KEY
  • AZURE_OPENAI_ENDPOINT
  • MCP_TRANSPORT
  • MCP_HOST
  • MCP_PORT
  • CRAWL4AI_VERBOSE
  • PLAYWRIGHT_SKIP_BROWSER_GC
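
As a rough illustration of how the run command and the variables above fit together, the following Python sketch assembles an environment and an argv list for launching the server as a child process. The listing names the variables but not their accepted values, so the values shown (`1`, `INFO`) and the helper names `build_server_env`, `missing_api_keys`, and `launch_server` are assumptions made for this example, not part of crawl-mcp.

```python
import os
import subprocess

# Run command copied from the listing, split into argv form.
RUN_COMMAND = ["uvx", "--from", "git+https://github.com/walksoda/crawl-mcp", "crawl-mcp"]

# ASSUMPTION: illustrative values only; the listing gives variable names, not values.
ASSUMED_DEFAULTS = {
    "PYTHONUNBUFFERED": "1",
    "FASTMCP_LOG_LEVEL": "INFO",
}

# API key variables named in the listing.
API_KEY_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "AZURE_OPENAI_API_KEY")


def build_server_env(overrides=None, base=None):
    """Copy the base environment, fill in assumed defaults, then apply overrides."""
    env = dict(os.environ if base is None else base)
    for name, value in ASSUMED_DEFAULTS.items():
        env.setdefault(name, value)
    env.update(overrides or {})
    return env


def missing_api_keys(env):
    """Return the API key variables that are unset or empty in env."""
    return [name for name in API_KEY_VARS if not env.get(name)]


def launch_server(env):
    """Start the server as a child process (requires uvx on PATH; not called here)."""
    return subprocess.Popen(RUN_COMMAND, env=env)
```

Keeping API keys in the parent environment and passing them through, rather than hard-coding them, matches the listing's note that sensitive data is supplied via environment variables.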

Security Notes

The project includes explicit security safeguards: `_safe_regex_findall` applies process-level timeouts to guard against ReDoS attacks, and session/cache data is written with restrictive file permissions (`0600`). Sensitive values such as API keys are supplied through environment variables. The `execute_js` parameter of the crawling tools is powerful: a client that misuses it can execute arbitrary JavaScript within the browser context, although that code stays contained by the Playwright/Chromium sandbox. The Docker Compose setup uses `--no-sandbox`, a common practice for Playwright in containers, which means browser sandboxing there relies on Docker's isolation. Session data is stored locally in plaintext, albeit with restricted file permissions, which poses a minor risk if the host system is compromised.
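
To make two of the safeguards named above concrete, here is a minimal Python sketch of the general techniques: running a regex in a separate process that is killed after a timeout, and writing a file that only its owner can read or write. This is not the project's `_safe_regex_findall`; the helpers `regex_findall_with_timeout` and `write_private_file` are hypothetical names, and the permission handling assumes a POSIX system.

```python
import json
import os
import subprocess
import sys

# Tiny worker script run in a child interpreter: reads a job from stdin,
# prints the re.findall result as JSON.
_WORKER = (
    "import json, re, sys\n"
    "job = json.load(sys.stdin)\n"
    "json.dump(re.findall(job['pattern'], job['text']), sys.stdout)\n"
)


def regex_findall_with_timeout(pattern, text, timeout=5.0):
    """Run re.findall in a child process; return [] on timeout or regex error.

    Hypothetical helper: a pathological pattern can only stall the child,
    which subprocess.run kills once the timeout expires.
    """
    job = json.dumps({"pattern": pattern, "text": text})
    try:
        done = subprocess.run(
            [sys.executable, "-c", _WORKER],
            input=job,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return []
    # Note: group tuples from re.findall come back as lists after the JSON round trip.
    return json.loads(done.stdout)


def write_private_file(path, data):
    """Write text to path with owner-only permissions (0600, POSIX assumption)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # The mode passed to os.open is ignored for existing files, so enforce it.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(data)
```

Spawning an interpreter per call is slow, so a design like this trades throughput for the guarantee that a runaway match cannot block the server process itself.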

Stats

Interest Score: 41
Security Score: 7
Cost Class: High
Avg Tokens: 2500
Stars: 24
Forks: 6
Last Update: 2026-01-18

Tags

Web Crawling, Content Extraction, AI Summarization, Google Search, YouTube Processing, File Processing, MCP Server, Playwright, LLM Integration