crawl-mcp

by walksoda

Overview

A comprehensive Model Context Protocol (MCP) server that wraps the crawl4ai library to extract and analyze content from diverse sources like web pages, PDFs, Office documents, and YouTube videos, featuring AI-powered summarization.

Installation

Run Command
uvx --from git+https://github.com/walksoda/crawl-mcp crawl-mcp

Environment Variables

  • FASTMCP_LOG_LEVEL
  • PYTHONUNBUFFERED
  • CRAWL4AI_BROWSER_TYPE
  • CRAWL4AI_HEADLESS
  • PLAYWRIGHT_BROWSERS_PATH
  • DISPLAY
  • CHROME_FLAGS
  • CRAWL4AI_LANG
  • PYTHONPATH
  • PYTHONWARNINGS
  • CRAWL4AI_VERBOSE
  • PLAYWRIGHT_SKIP_BROWSER_GC
  • MCP_TRANSPORT
  • MCP_HOST
  • MCP_PORT
  • OPENAI_API_KEY
  • ANTHROPIC_API_KEY
  • AZURE_OPENAI_API_KEY
  • AZURE_OPENAI_ENDPOINT
  • GOOGLE_API_KEY
  • DEBUG
  • PYTEST_CURRENT_TEST
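
As a rough sketch of how the run command and the variables above might be wired into an MCP client, the snippet below builds a configuration entry in the `mcpServers` shape that several MCP clients accept. That shape, the entry name `crawl-mcp`, the helper `build_server_entry`, and the choice of which variables to forward are assumptions made for illustration; the listing itself does not document accepted values, so the code only copies variables that are already set in the environment.

```python
import os

# Run command taken from the Installation section above.
RUN_COMMAND = ["uvx", "--from", "git+https://github.com/walksoda/crawl-mcp", "crawl-mcp"]

# An illustrative subset of the variable names listed above; extend as needed.
PASS_THROUGH = [
    "FASTMCP_LOG_LEVEL",
    "CRAWL4AI_HEADLESS",
    "MCP_TRANSPORT",
    "MCP_HOST",
    "MCP_PORT",
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
]


def build_server_entry(environ=None):
    """Build a client config entry, forwarding only variables that are set."""
    environ = os.environ if environ is None else environ
    env = {name: environ[name] for name in PASS_THROUGH if name in environ}
    return {
        "mcpServers": {
            "crawl-mcp": {  # hypothetical entry name
                "command": RUN_COMMAND[0],
                "args": RUN_COMMAND[1:],
                "env": env,
            }
        }
    }
```

Serializing the result with `json.dumps(build_server_entry(), indent=2)` gives a fragment that could be pasted into a client configuration file, assuming the client uses this layout. Forwarding an explicit allow-list rather than the whole environment keeps unrelated secrets out of the server process.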

Security Notes

The `execute_js` parameter in the crawling tools runs arbitrary JavaScript, which poses a significant risk of XSS or data exfiltration if the server is exposed to untrusted inputs or URLs. The `SessionManager` stores session data, including cookies, in plaintext on disk, so that data is exposed if the host machine is compromised. The `pure_streamable_http_server.py` module sends `Access-Control-Allow-Origin: *`, which lets any origin read responses and is too permissive for a production environment. The LLM-based tools may also be susceptible to prompt injection.
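
One common mitigation for the wildcard CORS header is to echo the request's origin only when it appears on an explicit allow-list. The sketch below illustrates that idea in isolation; it is not code from `pure_streamable_http_server.py`, and the names `ALLOWED_ORIGINS` and `cors_headers` as well as the example origin are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Hypothetical allow-list; replace with the origins you actually serve.
ALLOWED_ORIGINS = frozenset({"http://localhost:3000"})


def cors_headers(
    origin: Optional[str],
    allowed: Iterable[str] = ALLOWED_ORIGINS,
) -> Dict[str, str]:
    """Return CORS headers for an allowed origin, or none at all otherwise."""
    if origin is not None and origin in set(allowed):
        return {
            # Echo the specific origin instead of "*".
            "Access-Control-Allow-Origin": origin,
            # Tell caches that the response depends on the Origin header.
            "Vary": "Origin",
        }
    return {}
```

A request handler would merge the returned dictionary into its response headers. Requests from unlisted origins then receive no CORS headers, so browsers block cross-origin reads of the response.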

Stats

Interest Score: 41
Security Score: 3
Cost Class: High
Avg Tokens: 15000
Stars: 24
Forks: 6
Last Update: 2026-01-17

Tags

Web crawling, Content extraction, AI summarization, Google Search, YouTube processing, Document processing, MCP Server, Playwright