mcp-content-extractor
by micheleboni
Overview
A multi-tool agent server for the Model Context Protocol (MCP) that provides retrieval-augmented generation (RAG), web crawling, file processing, and code analysis tools through dynamically loaded middleware modules.
Installation
node dist/index.js
Environment Variables
- YOUTUBE_API_KEY
- OLLAMA_URL
- EMBEDDING_MODEL
- MCP_VISION_MODEL
- BROWSER_HEADLESS
- MIDDLEWARE_LOAD
- SERVER_PORT
- SERVER_HOST
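The listing names these variables but not their types or defaults. The sketch below shows one way a Node/TypeScript server might read them. It is hypothetical and is not taken from the project's source. The assumptions are that `BROWSER_HEADLESS` is a boolean flag, `MIDDLEWARE_LOAD` is a comma-separated list of middleware names, and the fallbacks are port 3000 and host 127.0.0.1.

```typescript
// Hypothetical config shape: variable names come from the listing above,
// but every type, parsing rule, and default here is an assumption.
interface ServerConfig {
  youtubeApiKey?: string;
  ollamaUrl?: string;
  embeddingModel?: string;
  visionModel?: string;
  browserHeadless: boolean;
  middlewareLoad: string[];
  serverPort: number;
  serverHost: string;
}

type Env = Record<string, string | undefined>;

// Treat "1", "true", or "yes" (case-insensitive) as true; unset uses the fallback.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  return ["1", "true", "yes"].includes(value.trim().toLowerCase());
}

function loadConfig(env: Env): ServerConfig {
  const port = Number.parseInt(env.SERVER_PORT ?? "", 10);
  return {
    youtubeApiKey: env.YOUTUBE_API_KEY,
    ollamaUrl: env.OLLAMA_URL,
    embeddingModel: env.EMBEDDING_MODEL,
    visionModel: env.MCP_VISION_MODEL,
    browserHeadless: parseBool(env.BROWSER_HEADLESS, true), // assumed default: headless
    // Assumed format: comma-separated names; blank entries are dropped.
    middlewareLoad: (env.MIDDLEWARE_LOAD ?? "")
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
    serverPort: Number.isNaN(port) ? 3000 : port, // hypothetical fallback port
    serverHost: env.SERVER_HOST ?? "127.0.0.1", // hypothetical fallback: loopback only
  };
}
```

In a real entry point this would be called as `loadConfig(process.env)`. The sketch uses a loopback fallback for `SERVER_HOST` so the server is not exposed to other machines by default, which fits the security notes below.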
Security Notes
CRITICAL: The server exposes an `/mcp/execute` endpoint without any authentication. Several middleware components directly execute shell commands built from input that can originate from API calls. These include `run_in_terminal`, `pdf_image_extraction` (using `pdftoppm`, `convert`, `gs`), `extract_youtube_transcript` (using `yt-dlp`), and `mcp-io-github-git-push-files`/`mcp-io-github-git-create-or-update-file` (using `git` commands). If the server is exposed to untrusted networks or users, this creates a severe remote code execution (RCE) vulnerability. The `vscode-file` tools also grant extensive read/write access to the local filesystem, which puts data integrity and confidentiality at risk. Running this server outside a sandbox, or with untrusted inputs, is highly dangerous.
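The RCE risk comes from interpolating request fields into a shell command string. For example, `` exec(`yt-dlp ${url}`) `` with `url = "x; rm -rf ~"` runs a second command. The sketch below shows a common mitigation. It is not the project's code: the function name `buildTranscriptArgs` and the choice of `yt-dlp` flags are illustrative assumptions. The approach is to validate the input against a strict pattern and pass it as a separate argument, so no shell ever parses it.

```typescript
// Hypothetical helper: accept only an 11-character YouTube-style video ID
// (assumed format) and build an argument vector instead of a shell string.
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

function buildTranscriptArgs(videoId: string): string[] {
  // Reject anything with spaces, semicolons, quotes, etc. before it reaches a process.
  if (!VIDEO_ID.test(videoId)) {
    throw new Error(`rejected video id: ${JSON.stringify(videoId)}`);
  }
  // Each element becomes one argv entry; none is interpreted by a shell.
  return [
    "--skip-download",
    "--write-auto-subs",
    "--sub-langs",
    "en",
    `https://www.youtube.com/watch?v=${videoId}`,
  ];
}
```

The resulting array would be passed to Node's `execFile("yt-dlp", args, callback)` from `node:child_process`, which does not spawn a shell by default. This closes the injection path for one tool only. It does not replace authentication on `/mcp/execute` or sandboxing, and tools such as `run_in_terminal` execute arbitrary commands by design.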
Similar Servers
RIMCP
An MCP server enabling AI agents to search and browse RimWorld source code and Def definitions for modding and development purposes.
viberag
Local codebase semantic search (RAG) for AI coding assistants via MCP server.
tenets
Provides intelligent, token-optimized code context and automatically injects guiding principles to AI coding assistants for enhanced understanding and consistent interactions.
lyra-tool-discovery
This MCP server is designed to fetch, parse, and organize documentation from websites implementing the llms.txt standard. It transforms raw documentation into structured, agent-ready formats, exposing tools for AI agents, LLMs, and automation workflows to consume documentation programmatically.