read-docs
by ys319
Overview
Extracts the main readable content from web pages, including those rendered client-side (CSR), and converts it into Markdown format for saving or display.
Installation
./read-docs https://example.com/article
Security Notes
The tool uses `go-rod` to launch headless Chrome with the `NoSandbox(true)` flag. Disabling the sandbox is often a necessary workaround in containerized environments, but on a host system it removes a layer of defense: a page that exploits a browser vulnerability has a much easier path to compromising the host. Because the tool's primary function is to fetch arbitrary URLs, it should be run in an isolated environment or restricted to carefully vetted input URLs.
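One way to approach the "vetted input URLs" part is to check each URL against an allowlist before it reaches the browser. The following is a minimal sketch using only the Go standard library. The `vetURL` helper and the allowlist contents are hypothetical and are not part of read-docs. It assumes that only `https` URLs on known hosts should be accepted.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// vetURL is a hypothetical helper, not part of read-docs. It accepts only
// absolute https URLs whose host appears in the caller-supplied allowlist.
func vetURL(raw string, allowedHosts map[string]bool) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse %q: %w", raw, err)
	}
	// Reject file:, http:, data: and any other non-https scheme.
	if u.Scheme != "https" {
		return errors.New("only https URLs are accepted")
	}
	// Hostname strips any port; lowercase it for a case-insensitive match.
	host := strings.ToLower(u.Hostname())
	if host == "" {
		return errors.New("URL has no host")
	}
	if !allowedHosts[host] {
		return fmt.Errorf("host %q is not on the allowlist", host)
	}
	return nil
}

func main() {
	// Illustrative allowlist; a real deployment would load its own.
	allowed := map[string]bool{"example.com": true}

	for _, raw := range []string{
		"https://example.com/article",
		"http://example.com/article",
		"https://evil.test/page",
		"file:///etc/passwd",
	} {
		if err := vetURL(raw, allowed); err != nil {
			fmt.Printf("reject %s: %v\n", raw, err)
			continue
		}
		fmt.Printf("accept %s\n", raw)
	}
}
```

A check like this only covers the URL passed on the command line. It does not cover redirects or the subresources a page loads, so it complements running the browser in an isolated environment and does not replace it.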
Similar Servers
DevDocs
DevDocs is a web crawling and content extraction platform designed to accelerate software development by converting documentation into LLM-ready formats for intelligent data querying and fine-tuning.
html-to-markdown-mcp
Converts HTML content from web pages or raw strings into Markdown format, with options for including metadata, truncating content, and saving to files.
scrapi-mcp
This MCP server enables AI agents to scrape web pages and retrieve their content as HTML or Markdown, with advanced browser interaction capabilities.
defuddle-fetch-mcp-server
This server allows LLMs to fetch web content, automatically cleaning HTML into markdown, extracting key metadata like title and author, and supporting chunked reading.