mcp-datahub
Verified Safe
by txn2
Overview
Connects AI assistants to DataHub metadata catalogs for searching datasets, exploring schemas, tracing lineage, and accessing glossary terms and domains. It can be used as a standalone server or as a composable Go library for custom MCP servers with advanced features.
Installation
docker run -e DATAHUB_URL=https://datahub.company.com -e DATAHUB_TOKEN=your_token ghcr.io/txn2/mcp-datahub:latest
Environment Variables
- DATAHUB_URL
- DATAHUB_TOKEN
- DATAHUB_TIMEOUT
- DATAHUB_RETRY_MAX
- DATAHUB_DEFAULT_LIMIT
- DATAHUB_MAX_LIMIT
- DATAHUB_MAX_LINEAGE_DEPTH
- DATAHUB_CONNECTION_NAME
- DATAHUB_ADDITIONAL_SERVERS
- JWT_SECRET
- AUDIT_DATABASE_URL
- TRINO_HOST
- TRINO_USER
- TRINO_CATALOG
- TRINO_SCHEMA
- TENANT_CONFIG
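As a minimal sketch of how these variables might be checked before starting the container, the shell function below verifies that DATAHUB_URL and DATAHUB_TOKEN are set. The function name datahub_preflight is hypothetical and not part of mcp-datahub. Treating only these two variables as required, and insisting on an https:// URL, are assumptions based on the install command and the security notes, not documented behavior of the server.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of mcp-datahub itself.
# Assumption: only DATAHUB_URL and DATAHUB_TOKEN are required, since they
# are the two variables passed in the install command.
datahub_preflight() {
    if [ -z "${DATAHUB_URL:-}" ]; then
        echo "DATAHUB_URL is not set" >&2
        return 1
    fi
    if [ -z "${DATAHUB_TOKEN:-}" ]; then
        echo "DATAHUB_TOKEN is not set" >&2
        return 1
    fi
    # Assumption: require HTTPS, in line with the security notes.
    case "$DATAHUB_URL" in
        https://*) ;;
        *)
            echo "DATAHUB_URL should start with https://" >&2
            return 1
            ;;
    esac
    return 0
}
```

A typical use would be to call datahub_preflight first and run the docker command from the Installation section only if it succeeds, so that a missing token is reported before the container starts.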
Security Notes
The project demonstrates strong security practices: tokens are supplied via environment variables and are explicitly not logged, connections use HTTPS, and TLS certificate verification is enabled by default. It provides middleware interfaces for custom access control, audit logging, and rate limiting. Supply chain security is addressed with SLSA Level 3 provenance and Cosign-signed releases. No 'eval' or obfuscation patterns were found. The tool performs only read-only operations on DataHub.
Similar Servers
OpenMetadata
This server acts as a plugin for Apache Airflow, exposing REST APIs to manage OpenMetadata workflow definitions, DAGs, and tasks.
metorial-index
A background service that builds and maintains a comprehensive public catalog of Model Context Protocol (MCP) servers, enriching their metadata through automated fetching from repositories and AI-driven content generation.
powerbi-mcp
Enables AI assistants to interact with Power BI Desktop and Service for querying data, managing models, and performing safe bulk operations through natural language, ensuring enterprise-grade security and preserving report visual integrity during refactoring.
mcp-server-datahub
Enables AI agents to interact with DataHub for comprehensive data discovery, governance, lineage exploration, and SQL query generation across an organization's data ecosystem.