data-commons-search
by EOSC-Data-Commons
Overview
Provides a natural language search interface over open-access datasets, leveraging Large Language Models (LLMs) and the Model Context Protocol (MCP) to assist users in discovering relevant data and tools for scientific research.
Installation
docker compose up
Environment Variables
- OPENSEARCH_URL
- EINFRACZ_API_KEY
- MISTRAL_API_KEY
- OPENROUTER_API_KEY
- SEARCH_API_KEY
- SERVER_PORT
- SERVER_HOST
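As a rough illustration of how a service might read the variables listed above, here is a minimal Python sketch. The variable names come from the list. The helper names `load_settings` and `missing`, the assumption that `SERVER_PORT` is numeric, and the choice to map unset variables to `None` are all hypothetical and are not taken from the project's code.

```python
import os
from typing import Mapping, Optional

# Names copied from the list above; which ones are mandatory is not stated there.
ENV_NAMES = (
    "OPENSEARCH_URL",
    "EINFRACZ_API_KEY",
    "MISTRAL_API_KEY",
    "OPENROUTER_API_KEY",
    "SEARCH_API_KEY",
    "SERVER_PORT",
    "SERVER_HOST",
)


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the listed variables; unset ones map to None (hypothetical helper)."""
    source = os.environ if env is None else env
    settings = {name: source.get(name) for name in ENV_NAMES}
    # Assumption: SERVER_PORT holds an integer port number when it is set.
    if settings["SERVER_PORT"] is not None:
        settings["SERVER_PORT"] = int(settings["SERVER_PORT"])
    return settings


def missing(settings: dict) -> list:
    """Return the names of variables that were not set."""
    return [name for name, value in settings.items() if value is None]
```

Calling `missing(load_settings())` before starting the containers gives a quick view of which variables are still unset in the current shell.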
Security Notes
1. `cors_enabled: True` with `allow_origins=['*']` is set, allowing requests from any origin. This is a significant security risk for a service handling sensitive API keys or data, making it vulnerable to CSRF and other cross-origin attacks.
2. The `chat_api_key` environment variable is optional. If not set, the `/chat` endpoint can be accessed without authentication, potentially leading to abuse and high LLM costs.
3. The `search_data` tool uses `query_string` with wildcards for `creator_name`. Without explicit sanitization of user input, this could be vulnerable to Lucene query injection, potentially leading to unexpected search results, denial of service, or information leakage.
4. Conversation logs are written to `data/logs/conversations.jsonl`. If Personally Identifiable Information (PII) is included in user queries or LLM responses, it would be stored unencrypted, posing a privacy risk.
5. The frontend utilizes `dangerouslySetInnerHTML` in some components. If the backend (e.g., from OpenSearch descriptions or LLM summaries) provides unsanitized HTML content, it could lead to Cross-Site Scripting (XSS) vulnerabilities in the user interface.
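The Lucene query injection concern in note 3 could be reduced by escaping user input before it reaches `query_string`. The sketch below is one possible approach, not the project's implementation: `escape_query_string` and `creator_wildcard` are hypothetical helpers, and the escaped character set follows the reserved characters documented for query string syntax. It does not neutralize the keyword operators `AND`, `OR`, and `NOT`, which would need separate handling.

```python
import re

# Reserved query_string characters that can be neutralized with a backslash.
_ESCAPABLE = re.compile(r'([+\-=!(){}\[\]^"~*?:\\/&|])')


def escape_query_string(text: str) -> str:
    """Escape reserved characters in user input (hypothetical helper)."""
    # '<' and '>' cannot be escaped in query_string syntax, so they are removed.
    text = text.replace("<", "").replace(">", "")
    # Prefix every remaining reserved character with a backslash.
    return _ESCAPABLE.sub(r"\\\1", text)


def creator_wildcard(name: str) -> str:
    """Wrap the escaped name in wildcards (assumed form of the creator_name query)."""
    return "*" + escape_query_string(name) + "*"
```

With this in place, only the two wildcards added by `creator_wildcard` act as operators; any `*`, `?`, or `:` typed by the user is matched literally.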
Similar Servers
mcp-server-elasticsearch
Enables AI clients to interact with Elasticsearch data through natural language conversations using the Model Context Protocol (MCP) by exposing a set of predefined tools.
mcp-omnisearch
Provides a unified interface for LLMs to access multiple web search, AI response, content processing, and enhancement tools from various providers through the Model Context Protocol (MCP).
mcp-server
Provides a Model Context Protocol (MCP) server for AI agents to search and retrieve curated documentation for the Strands Agents framework, facilitating AI coding assistance.
opensearch-mcp-server-py
Enables AI assistants to interact with OpenSearch clusters, providing a standardized interface for search, mapping, and shard management.