data-commons-search

by EOSC-Data-Commons

Overview

Provides a natural language search interface over open-access datasets, leveraging Large Language Models (LLMs) and the Model Context Protocol (MCP) to assist users in discovering relevant data and tools for scientific research.

Installation

Run Command
docker compose up

Environment Variables

  • OPENSEARCH_URL
  • EINFRACZ_API_KEY
  • MISTRAL_API_KEY
  • OPENROUTER_API_KEY
  • SEARCH_API_KEY
  • SERVER_PORT
  • SERVER_HOST
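
Before running `docker compose up`, it can help to confirm which of these variables are present in the environment. The following is a minimal sketch, not part of the project: the helper name `missing_vars` is hypothetical, and it assumes every listed variable is expected to be set, which the listing does not confirm (some may be optional or have defaults).

```python
import os

# Names copied from the list above. Whether each one is required
# is an assumption; some may be optional in the real service.
EXPECTED_VARS = [
    "OPENSEARCH_URL",
    "EINFRACZ_API_KEY",
    "MISTRAL_API_KEY",
    "OPENROUTER_API_KEY",
    "SEARCH_API_KEY",
    "SERVER_PORT",
    "SERVER_HOST",
]


def missing_vars(env=None):
    """Return the names from EXPECTED_VARS that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All listed variables are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real environment.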

Security Notes

1. `cors_enabled: True` with `allow_origins=['*']` is set, allowing requests from any origin. This is a significant security risk for a service handling sensitive API keys or data, making it vulnerable to CSRF and other cross-origin attacks.
2. The `chat_api_key` environment variable is optional. If not set, the `/chat` endpoint can be accessed without authentication, potentially leading to abuse and high LLM costs.
3. The `search_data` tool uses `query_string` with wildcards for `creator_name`. Without explicit sanitization of user input, this could be vulnerable to Lucene query injection, potentially leading to unexpected search results, denial of service, or information leakage.
4. Conversation logs are written to `data/logs/conversations.jsonl`. If Personally Identifiable Information (PII) is included in user queries or LLM responses, it would be stored unencrypted, posing a privacy risk.
5. The frontend utilizes `dangerouslySetInnerHTML` in some components. If the backend (e.g., from OpenSearch descriptions or LLM summaries) provides unsanitized HTML content, it could lead to Cross-Site Scripting (XSS) vulnerabilities in the user interface.
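
One way to reduce the Lucene query injection risk noted for `creator_name` is to escape reserved `query_string` characters before building the query. The sketch below is illustrative only and is not the project's code: the function names `escape_query_string` and `creator_query` are hypothetical, and the query body shape is an assumption about how the search might be issued. The reserved character set follows the `query_string` syntax documentation, which also notes that `<` and `>` cannot be escaped and have to be removed.

```python
import re

# "<" and ">" cannot be escaped in query_string syntax, so strip them.
_UNESCAPABLE = re.compile(r"[<>]")
# Remaining reserved characters and the two-character operators && and ||.
_RESERVED = re.compile(r'([+\-=!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_query_string(text):
    """Backslash-escape Lucene query_string reserved characters in text."""
    text = _UNESCAPABLE.sub("", text)
    return _RESERVED.sub(r"\\\1", text)


def creator_query(user_input):
    """Build a wildcard query body for creator_name (assumed shape)."""
    escaped = escape_query_string(user_input)
    return {
        "query": {
            "query_string": {
                "query": f"*{escaped}*",
                "fields": ["creator_name"],
            }
        }
    }
```

With this in place, user-supplied wildcards and field selectors such as `*` or `:` are treated as literal text. The sketch does not neutralize the word operators `AND`, `OR`, and `NOT`, and input containing spaces is still split into separate terms, so a non-parsing query type may be a safer choice where it fits.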

Stats

Interest Score: 38
Security Score: 4
Cost Class: High
Avg Tokens: 3000
Stars: 9
Forks: 1
Last Update: 2025-11-28

Tags

AI, LLM, Search, Datasets, MCP