semango
by omarkamali
Overview
A hybrid search engine for codebases, documentation, and knowledge bases, combining lexical (BM25) and semantic (vector) search with a web UI and REST API.
Installation
docker run -p 8181:8181 -v $(pwd):/data ghcr.io/omarkamali/semango:latest
Environment Variables
- SEMANGO_TOKENS
- OPENAI_API_KEY
- SEMANGO_ENV_FILE
- SEMANGO_MODEL_DIR
Security Notes
CRITICAL VULNERABILITIES IDENTIFIED:
1. Authentication Bypass: The REST API (`/api/v1/search`, `/api/v1/health`, `/api/v1/stats`) is advertised as 'Token-authenticated' in the README, but the provided `api/server.go` code *does not implement any authentication middleware*. Requests from `ui/App.tsx` also do not include an `Authorization` header. This means the API is publicly accessible without any token, which is a critical security vulnerability allowing unauthorized access to search and statistics. This server is unsafe to run in any exposed environment.
2. Arbitrary Code Execution (Plugins): The system supports dynamic loading of plugins (`.so` shared object files) specified in `semango.yml`. This feature allows arbitrary native code execution and presents a severe security risk if plugins are sourced from untrusted origins or if the configuration can be tampered with by an attacker.
3. External Binary Downloads: Installation scripts (`install_faiss.sh`, `install_onnxruntime.sh`) use `curl -L` to download external libraries (FAISS, ONNX Runtime) from GitHub releases. While GitHub is generally trusted, this introduces a dependency on external integrity and can be a supply chain risk.
4. CGO Usage: The project extensively uses CGO for FAISS and ONNX Runtime. This exposes the system to potential C/C++ vulnerabilities (e.g., buffer overflows, memory corruption) that Go's memory safety features typically mitigate. Careful review of CGO-bound code is essential.
Similar Servers
chunkhound
Provides local-first codebase intelligence, extracting architecture, patterns, and institutional knowledge for AI assistants.
qdrant-mcp-server
Provides semantic search backed by the Qdrant vector database, focused primarily on code vectorization for codebase indexing and semantic code search, with support for general document search.
memex
Personal knowledge base with hybrid search (keyword + semantic) and LLM-driven memory evolution, designed for agent workflows.
semantic-code-search-mcp-server
This MCP server exposes indexed code data to AI coding agents, enabling structured interaction for codebase understanding, code discovery, symbol analysis, and file content reconstruction.