semango
by omarkamali
Overview
A hybrid search engine for codebases, documentation, and knowledge bases, combining lexical (BM25) and semantic (vector) search with a web UI and REST API.
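How semango merges the lexical and semantic result lists is not stated on this page. One common way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF), sketched below in Go. The function name `rrfFuse`, the constant `k`, and the file names in the usage note are illustrative assumptions, not taken from semango's code.

```go
package main

import "sort"

// rrfFuse merges several ranked lists of document IDs using reciprocal
// rank fusion: each document scores the sum of 1/(k + rank) over the lists
// it appears in, with rank starting at 1. This is a generic technique and
// is NOT confirmed to be what semango does internally.
func rrfFuse(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}

	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}

	// Highest fused score first; ties are broken by ID so the order is stable.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}
```

For example, with a lexical ranking of `a.go, b.go, c.go`, a semantic ranking of `b.go, d.go, a.go`, and `k = 60`, the call `rrfFuse(60, lexical, semantic)` returns `b.go, a.go, d.go, c.go`: documents found by both searches rise above those found by only one.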
Installation
docker run -p 8181:8181 -v $(pwd):/data ghcr.io/omarkamali/semango:latest
Environment Variables
- SEMANGO_TOKENS
- OPENAI_API_KEY
- SEMANGO_ENV_FILE
- SEMANGO_MODEL_DIR
Security Notes
CRITICAL VULNERABILITIES IDENTIFIED:
1. Authentication Bypass: The REST API (`/api/v1/search`, `/api/v1/health`, `/api/v1/stats`) is advertised as 'Token-authenticated' in the README, but the provided `api/server.go` code *does not implement any authentication middleware*. Requests from the `ui/App.tsx` also do not include an `Authorization` header. This means the API is publicly accessible without any token, which is a critical security vulnerability allowing unauthorized access to search and statistics. This server is unsafe to run in any exposed environment.
2. Arbitrary Code Execution (Plugins): The system supports dynamic loading of plugins (`.so` shared object files) specified in `semango.yml`. This feature allows arbitrary native code execution and presents a severe security risk if plugins are sourced from untrusted origins or if the configuration can be tampered with by an attacker.
3. External Binary Downloads: Installation scripts (`install_faiss.sh`, `install_onnxruntime.sh`) use `curl -L` to download external libraries (FAISS, ONNX Runtime) from GitHub releases. While GitHub is generally trusted, this introduces a dependency on external integrity and can be a supply chain risk.
4. CGO Usage: The project extensively uses CGO for FAISS and ONNX Runtime. This exposes the system to potential C/C++ vulnerabilities (e.g., buffer overflows, memory corruption) that Go's memory safety features typically mitigate. Careful review of CGO-bound code is essential.
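To illustrate the kind of middleware the authentication finding says is missing, here is a minimal Go sketch of bearer-token checking with the standard `net/http` package. It is not semango's code. The names `validToken`, `requireToken`, and `parseTokens` are hypothetical, and treating `SEMANGO_TOKENS` as a comma-separated list is an assumption, since this page does not document the variable's format.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"strings"
)

// parseTokens splits a raw token list on commas and drops empty entries.
// ASSUMPTION: a comma-separated format for SEMANGO_TOKENS; the real format
// is not documented on this page.
func parseTokens(raw string) []string {
	var out []string
	for _, part := range strings.Split(raw, ",") {
		if t := strings.TrimSpace(part); t != "" {
			out = append(out, t)
		}
	}
	return out
}

// validToken reports whether an Authorization header value has the form
// "Bearer <token>" with a token from the allowed list. Each candidate is
// compared in constant time to avoid leaking matches through timing.
func validToken(header string, allowed []string) bool {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	presented := strings.TrimSpace(strings.TrimPrefix(header, prefix))
	if presented == "" {
		return false
	}
	ok := false
	for _, t := range allowed {
		if t != "" && subtle.ConstantTimeCompare([]byte(presented), []byte(t)) == 1 {
			ok = true
		}
	}
	return ok
}

// requireToken wraps a handler and rejects requests without a valid bearer
// token with 401 Unauthorized. (Hypothetical helper, not part of semango.)
func requireToken(allowed []string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !validToken(r.Header.Get("Authorization"), allowed) {
			w.Header().Set("WWW-Authenticate", "Bearer")
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

A server could wrap its API router with something like `requireToken(parseTokens(os.Getenv("SEMANGO_TOKENS")), mux)` before passing it to `http.ListenAndServe`. An empty token list rejects every request, so a missing configuration fails closed rather than open.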
Similar Servers
chunkhound
ChunkHound transforms codebases into searchable knowledge bases for AI assistants, enabling deep semantic and regex-based code research.
codegraph-rust
CodeGraph transforms codebases into a semantically searchable knowledge graph, enabling AI agents to reason about code architecture, dependencies, and patterns using specialized agentic tools.
semantic-code-search-mcp-server
Provides a Model Context Protocol (MCP) server that exposes indexed code data through a set of tools, allowing AI coding agents to interact with and understand codebases in a structured way.
kgraph
Indexes codebases into a knowledge graph to enable semantic search, precise code navigation, and impact analysis for LLM agents.