SRE-agent
Verified Safe
by martinimarcello00
Overview
An autonomous multi-agent system designed for Kubernetes incident detection, diagnosis, and mitigation using LLMs and modular workflows to reduce Mean Time to Resolution (MTTR).
Installation
cd sre-agent && poetry run langgraph dev
Environment Variables
- OPENAI_API_KEY
- CHROMADB_STORAGE_PATH
- PROMETHEUS_SERVER_URL
- TELEGRAM_BOT_TOKEN
- TELEGRAM_CHAT_ID
- MAX_TOOL_CALLS
- LANGSMITH_API_KEY
- AIOPSLAB_DIR
- TRACE_SERVICE_STARTING_POINT
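Before starting the agent, it can help to confirm that the variables listed above are present. The following is a minimal sketch, not part of the project: the helper name `missing_env_vars` is hypothetical, and treating every listed variable as required is an assumption, since the listing does not say which ones are optional.

```python
import os

# Names copied from the list above. Treating all of them as required
# is an assumption; the project may consider some of them optional.
EXPECTED_VARS = [
    "OPENAI_API_KEY",
    "CHROMADB_STORAGE_PATH",
    "PROMETHEUS_SERVER_URL",
    "TELEGRAM_BOT_TOKEN",
    "TELEGRAM_CHAT_ID",
    "MAX_TOOL_CALLS",
    "LANGSMITH_API_KEY",
    "AIOPSLAB_DIR",
    "TRACE_SERVICE_STARTING_POINT",
]


def missing_env_vars(env=None, names=EXPECTED_VARS):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All expected environment variables are set.")
```

Running the script in the same shell that will launch `poetry run langgraph dev` reports any names that are unset or empty.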
Security Notes
The system interacts with critical infrastructure (Kubernetes, Prometheus, Jaeger, ChromaDB) via MCP clients. Although the Kubernetes MCP client enforces `ALLOW_ONLY_NON_DESTRUCTIVE_TOOLS: true`, the agent as a whole still has significant control over the environment. External dependencies such as `npx mcp-server-kubernetes` and `uvx chroma-mcp` add a reliance on third-party packages and runtimes (Node.js, Python). Sensitive data (API keys, tokens) is supplied through environment variables, which is good practice, but how those values are stored and injected (e.g., Kubernetes Secrets, Vault) is essential to get right and is outside the scope of this code review.
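One way to pair environment variables with a secret store is to let a variable point at a file, such as a Kubernetes Secret mounted as a volume, and fall back to the plain variable otherwise. The sketch below illustrates that idea only: the `read_secret` helper and the `<NAME>_FILE` naming convention are assumptions and are not something the project is known to implement.

```python
import os
from pathlib import Path


def read_secret(name, env=None):
    """Read a secret from the file named by <name>_FILE, else from <name>.

    The <name>_FILE convention is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    file_path = env.get(name + "_FILE")
    if file_path:
        # Mounted secret files often end with a newline; strip it.
        return Path(file_path).read_text(encoding="utf-8").strip()
    value = env.get(name)
    if not value:
        raise KeyError(f"secret {name!r} is not configured")
    return value
```

With this approach the token value itself never has to appear in the process environment when a mounted file is available.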
Similar Servers
inspector
A web-based client and proxy server for inspecting and interacting with Model Context Protocol (MCP) servers, allowing users to browse resources, prompts, and tools, perform requests, and debug OAuth authentication flows.
mcp-grafana
Provides a Model Context Protocol (MCP) server for Grafana, enabling AI agents to interact with Grafana features such as dashboards, datasources, alerting, incidents, and more through a structured tool-based interface.
kubernetes-mcp-server
Facilitates AI agent interaction with Kubernetes and OpenShift clusters by exposing management and observability tools via the Model Context Protocol.
mcp-k8s-go
This MCP server enables interaction with Kubernetes clusters to list, get, apply, and execute commands on various resources through a conversational interface.