SRE-agent

Verified Safe

by martinimarcello00

Overview

An autonomous multi-agent system designed for Kubernetes incident detection, diagnosis, and mitigation using LLMs and modular workflows to reduce Mean Time to Resolution (MTTR).

Installation

Run Command
cd sre-agent && poetry run langgraph dev

Environment Variables

  • OPENAI_API_KEY
  • CHROMADB_STORAGE_PATH
  • PROMETHEUS_SERVER_URL
  • TELEGRAM_BOT_TOKEN
  • TELEGRAM_CHAT_ID
  • MAX_TOOL_CALLS
  • LANGSMITH_API_KEY
  • AIOPSLAB_DIR
  • TRACE_SERVICE_STARTING_POINT
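
Since the agent depends on the variables listed above, a quick pre-flight check before running it can catch a missing value early. The following is a minimal sketch, not part of the project: the helper name `missing_env_vars` is hypothetical, and treating every listed variable as required is an assumption (some may be optional in practice).

```python
import os

# Names taken from the list above. Treating all of them as required
# is an assumption; the listing does not say which are optional.
LISTED_VARS = [
    "OPENAI_API_KEY",
    "CHROMADB_STORAGE_PATH",
    "PROMETHEUS_SERVER_URL",
    "TELEGRAM_BOT_TOKEN",
    "TELEGRAM_CHAT_ID",
    "MAX_TOOL_CALLS",
    "LANGSMITH_API_KEY",
    "AIOPSLAB_DIR",
    "TRACE_SERVICE_STARTING_POINT",
]


def missing_env_vars(env=None):
    """Return the listed variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in LISTED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All listed environment variables are set.")
```

Passing a mapping explicitly (instead of reading `os.environ`) keeps the check easy to test without touching the real environment.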

Security Notes

The system interacts with critical infrastructure (Kubernetes, Prometheus, Jaeger, ChromaDB) via MCP clients. Although the Kubernetes MCP client enforces `ALLOW_ONLY_NON_DESTRUCTIVE_TOOLS: true`, the agent as a whole still has significant control over the environment. External dependencies such as `npx mcp-server-kubernetes` and `uvx chroma-mcp` add reliance on third-party packages and their runtimes (Node.js, Python). Sensitive data (API keys, tokens) is passed through environment variables, which is good practice, but those variables still need proper secret management (e.g., Kubernetes Secrets, Vault) in any real deployment; that concern falls outside the scope of this code review.
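
To illustrate the non-destructive safeguard mentioned above, here is a minimal sketch of a guard that refuses to start unless the flag is present in the Kubernetes MCP server's launch settings. The dictionary shape (`command`, `args`, `transport`, `env`) and both function names are assumptions made for illustration; the project's actual configuration format may differ.

```python
def kubernetes_mcp_config() -> dict:
    """Hypothetical launch settings for the Kubernetes MCP server.

    The command and flag name come from the notes above; the dictionary
    layout itself is an assumption, not the project's actual config.
    """
    return {
        "command": "npx",
        "args": ["mcp-server-kubernetes"],
        "transport": "stdio",
        "env": {"ALLOW_ONLY_NON_DESTRUCTIVE_TOOLS": "true"},
    }


def is_non_destructive(config: dict) -> bool:
    """Return True only if the safeguard flag is explicitly set to true."""
    flag = config.get("env", {}).get("ALLOW_ONLY_NON_DESTRUCTIVE_TOOLS", "")
    return str(flag).strip().lower() == "true"


if __name__ == "__main__":
    config = kubernetes_mcp_config()
    if not is_non_destructive(config):
        raise SystemExit("Refusing to start: non-destructive mode is not enabled.")
    print("Kubernetes MCP client restricted to non-destructive tools.")
```

Failing closed (treating a missing or malformed flag as unsafe) is the design choice here: the guard only passes when the flag is explicitly `true`.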

Stats

Interest Score: 35
Security Score: 7
Cost Class: Medium
Avg Tokens: 69196
Stars: 6
Forks: 0
Last Update: 2026-01-19

Tags

SRE, Kubernetes, LLM Agent, Incident Response, LangGraph