k8s-gpu-mcp-server

Verified Safe

by ArangoGutierrez

Overview

Provides just-in-time, real-time NVIDIA GPU hardware introspection for Kubernetes clusters, intended for AI-assisted SRE troubleshooting.

Installation

Run Command
npx k8s-gpu-mcp-server@latest

Environment Variables

  • K8S_GPU_MCP_NAMESPACE
  • K8S_GPU_MCP_SERVICE
  • K8S_GPU_MCP_CONTEXT
  • K8S_GPU_MCP_SERVICE_PORT
  • K8S_GPU_MCP_LOCAL_PORT
  • KUBECONFIG
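
As a rough sketch of how these variables might be supplied when starting the server from a script, the Python below builds an environment and passes it to the run command shown above. The variable names come from this listing; every value is a placeholder, and the helper names `build_env` and `launch` are hypothetical. The listing does not document what each variable controls, so check the project's own documentation before relying on any of these values.

```python
import os
import subprocess

# Variable names are taken from the listing; all values are placeholders
# chosen for illustration, not defaults documented by the project.
PLACEHOLDER_SETTINGS = {
    "K8S_GPU_MCP_NAMESPACE": "gpu-diagnostics",
    "K8S_GPU_MCP_SERVICE": "gpu-mcp-gateway",
    "K8S_GPU_MCP_CONTEXT": "my-cluster",
    "K8S_GPU_MCP_SERVICE_PORT": 8080,
    "K8S_GPU_MCP_LOCAL_PORT": 8080,
    "KUBECONFIG": os.path.expanduser("~/.kube/config"),
}


def build_env(settings, base=None):
    """Return a copy of the base environment with the settings applied.

    Values are converted to strings because environment variables
    must be strings.
    """
    env = dict(os.environ if base is None else base)
    env.update({name: str(value) for name, value in settings.items()})
    return env


def launch(settings):
    """Start the server with the listing's run command.

    Blocks until the process exits and returns the CompletedProcess.
    """
    return subprocess.run(
        ["npx", "k8s-gpu-mcp-server@latest"],
        env=build_env(settings),
        check=False,
    )
```

Calling `launch(PLACEHOLDER_SETTINGS)` would start the server with these values. Building the environment in a separate function keeps the settings easy to inspect without starting a process.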

Security Notes

The project ships detailed RBAC configurations, hardened security contexts (e.g., `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`), and optional NetworkPolicies. The agent runs in `read-only` mode by default. The agent container requires `runAsUser: 0` (root) for NVML access; the project documents the justification, but running as root still carries elevated privilege. Reading `/dev/kmsg` for XID error analysis requires `CAP_SYSLOG` and may require `privileged: true`. The gateway supports `kubectl exec` routing as a fallback, which can be less secure than direct HTTP if not properly constrained; the recommended `HTTP` routing mode avoids this risk. The project also provides comprehensive documentation of its security model and verification steps.
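
The settings named in these notes can be checked mechanically. The sketch below is not part of the project: it is a hypothetical helper, `audit_security_context`, that takes a container `securityContext` as a plain dictionary (for example, parsed from a manifest) and reports where it departs from the hardening described above or uses one of the elevated privileges the notes mention. It assumes the standard Kubernetes field names, where the capability appears as `SYSLOG` without the `CAP_` prefix.

```python
def audit_security_context(ctx):
    """Return a list of findings for a container securityContext dict.

    Hypothetical helper: it flags missing hardening settings and notes
    the elevated privileges mentioned in the security notes.
    """
    findings = []

    # Hardening settings the notes say the project applies.
    if ctx.get("readOnlyRootFilesystem") is not True:
        findings.append("readOnlyRootFilesystem is not true")
    if ctx.get("allowPrivilegeEscalation") is not False:
        findings.append("allowPrivilegeEscalation is not false")

    # Elevated privileges the notes call out as required or possibly required.
    if ctx.get("runAsUser") == 0:
        findings.append("runs as root (runAsUser: 0), noted as needed for NVML")
    if ctx.get("privileged") is True:
        findings.append("privileged: true is set")

    # Kubernetes lists capabilities without the CAP_ prefix.
    added = (ctx.get("capabilities") or {}).get("add") or []
    if "SYSLOG" in added:
        findings.append("adds SYSLOG capability, noted as needed for /dev/kmsg")

    return findings


# Example input: an agent-style context as the notes describe it.
example = {
    "readOnlyRootFilesystem": True,
    "allowPrivilegeEscalation": False,
    "runAsUser": 0,
}
for finding in audit_security_context(example):
    print(finding)
```

A finding here is not necessarily a defect: root access and `SYSLOG` are reported so that a reviewer can confirm they are intentional.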

Stats

Interest Score: 35
Security Score: 8
Cost Class: Medium
Avg Tokens: 1500
Stars: 2
Forks: 2
Last Update: 2026-01-19

Tags

nvidia, gpu, kubernetes, diagnostics, AI-assisted troubleshooting