mcpbr

Verified Safe

by greynewell

Overview

A benchmark runner for evaluating Model Context Protocol (MCP) servers and LLM-based coding agents against real-world software engineering tasks (SWE-bench).

Installation

Run Command
mcpbr run -c mcpbr.yaml

Environment Variables

  • ANTHROPIC_API_KEY
  • SUPERMODEL_API_KEY
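
Before launching a run, it can help to confirm that the variables listed above are set. The following is a minimal sketch, not part of mcpbr: the helper name `missing_vars` is hypothetical, and it assumes both variables are required, which this listing does not state.

```python
import os

# Variable names taken from the list above. Treating both as required
# is an assumption made for illustration only.
REQUIRED_VARS = ("ANTHROPIC_API_KEY", "SUPERMODEL_API_KEY")


def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment:
example = missing_vars({"ANTHROPIC_API_KEY": "placeholder-value"})
print(example)
```

Calling `missing_vars()` with no argument checks the real process environment; an empty result means `mcpbr run -c mcpbr.yaml` can be started with both keys present.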

Security Notes

The tool runs code inside isolated Docker containers, which significantly reduces risk to the host system. It parses SWE-bench dataset inputs with `ast.literal_eval`, which evaluates only Python literals rather than arbitrary expressions and is appropriate for trusted dataset inputs. For agent functionality it invokes the Claude Code CLI with `--dangerously-skip-permissions` inside the container, which is acceptable within that sandbox. Users must keep their Docker daemon secure and use only trusted MCP server implementations, because those implementations determine what the external server executes.
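
To illustrate why `ast.literal_eval` is a reasonable choice for string-encoded dataset fields, here is a minimal sketch. The helper `parse_field` and the sample strings are hypothetical and are not taken from mcpbr's source; the point is only that literals parse while anything involving a call or name lookup is rejected.

```python
import ast


def parse_field(raw):
    """Parse a string-encoded Python literal.

    Returns None when the input is not a plain literal
    (hypothetical helper, for illustration only).
    """
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return None


# A list literal encoded as a string parses into a real list.
tests = parse_field("['tests/test_a.py::test_one', 'tests/test_b.py::test_two']")

# A string containing a function call is rejected, not executed.
blocked = parse_field("__import__('os').system('echo hi')")

print(tests)
print(blocked)
```

`ast.literal_eval` can still raise on pathological input such as extremely deep nesting, so the container isolation described above remains the main safeguard.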

Stats

Interest Score: 55
Security Score: 9
Cost Class: Medium
Avg Tokens: 7000
Stars: 1
Forks: 1
Last Update: 2026-01-17

Tags

Benchmark, LLM, Agents, Software Engineering, Evaluation