Skip to content

mcp-bench

Latency & throughput benchmarking for MCP servers, with optional perf budgets for CI.

Point it at a server (local or hosted), pick the calls to exercise, and it reports p50/p95/p99 latency, throughput, and error rate per operation — and fails the build if you blow a budget.

operation calls err p50 p95 p99 max rps
tools/list 200 0% 4.4ms 8.6ms 12.3ms 14.6ms 792
tool:echo 200 0% 0.9ms 1.2ms 1.5ms 1.5ms 4438
✓ within budget

It reuses @mcp-query/contract’s connect path, so it benchmarks local (stdio) or hosted (Streamable HTTP / OAuth) servers with the same flags.

Terminal window
# local server, a tool call, some load, and a p95 budget
npx tsx packages/mcp-bench/src/cli.ts \
--command npx --args "-y @modelcontextprotocol/server-everything" \
--call 'echo:{"message":"hi"}' \
--concurrency 4 --iterations 200 --warmup 5 \
--max-p95 250 --max-error-rate 0
# hosted server (reuses cached OAuth from `mcp-contract auth`)
npx tsx packages/mcp-bench/src/cli.ts --url https://host/mcp --duration 10
Flag Meaning
--command / --args / --url target (stdio or Streamable HTTP); --bearer / --header for auth
--call name:json benchmark a tool call (repeatable)
--read-only also benchmark every read-only tool that takes no required args
--concurrency N parallel callers per op (default 1)
--iterations N calls per op (default 20) — or --duration S for time-boxed
--warmup N untimed calls before measuring (default 3)
--max-p95 MS / --max-error-rate R budgets; exit non-zero if exceeded

By default it benchmarks tools/list plus any --call ops. Destructive tools are never hammered automatically--read-only only adds tools annotated readOnlyHint with no required inputs.

Benchmarking sends real traffic. Against a hosted server that’s real load on someone else’s infrastructure — mind rate limits and terms of service. Defaults are deliberately conservative (concurrency 1, 20 iterations).

import { benchmark, evaluateReport } from "@mcp-query/bench";
const report = await benchmark(
[{ label: "tools/list", invoke: () => client.listTools() }],
{ concurrency: 8, durationMs: 5000, warmup: 3 },
);
const { text, passed } = evaluateReport(report, { maxP95: 200, maxErrorRate: 0 });
console.log(text);
if (!passed) process.exit(1);

benchmark(ops, opts) is generic over an invoke thunk (MCP-agnostic and deterministically testable); summarize(samples) and evaluateReport(report, budget) are exported too.

Project Role
mcp-query consume MCP
mcp-gate govern at runtime
mcp-contract guard the interface in CI (drift)
mcp-lint lint surface quality in CI
mcp-docs generate reference docs
mcp-bench benchmark latency/throughput + perf budgets
mcp-record freeze real traffic as fixtures
Terminal window
npx vitest run # percentiles, worker-pool iteration/error accounting, a real-client bench, budget eval

MVP (private: true). Roadmap: latency histograms / sparklines, warmup-vs-measured split, JSON output, and a regression mode (compare two runs).