cMCP 0.4.1
Model Context Protocol library in pure C11
cMCP's CI lane runs make bench (three workloads — stdio inline, stdio worker-pool, HTTP inline) eleven times and takes the per-metric median, then diffs against bench/baseline.json. A PR that regresses a gated metric past its tolerance band fails this check and gets a comment showing the delta table.
| Workload | Metric | Direction | Default tolerance |
|---|---|---|---|
| server_inline_stdio | throughput_per_s | higher_is_better | ±25 % |
| server_inline_stdio | p99_us | lower_is_better | ±40 % |
| server_pool_stdio | wall_ms | lower_is_better | ±20 % |
| server_inline_http | throughput_per_s | higher_is_better | ±30 % |
| server_inline_http | p99_us | lower_is_better | ±40 % |
Per-metric tolerance bands are wider for HTTP (a real socket + a real syscall round-trip per call) than for stdio (in-process pipes), and wider for tail latency than for throughput (the tail is what jitter hits hardest). Defaults are tuned for the noise floor of GitHub Actions ubuntu-latest shared runners — see Risk: CI noise below.
For each metric, current is the median of N=11 runs, computed by bench/compare-baseline.sh, and baseline lives in bench/baseline.json.
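The per-metric median can be sketched as follows. This is a Python illustration, not the logic of bench/compare-baseline.sh itself; the row layout (one dict per run with a `workload` key plus numeric metric columns) and the function name `per_metric_median` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def per_metric_median(runs):
    """Collapse repeated runs into one median per (workload, metric).

    `runs` is an iterable of dicts, one per bench run, each with a
    "workload" key and numeric metric columns (hypothetical layout).
    """
    samples = defaultdict(list)
    for run in runs:
        for metric, value in run.items():
            if metric == "workload":
                continue
            samples[(run["workload"], metric)].append(float(value))
    return {key: median(values) for key, values in samples.items()}


# Eleven illustrative runs of one workload; the single 500 outlier
# does not move the median.
runs = [{"workload": "server_inline_stdio", "p99_us": v}
        for v in [100, 101, 99, 102, 98, 100, 103, 97, 100, 500, 101]]
print(per_metric_median(runs))
```

Taking the median rather than the mean is what keeps one noisy run out of eleven from shifting the reported value.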
The gate is binary per metric. If any gated metric fails, the job exits non-zero and the PR check goes red.
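A minimal sketch of a binary per-metric gate, under stated assumptions: the regression is taken as the relative change against the baseline in the "worse" direction, and only a change past the tolerance band in that direction fails. The exact comparison used by bench/compare-baseline.sh is not shown here, so the formula, the function names, and the sample numbers are all hypothetical.

```python
def regressed(current, baseline, direction, tolerance_pct):
    """True when `current` is worse than `baseline` by more than the band."""
    if direction == "higher_is_better":
        worse_pct = (baseline - current) / baseline * 100.0
    elif direction == "lower_is_better":
        worse_pct = (current - baseline) / baseline * 100.0
    else:
        raise ValueError(f"unknown direction: {direction}")
    return worse_pct > tolerance_pct


def gate_exit_code(checks):
    """0 when every gated metric passes, 1 as soon as any one fails."""
    return 1 if any(regressed(*check) for check in checks) else 0


# Illustrative values only, not taken from bench/baseline.json.
checks = [
    (800.0, 1000.0, "higher_is_better", 25.0),  # 20 % worse: inside the band
    (150.0, 100.0, "lower_is_better", 40.0),    # 50 % worse: outside the band
]
print(gate_exit_code(checks))
```

Under these assumptions an improvement never fails the gate, however large; only movement in the worse direction counts against the band.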
GitHub Actions shared runners have documented ±15-30 % latency jitter on small workloads, which is wider than the regressions we want to catch. Mitigations stacked:
- **[skip-bench] opt-out**: the escape valve when the noise wins despite the above, or when an intentional perf hit lands.

If the gate becomes flaky (frequent false positives on no-op PRs), the answer is to widen the offending metric's tolerance, not to drop the gate. Calibration is part of operating the gate.
A self-hosted bare-metal runner would let us tighten the bands to ±10 % and catch much smaller regressions. That's the Tier 8 follow-up if traffic justifies the lab cost.
bench/baseline.json is committed — it is not auto-updated by CI. This is deliberate (Tier 7 open question 2 in TODO.md): explicit beats inferred. A PR that intentionally changes performance comes in two commits:
- bench/baseline.json updated with new numbers. The PR description references commit 1 and explains why the new numbers are the right floor (e.g. "+0.5 µs/call to validate schemas server-side — buys -32602 spec compliance, see <issue>").

This makes baseline changes reviewable: every drop in expectations shows up in the diff with a paper trail. By contrast, an auto-updating baseline would silently absorb slow leaks, which is exactly what Tier 7 is guarding against.
For a one-off where you've confirmed the perf delta is intentional and you don't want to bump the baseline in the same PR, add [skip-bench] to the HEAD commit subject. CI reads git log -1 --pretty=%s, sets SKIP_BENCH=1, and the compare script emits a "Skipped" table and exits 0.
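The opt-out check amounts to a substring test on the HEAD commit subject. The sketch below illustrates that in Python; it is not the CI workflow's actual implementation, and the names `wants_skip` and `head_subject` are hypothetical. `head_subject` shells out to git with the `%s` (subject) pretty-format placeholder and is only defined here, not called, so the example runs outside a repository.

```python
import subprocess

SKIP_MARKER = "[skip-bench]"


def wants_skip(subject):
    """True when the commit subject carries the opt-out marker."""
    return SKIP_MARKER in subject


def head_subject():
    """Subject line of HEAD; requires git and a repository checkout."""
    result = subprocess.run(
        ["git", "log", "-1", "--pretty=%s"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Illustrative commit subjects.
print(wants_skip("perf: validate schemas server-side [skip-bench]"))
print(wants_skip("fix: typo in README"))
```

Only the subject of HEAD is inspected in this sketch, so a marker placed in the commit body or in an earlier commit of the PR would not trigger the skip.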
Don't lean on this — the gate exists for a reason, and skipped PRs don't refresh the baseline.
Exit code is the gate verdict; bench/delta.md is what CI would post. Note that local results can differ from CI numbers by as much as 5x (your laptop is faster than GitHub Actions shared runners), so a delta against the committed baseline is meaningful only at CI parity. Locally, use the script to check that the shape of the output is right (no NaNs, no missing metrics) and that the direction matches what your change should produce.
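A local shape check of the kind described above (no NaNs, no missing metrics) could look like the following Python sketch. The data layout, a dict keyed by (workload, metric), and the function name `shape_problems` are assumptions for illustration and do not reflect the format of bench/results.csv.

```python
import math


def shape_problems(current, expected):
    """Return human-readable problems; an empty list means the shape is fine.

    `current` maps (workload, metric) -> measured value and `expected`
    is the set of (workload, metric) pairs that should be present.
    Both layouts are hypothetical.
    """
    problems = []
    for key in sorted(expected):
        if key not in current:
            problems.append(f"missing metric: {key[0]}.{key[1]}")
        elif math.isnan(current[key]):
            problems.append(f"NaN value: {key[0]}.{key[1]}")
    return problems


# Illustrative input: one metric is absent and one is NaN.
expected = {("server_inline_stdio", "p99_us"),
            ("server_inline_http", "p99_us")}
current = {("server_inline_stdio", "p99_us"): float("nan")}
for problem in shape_problems(current, expected):
    print(problem)
```

This deliberately ignores the magnitude of each value, since local numbers are not comparable to the committed baseline.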
To add a gated metric, the places to touch are:

- bench/run.sh CSV header + the producing bench binary.
- bench/compare-baseline.sh's col_for().
- metrics.<workload> in bench/baseline.json with direction and tolerance.
- The tolerance, if the per-run noise turns out wider than expected.

What this gate does not cover:

- bench measures latency + throughput, not RSS. Heap leaks are caught by valgrind + sanitisers (make test-asan, make valgrind).
- The tests/soak/ family covers change over hours (Tier 7.3); this gate measures steady-state cost per call.
- The figures in docs/perf-baselines.md are the absolute targets, but they live in narrative prose, not in CI.

Files:

- bench/run.sh: produces bench/results.csv
- bench/compare-baseline.sh: the median-of-N + diff comparator
- bench/baseline.json: the floor