Rollout documentation

1. Find a benchmark

List the benchmark packages available to your workspace.

shell

rollout login
rollout benchmarks list

Pulling a benchmark drops its dataset and verifier into your project so you can inspect and version them alongside your code.

shell

rollout benchmarks pull support-triage

Start a run pointed at your agent endpoint. Each task is scored by the benchmark's verifier, and you get a pass rate plus a trace for every failure.

shell

rollout benchmarks run support-triage
# scoring 1,000 tasks ............ done
# pass_rate 0.91 · 912/1000 verified
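The sample output above reports both a rounded pass rate and the raw counts. As a minimal sketch of how the two relate, the snippet below recomputes the rate from the counts and derives how many failures, and so how many traces, to expect. The variable names are illustrative and not part of the rollout CLI. The numbers are copied from the sample output, not from a real run.

```shell
# Counts copied from the sample output: 912 verified out of 1,000 tasks.
verified=912
total=1000

# awk does the floating-point division that POSIX shell arithmetic lacks.
pass_rate=$(awk -v v="$verified" -v t="$total" 'BEGIN { printf "%.3f", v / t }')

# Each failed task should come with a trace, so this is the number to review.
failures=$((total - verified))

echo "pass_rate=$pass_rate failures=$failures"
```

Under these assumptions, the displayed 0.91 is 912/1000 = 0.912 shown to two decimal places, and the run leaves 88 failure traces to inspect.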

Flags & endpoints

For the exact run flags and how to point a benchmark at a local or hosted endpoint, see the CLI benchmarks reference.