1. Write the spec
List your checks with a weight each and a passThreshold. Mix deterministic checks (a tool was called, the output is valid JSON, a field matches) with an LLM judge for the fuzzy stuff.
{
  "checks": [
    { "type": "tool_called", "name": "issue_refund", "weight": 1.0 },
    { "type": "json_valid", "weight": 1.0 },
    { "type": "field_equals", "path": "refund.amount", "equals": "order.total", "weight": 1.0 },
    { "type": "llm_judge", "prompt": "Is the tone polite and clear?", "weight": 0.5 }
  ],
  "passThreshold": 1.0
}

Every check type
See verifier check types for the full list of checks and their parameters, and verifiers for how weighted-mean scoring works.
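To make the interaction between weights and passThreshold concrete, here is a minimal sketch of weighted-mean scoring. It assumes each check yields a score between 0 and 1, that the overall score is the weight-normalized average, and that a run passes when the score is at least passThreshold. The function names are hypothetical and the exact rules are defined on the verifiers page, not here.

```python
def weighted_mean(results):
    """Average per-check scores, weighting each by its check's weight.

    results: list of (score, weight) pairs, with each score in [0, 1].
    """
    total_weight = sum(weight for _, weight in results)
    if total_weight == 0:
        raise ValueError("at least one check needs a positive weight")
    return sum(score * weight for score, weight in results) / total_weight


def passes(results, pass_threshold):
    """Assumed pass rule: the weighted mean must reach the threshold."""
    return weighted_mean(results) >= pass_threshold


# Weights taken from the refund spec above: three deterministic checks
# at 1.0 each and the LLM judge at 0.5.
all_pass = [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (1.0, 0.5)]
judge_fails = [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (0.0, 0.5)]

print(weighted_mean(all_pass))     # 1.0
print(weighted_mean(judge_fails))  # 3.0 / 3.5, roughly 0.857
print(passes(judge_fails, 1.0))    # False
```

Under these assumptions, a passThreshold of 1.0 means every check has to pass in full, including the low-weight judge. A lower threshold would let the judge act as a soft signal while the deterministic checks still carry most of the score.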
2. Create it
Register the spec in your workspace. The CLI reuses an existing verifier if the spec is unchanged.
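Before registering, a quick local sanity check can catch typos in the spec. The sketch below is not part of the rollout CLI: validate_spec is a hypothetical helper, and it only knows about the fields shown in the example spec in step 1 (type, weight, passThreshold). The rules that weights are positive and that passThreshold lies between 0 and 1 are assumptions.

```python
def is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_spec(spec):
    """Return a list of problems found in a verifier spec dict (empty if none)."""
    problems = []

    checks = spec.get("checks")
    if not isinstance(checks, list) or not checks:
        problems.append("checks must be a non-empty list")
        checks = []

    for i, check in enumerate(checks):
        if not isinstance(check, dict):
            problems.append(f"checks[{i}] must be an object")
            continue
        if not isinstance(check.get("type"), str):
            problems.append(f"checks[{i}] is missing a string type")
        weight = check.get("weight")
        # Assumption: every check carries a positive weight.
        if not is_number(weight) or weight <= 0:
            problems.append(f"checks[{i}] needs a positive numeric weight")

    threshold = spec.get("passThreshold")
    # Assumption: the threshold is compared against a score in [0, 1].
    if not is_number(threshold) or not 0 <= threshold <= 1:
        problems.append("passThreshold should be a number between 0 and 1")

    return problems


# The refund spec from step 1, written as a Python dict.
refund_spec = {
    "checks": [
        {"type": "tool_called", "name": "issue_refund", "weight": 1.0},
        {"type": "json_valid", "weight": 1.0},
        {"type": "field_equals", "path": "refund.amount",
         "equals": "order.total", "weight": 1.0},
        {"type": "llm_judge", "prompt": "Is the tone polite and clear?",
         "weight": 0.5},
    ],
    "passThreshold": 1.0,
}

print(validate_spec(refund_spec))  # []
```

An empty list means the spec has the expected shape. The CLI remains the authority on what it accepts.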
rollout verifiers create refund.json
# Created verifier refund · pass_threshold 1.0

3. Use it
Pass the verifier to an optimization run or a benchmark — anywhere a score is needed. GEPA uses the per-check feedback to reason about what to change next.
rollout optimize create \
  --target triage-prompt \
  --dataset ./support-triage \
  --verifier ./refund.json

From here, run GEPA as in the GEPA recipe.