Score outputs with a verifier

1. Write the spec

List your checks, give each one a weight, and set a passThreshold. Mix deterministic checks (a tool was called, the output is valid JSON, a field matches an expected value) with an LLM judge for qualities that are hard to check mechanically, such as tone.

refund.json

{
  "checks": [
    { "type": "tool_called", "name": "issue_refund", "weight": 1.0 },
    { "type": "json_valid", "weight": 1.0 },
    { "type": "field_equals", "path": "refund.amount", "equals": "order.total", "weight": 1.0 },
    { "type": "llm_judge", "prompt": "Is the tone polite and clear?", "weight": 0.5 }
  ],
  "passThreshold": 1.0
}
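Before creating the verifier, it can help to sanity-check the spec locally. The sketch below is not part of the rollout CLI. `lint_spec` is a hypothetical helper, and the rules it enforces (every check has a `type` and a positive numeric `weight`, and `passThreshold` lies between 0 and 1) are assumptions inferred from the example above, not documented constraints.

```python
import json

# The spec from refund.json, inlined so the sketch is self-contained.
SPEC_TEXT = """
{
  "checks": [
    { "type": "tool_called", "name": "issue_refund", "weight": 1.0 },
    { "type": "json_valid", "weight": 1.0 },
    { "type": "field_equals", "path": "refund.amount", "equals": "order.total", "weight": 1.0 },
    { "type": "llm_judge", "prompt": "Is the tone polite and clear?", "weight": 0.5 }
  ],
  "passThreshold": 1.0
}
"""


def is_number(value) -> bool:
    """True for ints and floats, but not bools (bool is a subclass of int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def lint_spec(spec: dict) -> list[str]:
    """Return a list of problems found in a verifier spec (hypothetical rules)."""
    problems = []

    checks = spec.get("checks")
    if not isinstance(checks, list) or not checks:
        problems.append("checks must be a non-empty list")
        checks = []

    for i, check in enumerate(checks):
        if not isinstance(check, dict):
            problems.append(f"checks[{i}] must be an object")
            continue
        if "type" not in check:
            problems.append(f"checks[{i}] is missing a type")
        weight = check.get("weight")
        if not is_number(weight) or weight <= 0:
            problems.append(f"checks[{i}] needs a positive numeric weight")

    # Assumption: the threshold is a fraction of the maximum score, in [0, 1].
    threshold = spec.get("passThreshold")
    if not is_number(threshold) or not 0 <= threshold <= 1:
        problems.append("passThreshold must be a number between 0 and 1")

    return problems


problems = lint_spec(json.loads(SPEC_TEXT))
print(problems or "spec looks well-formed")
```

A lint step like this only catches structural mistakes; whether a check type or parameter is actually supported is decided by the service.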

Every check type

See verifier check types for the full list of checks and their parameters, and verifiers for how weighted-mean scoring works.
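To make the weighted-mean idea concrete, here is a minimal sketch of how such a score could be computed. It assumes each check produces a result between 0 and 1 and that an output passes when the weighted mean reaches `passThreshold`. The function names and the result format are illustrative, not the service's actual implementation; the verifiers page remains the authoritative description.

```python
from typing import Sequence


def weighted_mean_score(weights: Sequence[float], results: Sequence[float]) -> float:
    """Weighted mean of per-check results, each assumed to lie in [0, 1]."""
    if len(weights) != len(results):
        raise ValueError("need exactly one result per check")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * r for w, r in zip(weights, results)) / total_weight


def passes(weights: Sequence[float], results: Sequence[float], pass_threshold: float) -> bool:
    """Assumed pass rule: the weighted mean must reach the threshold."""
    return weighted_mean_score(weights, results) >= pass_threshold


# Weights from refund.json: tool_called, json_valid, field_equals, llm_judge.
weights = [1.0, 1.0, 1.0, 0.5]

# All deterministic checks pass, but the judge gives a hypothetical 0.8.
partial = [1.0, 1.0, 1.0, 0.8]
print(weighted_mean_score(weights, partial))   # (3.0 + 0.4) / 3.5
print(passes(weights, partial, 1.0))           # False

# Every check fully satisfied.
print(passes(weights, [1.0, 1.0, 1.0, 1.0], 1.0))  # True
```

Under these assumptions, a passThreshold of 1.0 means every check must be fully satisfied, including the LLM judge. A lower threshold would let a strong result on the heavily weighted checks compensate for a weaker judge score.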

2. Create it

shell

rollout verifiers create refund.json
# Created verifier refund · pass_threshold 1.0

3. Use it

Pass the verifier to an optimization run or a benchmark, or anywhere else a score is needed. GEPA uses the per-check feedback to reason about what to change next.

shell

rollout optimize create \
  --target triage-prompt \
  --dataset ./support-triage \
  --verifier ./refund.json

From here, run GEPA as in the GEPA recipe.