Rollout documentation

Check types

Type	Params	Outcome	Passes when
task_expectations	—	graded	Output satisfies the task's mustMention / mustNotMention expectations.
contains	`value`, `caseSensitive`?	binary	Output contains the substring. Alias: must_contain.
not_contains	`value`, `caseSensitive`?	binary	Output does not contain the substring. Alias: must_not_contain.
regex	`pattern`	binary	re.search finds the pattern in the output.
equals	`value`, `caseSensitive`?	binary	Trimmed output equals the trimmed value. Alias: exact_match.
min_length	`value`	binary	len(output) ≥ value.
max_length	`value`	binary	len(output) ≤ value.
json_valid	—	binary	Output parses with json.loads (strict).
json_keys	`requiredKeys`	binary	Output is a JSON object containing every required key.
expected_output_schema	—	binary	Output is a JSON object with the required keys from the task's expected output schema.

Matching for the substring and equality checks is case-insensitive unless `caseSensitive: true` is set in params. The value for contains / not_contains may be given as `value` or `text`; for json_keys the keys may be given as `requiredKeys` or `keys`.
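The matching rules above can be sketched in Python as follows. This is a minimal illustration, not the library's implementation: the function names are hypothetical, and folding case with `str.lower()` is an assumption, since the text only says that matching is case-insensitive by default.

```python
import json


def _fold(s: str, case_sensitive: bool) -> str:
    # Assumption: case-insensitive matching is done by lowercasing both sides.
    return s if case_sensitive else s.lower()


def check_contains(output: str, params: dict) -> bool:
    # The substring may be supplied as `value` or `text`.
    needle = params.get("value", params.get("text", ""))
    cs = bool(params.get("caseSensitive", False))
    return _fold(needle, cs) in _fold(output, cs)


def check_equals(output: str, params: dict) -> bool:
    # Both sides are trimmed before comparison.
    cs = bool(params.get("caseSensitive", False))
    return _fold(output.strip(), cs) == _fold(params["value"].strip(), cs)


def check_json_keys(output: str, params: dict) -> bool:
    # The keys may be supplied as `requiredKeys` or `keys`.
    required = params.get("requiredKeys", params.get("keys", []))
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    # Only a JSON object (a Python dict) can satisfy the check.
    return isinstance(data, dict) and all(k in data for k in required)
```

With these definitions, `check_contains("Hello World", {"value": "hello"})` passes, while the same call with `"caseSensitive": True` fails.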

Outcomes & scoring

Binary checks return 1.0 on pass, 0.0 on fail.
Graded checks (task_expectations) return a fraction in [0, 1] for partial credit.
A verifier's score is the weighted mean of its checks' scores (default weight: 1). When there are multiple verifiers, the final score is the plain (unweighted) mean of the verifier scores.
passThreshold (default 1.0) is the score at or above which the output passes. A check marked "required": true makes any failure of it critical — the output cannot pass regardless of the numeric score.
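The scoring rules above can be illustrated with a short Python sketch. The result shape used here (dicts with `score`, `passed`, `weight`, and `required` fields) and the function names are assumptions made for the example; only the arithmetic follows the description.

```python
from __future__ import annotations


def verifier_score(checks: list[dict]) -> float:
    # Weighted mean of check scores; a missing weight defaults to 1.
    total_weight = sum(c.get("weight", 1.0) for c in checks)
    if total_weight == 0:
        return 0.0  # assumption: a verifier with no weighted checks scores 0
    weighted = sum(c["score"] * c.get("weight", 1.0) for c in checks)
    return weighted / total_weight


def evaluate(verifiers: list[list[dict]], pass_threshold: float = 1.0) -> tuple[float, bool]:
    # Assumes at least one verifier. The final score is the plain mean.
    scores = [verifier_score(checks) for checks in verifiers]
    final = sum(scores) / len(scores)
    # A failed check marked required blocks a pass regardless of the score.
    critical = any(
        c.get("required", False) and not c["passed"]
        for checks in verifiers
        for c in checks
    )
    return final, (final >= pass_threshold and not critical)


verifier_a = [
    {"score": 1.0, "passed": True, "weight": 2},
    {"score": 0.0, "passed": False},  # weight defaults to 1
]
verifier_b = [{"score": 0.5, "passed": False}]

# Verifier A scores 2/3, verifier B scores 1/2, so the final score is 7/12.
final, passed = evaluate([verifier_a, verifier_b], pass_threshold=0.5)
```

In this example the output passes at a threshold of 0.5. Marking the failed check in `verifier_a` as `"required": True` would make the same output fail even though the numeric score is unchanged.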

Note

See Verifiers for the full spec, weighting strategy, and a worked example.

task_expectations

Reads task.metadata.expectations with mustMention and mustNotMention lists. Each entry uses anyOf (a list of acceptable phrases) or a single text phrase, plus a message used as the failure reason. The score is (total − failures) / total across all entries, so it gives partial credit and a concrete reason per missed expectation — exactly the feedback GEPA reflects on.
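As a rough sketch of how this graded score could be computed, the Python below walks both lists and collects one failure message per missed expectation. The function name is hypothetical, and two details are assumptions not stated above: phrases are matched as case-insensitive substrings, and an empty expectations object scores 1.0.

```python
from __future__ import annotations


def score_expectations(output: str, expectations: dict) -> tuple[float, list[str]]:
    text = output.lower()  # assumption: case-insensitive substring matching
    failures: list[str] = []
    total = 0
    for kind in ("mustMention", "mustNotMention"):
        for entry in expectations.get(kind, []):
            total += 1
            # An entry carries either `anyOf` (several phrases) or a single `text`.
            phrases = entry.get("anyOf") or [entry["text"]]
            found = any(p.lower() in text for p in phrases)
            ok = found if kind == "mustMention" else not found
            if not ok:
                failures.append(entry.get("message", "expectation not met"))
    if total == 0:
        return 1.0, []  # assumption: nothing to check counts as a full score
    return (total - len(failures)) / total, failures


expectations = {
    "mustMention": [
        {"anyOf": ["refund", "reimbursement"], "message": "Should mention a refund."},
        {"text": "30 days", "message": "Should state the 30-day window."},
    ],
    "mustNotMention": [
        {"text": "guarantee", "message": "Must not promise a guarantee."},
    ],
}

# Two of three expectations are met, so the score is 2/3 with one reason.
score, reasons = score_expectations("We will issue a refund shortly.", expectations)
```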

Skips & errors

An unknown check type is skipped: it does not fail the output, and the skip is noted in the feedback. A check that raises an exception is treated as a failure, with the exception type included in the reason, so one broken check can never crash a run.
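One way to picture the skip and error behavior is a small dispatcher like the sketch below. The registry, the `run_check` name, and the result shape are hypothetical; only the skip-on-unknown and fail-on-exception behavior comes from the description above.

```python
import re

# Hypothetical registry covering two of the documented check types.
CHECKS = {
    "regex": lambda output, params: re.search(params["pattern"], output) is not None,
    "min_length": lambda output, params: len(output) >= params["value"],
}


def run_check(check: dict, output: str) -> dict:
    fn = CHECKS.get(check["type"])
    if fn is None:
        # Unknown type: skip rather than fail, and record why.
        return {"status": "skipped", "reason": f"unknown check type: {check['type']}"}
    try:
        passed = fn(output, check.get("params", {}))
    except Exception as exc:
        # A broken check counts as a failure and never propagates.
        return {"status": "failed", "reason": f"check raised {type(exc).__name__}"}
    return {"status": "passed" if passed else "failed", "reason": ""}
```

For example, a `regex` check whose params omit `pattern` raises `KeyError` inside the lambda, and `run_check` reports it as a failed check instead of aborting the run.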