Reference

Verifier check types

The complete list of native verifier checks. Each runs locally and deterministically, with no LLM judge. A check returns pass/fail, or a fraction for partial credit; the verifier score is the weighted mean across its checks.

Check types

type                   | params                | Outcome | Passes when
---------------------- | --------------------- | ------- | -----------
task_expectations      |                       | graded  | Output satisfies the task's mustMention / mustNotMention expectations.
contains               | value, caseSensitive? | binary  | Output contains the substring. Alias: must_contain.
not_contains           | value, caseSensitive? | binary  | Output does not contain the substring. Alias: must_not_contain.
regex                  | pattern               | binary  | re.search finds the pattern in the output.
equals                 | value, caseSensitive? | binary  | Trimmed output equals the trimmed value. Alias: exact_match.
min_length             | value                 | binary  | len(output) ≥ value.
max_length             | value                 | binary  | len(output) ≤ value.
json_valid             |                       | binary  | Output parses with json.loads (strict).
json_keys              | requiredKeys          | binary  | Output is a JSON object containing every required key.
expected_output_schema |                       | binary  | Output is a JSON object with the required keys from the task's expected output schema.
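
The regex and JSON rows map onto standard-library calls. The following is a minimal sketch, assuming each check is a plain function that takes the output string and returns a bool. The function names are illustrative and are not the verifier's actual API.

```python
import json
import re


def check_regex(output: str, pattern: str) -> bool:
    """regex: passes when re.search finds the pattern anywhere in the output."""
    return re.search(pattern, output) is not None


def check_json_valid(output: str) -> bool:
    """json_valid: passes when json.loads accepts the output."""
    try:
        json.loads(output)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def check_json_keys(output: str, required_keys: list[str]) -> bool:
    """json_keys: passes when the output is a JSON object with every required key."""
    try:
        parsed = json.loads(output)
    except ValueError:
        return False
    # A JSON array or scalar parses fine but is not an object, so it fails.
    return isinstance(parsed, dict) and all(key in parsed for key in required_keys)
```

Note that json_keys fails on valid JSON that is not an object (for example a top-level array), which matches the "is a JSON object" wording in the table.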

Matching for the substring and equality checks is case-insensitive unless caseSensitive: true is set in params. The value for contains / not_contains may be given as value or text; for json_keys the keys may be given as requiredKeys or keys.
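
The case-handling and parameter-alias rules above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, and lowercasing with str.lower is an assumed way to get case-insensitive matching.

```python
def _fold(text: str, case_sensitive: bool) -> str:
    # Assumption: case-insensitive matching is done by lowercasing both sides.
    return text if case_sensitive else text.lower()


def check_contains(output: str, params: dict) -> bool:
    """contains: the substring may be given as "value" or as "text"."""
    needle = params["value"] if "value" in params else params["text"]
    case_sensitive = params.get("caseSensitive", False)  # insensitive by default
    return _fold(needle, case_sensitive) in _fold(output, case_sensitive)


def check_equals(output: str, params: dict) -> bool:
    """equals: compare the trimmed output with the trimmed value."""
    case_sensitive = params.get("caseSensitive", False)
    left = _fold(output.strip(), case_sensitive)
    right = _fold(params["value"].strip(), case_sensitive)
    return left == right
```

For example, check_contains("Hello World", {"value": "hello"}) passes under the default, while the same call with "caseSensitive": True fails.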

Outcomes & scoring

  • Binary checks return 1.0 on pass, 0.0 on fail.
  • Graded checks (task_expectations) return a fraction in [0, 1] for partial credit.
  • A verifier's score is the weighted mean of its checks (default weight: 1). When a task has multiple verifiers, the final score is the plain (unweighted) mean of the verifier scores.
  • passThreshold (default 1.0) is the score at or above which the output passes. If a check marked "required": true fails, the failure is critical: the output cannot pass regardless of the numeric score.
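
The scoring rules in this list can be sketched in a few lines. The result-dict shape ("score", "weight", "required") and the function names are assumptions made for illustration. The sketch also assumes that a required check counts as failed whenever its score is below 1.0, and that a verifier with no checks scores 0.0.

```python
def verifier_score(results: list[dict]) -> float:
    """Weighted mean of the check scores; a missing weight defaults to 1."""
    total_weight = sum(r.get("weight", 1) for r in results)
    if total_weight == 0:
        return 0.0  # assumption: no weighted checks means no credit
    weighted_sum = sum(r["score"] * r.get("weight", 1) for r in results)
    return weighted_sum / total_weight


def passes(results: list[dict], pass_threshold: float = 1.0) -> bool:
    """Pass when the score reaches the threshold and no required check failed."""
    # Assumption: a required check is "failed" when it scores below 1.0.
    critical = any(r.get("required", False) and r["score"] < 1.0 for r in results)
    return not critical and verifier_score(results) >= pass_threshold


def final_score(verifier_scores: list[float]) -> float:
    """Plain (unweighted) mean across verifiers."""
    return sum(verifier_scores) / len(verifier_scores)
```

With one check scoring 1.0 at weight 2 and another scoring 0.5 at the default weight, the verifier score is (2 × 1.0 + 1 × 0.5) / 3 ≈ 0.83, which passes at passThreshold 0.8 but not at the default of 1.0.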

Note

See Verifiers for the full spec, weighting strategy, and a worked example.

task_expectations

Reads task.metadata.expectations, which holds mustMention and mustNotMention lists. Each entry gives either anyOf (a list of acceptable phrases) or a single text phrase, plus a message used as the failure reason. The score is (total − failures) / total across all entries, so the check gives partial credit and a concrete reason for each missed expectation. This is exactly the feedback GEPA reflects on.
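
The (total − failures) / total rule can be illustrated with a short sketch. The function names are hypothetical, and two behaviors are assumptions rather than documented facts: phrases are matched as case-insensitive substrings, and an empty expectations set scores 1.0.

```python
def _entry_phrases(entry: dict) -> list[str]:
    # An entry lists acceptable phrases under "anyOf" or gives one "text" phrase.
    return entry["anyOf"] if "anyOf" in entry else [entry["text"]]


def score_task_expectations(output: str, expectations: dict) -> tuple[float, list[str]]:
    """Return (score, failure messages) for mustMention / mustNotMention."""
    haystack = output.lower()  # assumption: case-insensitive substring matching
    failures: list[str] = []
    total = 0
    for key, should_appear in (("mustMention", True), ("mustNotMention", False)):
        for entry in expectations.get(key, []):
            total += 1
            found = any(p.lower() in haystack for p in _entry_phrases(entry))
            if found != should_appear:
                failures.append(entry.get("message", ""))
    if total == 0:
        return 1.0, failures  # assumption: nothing to check counts as a pass
    return (total - len(failures)) / total, failures
```

With two mustMention entries and one mustNotMention entry, an output that misses exactly one required phrase scores (3 − 1) / 3 ≈ 0.67 and reports that entry's message as the reason.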

Skips & errors

An unknown check type is skipped: it does not fail the output, and the skip is noted in the feedback. A check that raises an exception is treated as a failure, with the exception type in the reason, so one broken check can never crash a run.
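
A minimal sketch of this skip-and-catch behavior follows, assuming a hypothetical registry that maps check type names to functions. The runner's name, the result-dict fields, and the reason strings are illustrative and are not the verifier's real interface.

```python
def run_checks(output: str, checks: list[dict], registry: dict) -> list[dict]:
    """Run each check; skip unknown types and turn exceptions into failures."""
    results = []
    for check in checks:
        check_fn = registry.get(check["type"])
        if check_fn is None:
            # Unknown type: record the skip, but contribute no score.
            results.append({"type": check["type"], "skipped": True,
                            "reason": "unknown check type"})
            continue
        try:
            passed = bool(check_fn(output, check.get("params", {})))
            reason = ""
        except Exception as exc:
            # A raising check fails, with the exception type as the reason.
            passed = False
            reason = f"check raised {type(exc).__name__}"
        results.append({"type": check["type"], "skipped": False,
                        "score": 1.0 if passed else 0.0, "reason": reason})
    return results
```

Because the try block wraps only the call to the check function, a bug in one check surfaces as a failed result with a reason, and the remaining checks still run.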