1. Install the CLI
The optimize extra pulls in the GEPA search engine. Authenticate once and the CLI remembers your workspace.
    pip install "rollout-cli[optimize]"
    rollout login

2. Expose your agent as a target
Wrap the entry point you want to improve with @rollout.optimize. On each call, GEPA passes in a candidate (the prompt it is currently testing) and a task from your dataset. Use candidate.text as the prompt and return what your agent produced. Keep everything else fixed so that the score reflects the prompt alone.
    import mv37.rollout as rollout
    from openai import OpenAI

    BASELINE = "You are a support agent. Read the message and reply."
    client = OpenAI()

    @rollout.optimize(id="triage-prompt", kind="prompt", baseline=BASELINE)
    def run_candidate(task: rollout.Task, candidate: rollout.Candidate) -> rollout.AgentResult:
        response = client.chat.completions.create(
            model="gpt-4.1-mini",
            messages=[
                {"role": "system", "content": candidate.text},
                {"role": "user", "content": task.instruction},
            ],
        )
        return rollout.AgentResult(output=response.choices[0].message.content or "")

Pick a weak baseline
GEPA has the most room to work when the baseline is deliberately underspecified. A vague prompt on a cheap model is the ideal starting point.
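Before syncing, it can help to check offline that the target really changes only with the candidate. The sketch below is a minimal illustration, not the rollout API: Task, Candidate, and AgentResult are hypothetical stand-ins that model only the fields used in the target above, and fake_model is a stub replacing the chat completion call.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for rollout.Task, rollout.Candidate and
# rollout.AgentResult; only the fields used by the target are modelled.
@dataclass
class Task:
    instruction: str


@dataclass
class Candidate:
    text: str


@dataclass
class AgentResult:
    output: str


def fake_model(system: str, user: str) -> str:
    """Stub standing in for the chat completion call so the sketch runs offline."""
    return f"[{system}] {user}"


def run_candidate(task: Task, candidate: Candidate) -> AgentResult:
    # Only candidate.text varies between calls; the model and the message
    # layout stay fixed, so any change in output comes from the prompt.
    return AgentResult(output=fake_model(candidate.text, task.instruction))


task = Task(instruction="I was charged twice for my order.")
baseline = run_candidate(task, Candidate(text="You are a support agent."))
variant = run_candidate(task, Candidate(text="You are a refunds specialist."))
```

With the same task and two different candidates, the two outputs differ only because of the prompt text, which is the property the real target needs to preserve.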
3. Point it at a dataset and a verifier
Sync your targets to the workspace, then create a run from a dataset (a Harbor folder of tasks) and a verifier (a JSON spec that scores each output). See the verifier recipe and datasets for those two files.
    rollout optimize sync-targets
    rollout optimize create \
      --target triage-prompt \
      --dataset ./support-triage \
      --verifier ./refund.json

4. Run it
Kick off the search. GEPA proposes and scores candidate prompts against your dataset until it exhausts the metric-call budget; the live run page shows the candidate list and per-task feedback as it goes.
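To make the idea of a metric-call budget concrete, here is a toy propose-and-score loop. It is a sketch under stated assumptions and not GEPA's actual search strategy: budgeted_search, propose, and score are hypothetical names, and the sketch assumes each scored (prompt, task) pair costs one metric call.

```python
def budgeted_search(baseline, propose, score, tasks, budget):
    """Toy propose-and-score loop; each score() call spends one unit of budget."""

    def evaluate(prompt):
        # Mean score of one prompt across every task.
        return sum(score(prompt, t) for t in tasks) / len(tasks)

    best, best_score = baseline, evaluate(baseline)
    calls = len(tasks)  # scoring the baseline already used one pass
    # Continue only while a full pass over the tasks still fits in the budget.
    while calls + len(tasks) <= budget:
        candidate = propose(best)
        candidate_score = evaluate(candidate)
        calls += len(tasks)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score, calls


# Deterministic toy inputs: longer prompts score higher, capped at 1.0.
tasks = ["t1", "t2", "t3"]
best, best_score, calls = budgeted_search(
    baseline="Reply.",
    propose=lambda prompt: prompt + " Be specific.",
    score=lambda prompt, task: min(1.0, len(prompt) / 100),
    tasks=tasks,
    budget=10,
)
```

With three tasks and a budget of 10, the loop scores the baseline and two proposals (nine calls) and then stops, because a third proposal would exceed the budget.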
    rollout optimize run triage-prompt
    # GEPA · scoring candidates ............ done (32s)
    # best 0.91 · baseline 0.79 · +12% on holdout

5. Read the promotion report
Every run ends with a promotion report comparing the best candidate to your baseline on a held-out split. If the lift cleared your bar, the winning prompt is stored on the run — copy it straight into production. For the narrated, end-to-end version of this recipe, see the GEPA guide.
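The comparison behind the report can be sketched in a few lines. This is an illustration only: promotion_decision and min_lift are hypothetical names, and treating the bar as an absolute score difference is an assumption. Using the sample scores from the run output, the absolute lift is 0.91 - 0.79 = 0.12, which matches the +12% shown there, while the relative lift over the baseline is about 15%.

```python
def promotion_decision(best: float, baseline: float, min_lift: float) -> dict:
    """Compare held-out scores; min_lift is a hypothetical absolute threshold."""
    absolute = best - baseline
    relative = absolute / baseline if baseline else float("inf")
    return {
        "absolute_lift": round(absolute, 4),
        "relative_lift": round(relative, 4),
        "promote": absolute >= min_lift,
    }


# Sample scores from the run output above; the 0.05 bar is made up.
report = promotion_decision(best=0.91, baseline=0.79, min_lift=0.05)
```

Deciding up front whether your bar is absolute or relative avoids ambiguity when the baseline score is low, since the two measures diverge most in that case.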