Optimization (GEPA)

Datasets

A dataset is the set of tasks GEPA scores your agent against. Rollout stores them in Harbor format — a folder of task files. You can write that folder by hand, generate it, or convert an external source with the CLI.

Harbor format

A Harbor dataset is a folder with a dataset.toml manifest and one subfolder per task containing a task.json:

layout.txt
datasets/refunds/
  dataset.toml
  refund-in-window/
    task.json
dataset.toml
name = "Refund cases"
slug = "refunds"

[[tasks]]
name = "refund-in-window"
path = "refund-in-window"

The task file

Each task.json carries the instruction, the structured input your agent reads, and metadata. Expectations under metadata are what a task_expectations check grades against:

task.json
{
  "instruction": "Customer asks for a refund on an order delivered yesterday.",
  "input": { "order_id": "ord_123" },
  "metadata": {
    "expectations": [
      { "message": "Mentions the refund is inside the policy window" }
    ]
  }
}

These fields map directly onto the Task object your target receives — instruction, input, metadata, plus the name and split Rollout assigns.
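
As a minimal sketch of writing such a folder by hand, the Python below creates the layout shown above using only the standard library. The helper name write_harbor_dataset and the EXAMPLE_TASKS constant are hypothetical; the field names are taken from the examples on this page, and the real format may accept more fields than shown here.

```python
import json
from pathlib import Path

# Example task taken from the task.json shown on this page.
EXAMPLE_TASKS = {
    "refund-in-window": {
        "instruction": "Customer asks for a refund on an order delivered yesterday.",
        "input": {"order_id": "ord_123"},
        "metadata": {
            "expectations": [
                {"message": "Mentions the refund is inside the policy window"}
            ]
        },
    }
}


def write_harbor_dataset(root, name, slug, tasks):
    """Write dataset.toml plus one <task>/task.json per task (hypothetical helper)."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)

    # Manifest: top-level name and slug, then one [[tasks]] entry per task.
    # json.dumps produces a double-quoted string, which is also a valid TOML
    # basic string for plain ASCII names.
    lines = [f"name = {json.dumps(name)}", f"slug = {json.dumps(slug)}"]
    for task_name in tasks:
        lines += [
            "",
            "[[tasks]]",
            f"name = {json.dumps(task_name)}",
            f"path = {json.dumps(task_name)}",
        ]
    (root / "dataset.toml").write_text("\n".join(lines) + "\n", encoding="utf-8")

    # One subfolder per task, each holding a task.json.
    for task_name, task in tasks.items():
        task_dir = root / task_name
        task_dir.mkdir(exist_ok=True)
        (task_dir / "task.json").write_text(
            json.dumps(task, indent=2) + "\n", encoding="utf-8"
        )
    return root
```

Calling write_harbor_dataset("./datasets/refunds", "Refund cases", "refunds", EXAMPLE_TASKS) would reproduce the example folder from the Harbor format section.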

Splits

GEPA needs a train split to search on and a holdout split to check that the winner generalizes; a val split is optional. Pin task names to splits with a splits.json file passed via --splits-file:

splits.json
{
  "train": ["refund-in-window", "refund-late", "..."],
  "val":   ["..."],
  "holdout": ["..."]
}

Tip

Keep the holdout split genuinely held out — tasks the search never sees. The promotion decision is made on holdout scores, so leakage there inflates your results.
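
One way to keep the holdout disjoint is to shuffle the task names once with a fixed seed and slice the result. The sketch below is only an illustration: make_splits, find_leaks, write_splits, the 60/20/20 ratios, and the seed are assumptions, not Rollout defaults.

```python
import json
import random


def make_splits(task_names, train_frac=0.6, val_frac=0.2, seed=0):
    """Partition task names into disjoint train/val/holdout lists (hypothetical helper)."""
    names = sorted(set(task_names))      # de-duplicate and fix the starting order
    random.Random(seed).shuffle(names)   # deterministic shuffle for a given seed
    n_train = int(len(names) * train_frac)
    n_val = int(len(names) * val_frac)
    return {
        "train": names[:n_train],
        "val": names[n_train:n_train + n_val],
        "holdout": names[n_train + n_val:],  # everything left over
    }


def find_leaks(splits):
    """Return the task names that appear in more than one split."""
    first_seen = {}
    leaks = set()
    for split, names in splits.items():
        for name in names:
            if first_seen.setdefault(name, split) != split:
                leaks.add(name)
    return sorted(leaks)


def write_splits(splits, path="splits.json"):
    """Write the splits in the JSON shape shown above, refusing leaky splits."""
    leaks = find_leaks(splits)
    if leaks:
        raise ValueError(f"tasks appear in more than one split: {leaks}")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(splits, f, indent=2)
```

find_leaks also works on a hand-written splits.json after loading it with json.load, which makes it a cheap check to run before passing the file via --splits-file.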

Convert a source

If your data lives in a JSONL, CSV, Hugging Face, or Parquet source, convert it into a Harbor folder with explicit column mapping. Source URIs:

URI              Source
hf:<repo>        Hugging Face dataset repo, e.g. hf:AI-MO/aimo-validation-aime
jsonl:<path>     Local JSONL file, one object per row
csv:<path>       Local CSV file with a header row
parquet:<path>   Local Parquet file
shell
rollout datasets pull hf:AI-MO/aimo-validation-aime \
  --instruction problem \
  --input answer=answer \
  --metadata solution=solution \
  --out ./datasets/aime

Mapping flags:

  • --instruction COLUMN maps the task instruction
  • --input key=column adds an input field (repeat for several)
  • --metadata key=column adds a metadata field (repeat for several)
  • --name COLUMN uses a column as the Harbor task name
  • --split NAME reads one named source split
  • --expected-output-schema JSON_OR_PATH stores a JSON object schema on each task
  • --limit N converts only the first N rows; --force replaces an existing output folder
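
To make the mapping flags concrete, the sketch below applies the same idea to a single source row in plain Python. It illustrates the mapping described above and is not the CLI's implementation; row_to_task and its parameters are hypothetical names, and the row values are made-up placeholders.

```python
def row_to_task(row, instruction, inputs=None, metadata=None):
    """Build a task dict from one source row using column mappings.

    instruction: source column holding the instruction (like --instruction COLUMN)
    inputs, metadata: {key: column} mappings (like --input / --metadata key=column)
    """
    inputs = inputs or {}
    metadata = metadata or {}
    return {
        "instruction": row[instruction],
        "input": {key: row[column] for key, column in inputs.items()},
        "metadata": {key: row[column] for key, column in metadata.items()},
    }


# A row with the column names used in the pull example above.
# The values are placeholders, not real dataset content.
row = {"problem": "What is 1 + 1?", "answer": "2", "solution": "Add the two numbers."}

task = row_to_task(
    row,
    instruction="problem",
    inputs={"answer": "answer"},
    metadata={"solution": "solution"},
)
```

Here task has the same three top-level fields as a task.json: the problem column becomes the instruction, and the answer and solution columns land under input and metadata.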

Note

The base CLI reads JSONL and CSV. Hugging Face and Parquet sources need the optional dataset extras: install rollout-cli[datasets]. The datasets package (for Hugging Face) and pyarrow (for Parquet) are imported only when the corresponding source type is actually used.
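
If you script conversions, a quick check like the one below tells you whether the optional package for a given source URI is importable before you run a pull. This is a sketch under assumptions: missing_package is a hypothetical helper, and the scheme-to-package table simply restates the note above rather than querying the CLI.

```python
import importlib.util

# Scheme -> Python package the note above says is needed for that source.
OPTIONAL_PACKAGES = {"hf": "datasets", "parquet": "pyarrow"}


def missing_package(source_uri):
    """Return the missing optional package for a source URI, or None if nothing is missing."""
    scheme = source_uri.split(":", 1)[0]
    package = OPTIONAL_PACKAGES.get(scheme)
    if package is None:
        return None  # jsonl: and csv: need no extras
    # find_spec returns None when the top-level package cannot be found.
    if importlib.util.find_spec(package) is None:
        return package
    return None
```

For example, missing_package("csv:./evals.csv") returns None, while missing_package("parquet:./evals.parquet") returns "pyarrow" on a machine where the extras are not installed.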

Inspect before converting

inspect prints detected columns, available splits, a row count when cheap, and a few sample rows — useful for getting the mapping flags right the first time:

shell
rollout datasets inspect hf:AI-MO/aimo-validation-aime
rollout datasets inspect csv:./evals.csv --sample 5

Import and upload

You can upload during conversion with --upload, import an existing Harbor folder separately, or just point optimize create/run at the folder and let it import inline:

shell
# upload while converting
rollout datasets pull hf:AI-MO/aimo-validation-aime \
  --instruction problem --input answer=answer --out ./datasets/aime --upload

# or import a Harbor folder later
rollout datasets import ./datasets/aime

# or just reference the folder when creating the run (imported inline)
rollout optimize create \
  --target math-agent \
  --dataset ./datasets/aime \
  --verifier ./verifiers/math.json \
  --reflection-model openai/gpt-4.1-mini

Once a dataset is in Rollout you can reference it by slug (--dataset refunds) or by an exact version (--dataset-version-id dsv_...) instead of a path. See Verifiers next, then the end-to-end guide.