Rollout documentation

Harbor format

A Harbor dataset is a folder with a dataset.toml manifest and one subfolder per task containing a task.json:

layout.txt

datasets/refunds/
  dataset.toml
  refund-in-window/
    task.json

dataset.toml

name = "Refund cases"
slug = "refunds"

[[tasks]]
name = "refund-in-window"
path = "refund-in-window"

The task file

Each task.json carries the instruction, the structured input your agent reads, and metadata. Expectations under metadata are what a task_expectations check grades against:

task.json

{
  "instruction": "Customer asks for a refund on an order delivered yesterday.",
  "input": { "order_id": "ord_123" },
  "metadata": {
    "expectations": [
      { "message": "Mentions the refund is inside the policy window" }
    ]
  }
}

These fields map directly onto the Task object your target receives — instruction, input, metadata, plus the name and split Rollout assigns.
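To make the mapping concrete, here is a minimal sketch of how the file's fields could line up with a task object. The `Task` dataclass and `task_from_json` below are hypothetical stand-ins written for illustration; they are not Rollout's actual class or loader, and the `"train"` split value is just an example.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for the Task object a target receives."""
    name: str                 # assigned by Rollout, not stored in task.json
    split: str                # assigned by Rollout, not stored in task.json
    instruction: str
    input: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)


def task_from_json(text: str, name: str, split: str) -> Task:
    """Combine the three task.json fields with a name and split."""
    raw = json.loads(text)
    return Task(
        name=name,
        split=split,
        instruction=raw["instruction"],
        input=raw.get("input", {}),
        metadata=raw.get("metadata", {}),
    )


TASK_JSON = """
{
  "instruction": "Customer asks for a refund on an order delivered yesterday.",
  "input": { "order_id": "ord_123" },
  "metadata": {
    "expectations": [
      { "message": "Mentions the refund is inside the policy window" }
    ]
  }
}
"""

task = task_from_json(TASK_JSON, name="refund-in-window", split="train")
# Collect the expectation messages that a task_expectations check would grade against.
messages = [e["message"] for e in task.metadata.get("expectations", [])]
print(task.input["order_id"], messages)
```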

Splits

GEPA needs a train split to search on and a holdout split to check the winner generalizes (a val split is optional). Pin task names to splits with a splits.json file passed via --splits-file:

splits.json

{
  "train": ["refund-in-window", "refund-late", "..."],
  "val":   ["..."],
  "holdout": ["..."]
}

Tip

Keep the holdout split genuinely held out — tasks the search never sees. The promotion decision is made on holdout scores, so leakage there inflates your results.
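One way to produce a splits.json and guard against leakage is a small script like the sketch below. It is not part of Rollout: `make_splits` and `find_leaks` are hypothetical helpers, the 60/20/20 ratio is an arbitrary assumption, and the `case-NN` task names are placeholders.

```python
import json
import random


def make_splits(task_names, train=0.6, val=0.2, seed=0):
    """Shuffle task names deterministically and cut them into three splits."""
    names = sorted(set(task_names))        # dedupe and fix the starting order
    random.Random(seed).shuffle(names)     # same seed gives the same assignment
    n_train = int(len(names) * train)
    n_val = int(len(names) * val)
    return {
        "train": names[:n_train],
        "val": names[n_train:n_train + n_val],
        "holdout": names[n_train + n_val:],
    }


def find_leaks(splits):
    """Return the task names that appear in more than one split."""
    first_seen = {}
    leaks = set()
    for split, names in splits.items():
        for name in names:
            if first_seen.setdefault(name, split) != split:
                leaks.add(name)
    return leaks


splits = make_splits([f"case-{i:02d}" for i in range(10)])
assert not find_leaks(splits), "a task is pinned to more than one split"

# Serialize in the shape shown above; write this to splits.json as needed.
print(json.dumps(splits, indent=2))
```

Fixing the seed keeps the assignment stable between runs, so the holdout set does not drift as you regenerate the file.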

Convert a source

If your data lives in a JSONL, CSV, Hugging Face, or Parquet source, convert it into a Harbor folder with explicit column mapping. Source URIs:

URI	Source
hf:<repo>	Hugging Face dataset repo, e.g. hf:AI-MO/aimo-validation-aime
jsonl:<path>	Local JSONL file, one object per row
csv:<path>	Local CSV file with a header row
parquet:<path>	Local Parquet file

shell

rollout datasets pull hf:AI-MO/aimo-validation-aime \
  --instruction problem \
  --input answer=answer \
  --metadata solution=solution \
  --out ./datasets/aime

Mapping flags:

--instruction COLUMN maps the task instruction
--input key=column adds an input field (repeat for several)
--metadata key=column adds a metadata field (repeat for several)
--name COLUMN uses a column as the Harbor task name
--split NAME reads one named source split
--expected-output-schema JSON_OR_PATH stores a JSON object schema on each task
--limit N converts only the first N rows
--force replaces an existing output folder
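The key=column mapping can be pictured as a per-row transformation. The sketch below only illustrates the semantics described by the flags above; it is not the CLI's implementation, `row_to_task` is a hypothetical function, and the sample row is made up.

```python
def row_to_task(row, instruction, inputs=None, metadata=None):
    """Map one source row onto the task.json shape.

    instruction: name of the column holding the instruction text
    inputs / metadata: {task_key: source_column} pairs, one per repeated flag
    """
    return {
        "instruction": row[instruction],
        "input": {key: row[col] for key, col in (inputs or {}).items()},
        "metadata": {key: row[col] for key, col in (metadata or {}).items()},
    }


# Made-up row with the column names used in the pull command above.
row = {"problem": "What is 2 + 2?", "answer": "4", "solution": "Add the numbers."}

task = row_to_task(
    row,
    instruction="problem",              # --instruction problem
    inputs={"answer": "answer"},        # --input answer=answer
    metadata={"solution": "solution"},  # --metadata solution=solution
)
print(task)
```

A column that is not named in any flag is simply left out of the task under this reading.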

Note

The base CLI reads JSONL and CSV. Hugging Face and Parquet sources need the optional dataset extras: install rollout-cli[datasets]. The datasets package (for Hugging Face) and pyarrow (for Parquet) are imported only when those sources are actually used.

Inspect before converting

inspect prints detected columns, available splits, a row count when cheap, and a few sample rows — useful for getting the mapping flags right the first time:

shell

rollout datasets inspect hf:AI-MO/aimo-validation-aime
rollout datasets inspect csv:./evals.csv --sample 5

Import and upload

You can upload during conversion with --upload, import an existing Harbor folder separately, or just point optimize create/run at the folder and let it import inline:

shell

# upload while converting
rollout datasets pull hf:AI-MO/aimo-validation-aime \
  --instruction problem --input answer=answer --out ./datasets/aime --upload

# or import a Harbor folder later
rollout datasets import ./datasets/aime

# or just reference the folder when creating the run (imported inline)
rollout optimize create \
  --target math-agent \
  --dataset ./datasets/aime \
  --verifier ./verifiers/math.json \
  --reflection-model openai/gpt-4.1-mini

Once a dataset is in Rollout you can reference it by slug (--dataset refunds) or by an exact version (--dataset-version-id dsv_...) instead of a path. See Verifiers next, then the end-to-end guide.