Harbor format
A Harbor dataset is a folder with a dataset.toml manifest and one subfolder per task containing a task.json:
```
datasets/refunds/
  dataset.toml
  refund-in-window/
    task.json
```

The manifest names the dataset and lists each task's folder:

```toml
name = "Refund cases"
slug = "refunds"

[[tasks]]
name = "refund-in-window"
path = "refund-in-window"
```

The task file
Each task.json carries the instruction, the structured input your agent reads, and metadata. Expectations under metadata are what a task_expectations check grades against:
```json
{
  "instruction": "Customer asks for a refund on an order delivered yesterday.",
  "input": { "order_id": "ord_123" },
  "metadata": {
    "expectations": [
      { "message": "Mentions the refund is inside the policy window" }
    ]
  }
}
```

These fields map directly onto the Task object your target receives: instruction, input, and metadata, plus the name and split that Rollout assigns.
Splits
GEPA needs a train split to search on and a holdout split to check that the winner generalizes (a val split is optional). Pin task names to splits with a splits.json file passed via --splits-file:
```json
{
  "train": ["refund-in-window", "refund-late", "..."],
  "val": ["..."],
  "holdout": ["..."]
}
```

Tip
Keep the holdout split genuinely held out: it should contain only tasks the search never sees. The promotion decision is made on holdout scores, so any leakage into holdout inflates your results.
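One way to keep holdout both stable and disjoint is to assign tasks by hashing their names, so re-running the script never moves a task between splits. The sketch below is a hypothetical helper, not something Rollout provides. The 20% holdout and 10% val fractions are arbitrary assumptions, and with only a handful of tasks you may prefer to choose the holdout by hand.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def assign_split(task_name, holdout_pct=20, val_pct=10):
    # A stable hash of the name: the same task always lands in the same split.
    bucket = int(hashlib.sha256(task_name.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < holdout_pct:
        return "holdout"
    if bucket < holdout_pct + val_pct:
        return "val"
    return "train"


def build_splits(task_names, holdout_pct=20, val_pct=10):
    splits = {"train": [], "val": [], "holdout": []}
    for name in sorted(set(task_names)):  # dedupe and keep output order stable
        splits[assign_split(name, holdout_pct, val_pct)].append(name)
    return splits


def check_no_leakage(splits):
    """Raise if any task name appears more than once across the splits."""
    seen = {}
    for split, names in splits.items():
        for name in names:
            if name in seen:
                raise ValueError(f"{name!r} appears in {seen[name]!r} and {split!r}")
            seen[name] = split


def write_splits(splits, path):
    """Validate the splits, then write them as JSON for --splits-file."""
    check_no_leakage(splits)
    if not splits.get("train") or not splits.get("holdout"):
        raise ValueError("GEPA needs non-empty train and holdout splits")
    # val is optional, so omit it rather than writing an empty list.
    out = {k: v for k, v in splits.items() if v or k != "val"}
    Path(path).write_text(json.dumps(out, indent=2) + "\n", encoding="utf-8")
    return out


if __name__ == "__main__":
    names = [f"refund-case-{i}" for i in range(200)]  # hypothetical task names
    with tempfile.TemporaryDirectory() as tmp:
        out = write_splits(build_splits(names), Path(tmp) / "splits.json")
        print({k: len(v) for k, v in out.items()})
```

Hash-based assignment also means adding new tasks later never reshuffles existing ones, which keeps earlier holdout scores comparable.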
Convert a source
If your data lives in a JSONL, CSV, Hugging Face, or Parquet source, convert it into a Harbor folder with explicit column mapping. Source URIs:
| URI | Source |
|---|---|
| hf:<repo> | Hugging Face dataset repo, e.g. hf:AI-MO/aimo-validation-aime |
| jsonl:<path> | Local JSONL file, one object per row |
| csv:<path> | Local CSV file with a header row |
| parquet:<path> | Local Parquet file |
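To make the URI scheme concrete, here is a small Python sketch that splits a source URI into its kind and location and reads rows from the two kinds the standard library can handle. parse_source_uri and iter_rows are hypothetical helpers. This is not how rollout-cli is implemented, and Hugging Face and Parquet sources are deliberately left out.

```python
import csv
import json
import tempfile
from pathlib import Path

LOCAL_KINDS = {"jsonl", "csv"}
EXTRA_KINDS = {"hf", "parquet"}  # need the optional dataset extras


def parse_source_uri(uri):
    """Split "kind:location" on the first colon only."""
    kind, sep, location = uri.partition(":")
    if not sep or not location or kind not in LOCAL_KINDS | EXTRA_KINDS:
        raise ValueError(f"unsupported source URI: {uri!r}")
    return kind, location


def iter_rows(uri):
    """Yield one dict per row from a jsonl: or csv: source."""
    kind, location = parse_source_uri(uri)
    if kind == "jsonl":
        with open(location, encoding="utf-8") as f:
            for line in f:
                if line.strip():  # skip blank lines
                    yield json.loads(line)
    elif kind == "csv":
        with open(location, newline="", encoding="utf-8") as f:
            yield from csv.DictReader(f)  # the header row supplies the keys
    else:
        raise NotImplementedError(f"{kind}: sources are out of scope for this sketch")


if __name__ == "__main__":
    print(parse_source_uri("hf:AI-MO/aimo-validation-aime"))
    # ('hf', 'AI-MO/aimo-validation-aime')
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "evals.csv"
        path.write_text("problem,answer\nWhat is 2+2?,4\n", encoding="utf-8")
        print(list(iter_rows(f"csv:{path}")))
        # [{'problem': 'What is 2+2?', 'answer': '4'}]
```

Splitting on the first colon only means a location that itself contains a colon, such as a Windows drive path, survives intact.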
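For example, this converts the Hugging Face repo from the table above into ./datasets/aime:

```shell
rollout datasets pull hf:AI-MO/aimo-validation-aime \
  --instruction problem \
  --input answer=answer \
  --metadata solution=solution \
  --out ./datasets/aime
```

Mapping flags: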
| Flag | Effect |
|---|---|
| --instruction COLUMN | Maps the task instruction |
| --input key=column | Adds an input field (repeat for several) |
| --metadata key=column | Adds a metadata field (repeat for several) |
| --name COLUMN | Uses a column as the Harbor task name |
| --split NAME | Reads one named source split |
| --expected-output-schema JSON_OR_PATH | Stores a JSON object schema on each task |
| --limit N | Converts only the first N rows |
| --force | Replaces an existing output folder |
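The mapping flags amount to picking columns out of each source row. As a rough mental model, here is a hypothetical Python sketch (not rollout-cli's code) of how --instruction, --input, --metadata, --name, and --limit could be applied to rows. The task-NNNN fallback name used when no --name column is given is an invention of the sketch, not documented CLI behavior.

```python
from itertools import islice


def parse_pairs(specs):
    """Turn repeated key=column flag values into {key: column}."""
    pairs = {}
    for spec in specs:
        key, sep, column = spec.partition("=")
        if not sep or not key or not column:
            raise ValueError(f"expected key=column, got {spec!r}")
        pairs[key] = column
    return pairs


def row_to_task(row, instruction, input_map, metadata_map):
    """Build a dict shaped like task.json from one source row."""
    return {
        "instruction": row[instruction],
        "input": {key: row[col] for key, col in input_map.items()},
        "metadata": {key: row[col] for key, col in metadata_map.items()},
    }


def convert(rows, instruction, inputs=(), metadata=(), name=None, limit=None):
    """Return (task_name, task) pairs, stopping after `limit` rows if given."""
    input_map, metadata_map = parse_pairs(inputs), parse_pairs(metadata)
    tasks = []
    for i, row in enumerate(islice(rows, limit)):  # limit=None reads every row
        # Hypothetical fallback name when no name column is mapped.
        task_name = str(row[name]) if name else f"task-{i:04d}"
        tasks.append((task_name, row_to_task(row, instruction, input_map, metadata_map)))
    return tasks


if __name__ == "__main__":
    rows = [{"problem": "What is 2+2?", "answer": "4", "solution": "2+2=4"}]
    # Mirrors: --instruction problem --input answer=answer --metadata solution=solution
    print(convert(rows, "problem", ["answer=answer"], ["solution=solution"]))
```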
Note
The base CLI reads JSONL and CSV. Hugging Face and Parquet sources need the optional dataset extras: install rollout-cli[datasets]. Even then, the CLI imports the datasets package (for Hugging Face) and pyarrow (for Parquet) only when a source of that kind is actually used.
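Because those modules are imported lazily, a missing extra only surfaces once you use an hf: or parquet: source. A quick way to check ahead of time, sketched in Python, is to ask importlib whether the modules can be found without importing them. Run it with the same interpreter rollout-cli is installed into; missing_extras is a hypothetical helper, not part of the CLI.

```python
import importlib.util

# Module each optional source kind needs; jsonl and csv need nothing extra.
OPTIONAL_MODULES = {"hf": "datasets", "parquet": "pyarrow"}


def missing_extras(kinds):
    """Return the source kinds whose optional module is not installed."""
    missing = []
    for kind in kinds:
        module = OPTIONAL_MODULES.get(kind)
        # find_spec locates a top-level module without importing it. Caveat: a
        # local folder named "datasets" on sys.path (e.g. next to this script)
        # can shadow the real package and make this check pass falsely.
        if module is not None and importlib.util.find_spec(module) is None:
            missing.append(kind)
    return missing


if __name__ == "__main__":
    gaps = missing_extras(["jsonl", "csv", "hf", "parquet"])
    if gaps:
        print("install rollout-cli[datasets] to read:", ", ".join(gaps))
    else:
        print("all source kinds are available")
```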
Inspect before converting
inspect prints the detected columns, the available splits, a row count when that is cheap to compute, and a few sample rows, which helps you get the mapping flags right the first time:
```shell
rollout datasets inspect hf:AI-MO/aimo-validation-aime
rollout datasets inspect csv:./evals.csv --sample 5
```

Import and upload
You can upload during conversion with --upload, import an existing Harbor folder separately, or just point optimize create/run at the folder and let it import inline:
```shell
# upload while converting
rollout datasets pull hf:AI-MO/aimo-validation-aime \
  --instruction problem --input answer=answer --out ./datasets/aime --upload

# or import a Harbor folder later
rollout datasets import ./datasets/aime

# or just reference the folder when creating the run (imported inline)
rollout optimize create \
  --target math-agent \
  --dataset ./datasets/aime \
  --verifier ./verifiers/math.json \
  --reflection-model openai/gpt-4.1-mini
```

Once a dataset is in Rollout, you can reference it by slug (--dataset refunds) or by an exact version (--dataset-version-id dsv_...) instead of a path. See Verifiers next, then the end-to-end guide.