Rollout documentation

Trace

A trace is one agent run — one conversation turn, one request, one task. It is the top-level container everything else hangs off, and it carries identity like a conversationId and a userId. You open a trace, do work inside it, and it flushes when the block or callback exits.

trace.txt

trace  support_agent  (conversation_id=thread_123, user_id=cus_123)├─ message   user       "Where is my order?"├─ span      llm        gpt-4.1-mini · 512→128 tok · 840ms│   ├─ tool.call    lookup_order(id="4421")│   └─ tool.result  { status: "shipped" }├─ message   assistant  "Your order has shipped."└─ feedback  thumbs_up  true

Span

A span is one unit of work inside a trace, recorded with a type, an input, an output, latency, and any error. The common types are:

llm — a model call, with model, provider, token usage, and a streamed preview
tool — a paired tool.call / tool.result, linked by the model's tool-call id
retrieval — a lookup or search step
task — a multi-step unit of work you define

Spans nest: a tool call made during an LLM turn sits under that turn.

Message

A message is a turn in the conversation — user, assistant, or system. Messages are the human-readable narrative of the trace; spans are the machine detail underneath. You usually record both: the message the user sent, and the llm span that produced the reply.

Feedback vs signal

These two look similar and mean different things:

Feedback is explicit — a thumbs-up, a star rating, a correction. A human told you how the run went.
Signal is implicit — a behavioral or business outcome like an order placed or a ticket reopened. The system observed what happened next.

Tip

Both attach to a trace, and both feed the same loop: they are how you later tell good runs from bad ones — which is exactly what a verifier formalizes when you optimize.

Everything is an event

Under the hood, every method — opening a trace, recording a span, attaching feedback — emits a typed event to the ingest API. The SDKs are an ergonomic layer over that event stream: batched, sampled, and scrubbed before they leave your process. If you only need to ship raw events, every method maps to a documented event.

Control plane vs compute plane

Optimization adds one more idea. The work splits in two, and Rollout never calls your models:

Control plane (Rollout) — stores the run config, dataset, verifier, candidate prompts, scores, traces, and the promotion report.
Compute plane (your runner) — imports your target, runs your real agent, calls your model provider, scores outputs, and uploads results back.

Your provider keys and agent code stay in your process. This is the same trace model from above — each evaluated example becomes a trace — viewed through the lens of a search. See Optimization overview for the full picture.

Where to go next

Quickstart — PythonInstall, point at your workspace, and ship your first trace.Quickstart — TypeScriptThe same first trace in a Node or edge app.Traces & spansThe Python guide to opening traces and nesting spans.Optimization (GEPA)Turn feedback into a verifier and let GEPA improve the prompt.