Trace
A trace is one agent run — one conversation turn, one request, one task. It is the top-level container everything else hangs off, and it carries identity like a conversationId and a userId. You open a trace, do work inside it, and it flushes when the block or callback exits.
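The open, work, flush lifecycle can be sketched in Python as a context manager. This is a minimal illustration, not the SDK's API: the `Trace` class, its `record` method, the `sink` callback, and the `trace.start`/`trace.end` event names are all hypothetical. Events stay buffered while the block runs and are flushed once when it exits, even if the block raises.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Trace:
    """Hypothetical trace container; names are illustrative, not the SDK's API."""
    name: str
    conversation_id: str
    user_id: str
    sink: Callable[[list[dict[str, Any]]], None]  # receives the flushed events
    events: list[dict[str, Any]] = field(default_factory=list)

    def record(self, kind: str, **payload: Any) -> None:
        # Buffer the event; nothing leaves the process until the flush.
        self.events.append({"kind": kind, "trace": self.name, **payload})

    def __enter__(self) -> "Trace":
        self.record("trace.start",
                    conversation_id=self.conversation_id,
                    user_id=self.user_id)
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.record("trace.end", error=repr(exc) if exc else None)
        self.sink(self.events)  # flush once, when the block exits
        return False            # never swallow an exception from the block


flushed: list[dict[str, Any]] = []
with Trace("support_agent", "thread_123", "cus_123", sink=flushed.extend) as trace:
    trace.record("message", role="user", text="Where is my order?")
    assert flushed == []  # still buffered inside the block

print([e["kind"] for e in flushed])  # ['trace.start', 'message', 'trace.end']
```

Flushing in `__exit__` means a failed run still ships its events, with the error recorded on the closing event.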
trace support_agent (conversation_id=thread_123, user_id=cus_123)
├─ message user "Where is my order?"
├─ span llm gpt-4.1-mini · 512→128 tok · 840ms
│ ├─ tool.call lookup_order(id="4421")
│ └─ tool.result { status: "shipped" }
├─ message assistant "Your order has shipped."
└─ feedback thumbs_up true

Span
A span is one unit of work inside a trace, recorded with a type, an input, an output, latency, and any error. The common types are:
- llm: a model call, with model, provider, token usage, and a streamed preview
- tool: a paired tool.call/tool.result, linked by the model's tool-call id
- retrieval: a lookup or search step
- task: a multi-step unit of work you define
Spans nest: a tool call made during an LLM turn sits under that turn.
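As a rough sketch of that nesting, a span can be modeled as a record that holds its own child spans. The `Span` class, its field names, and the `render` helper below are assumptions made for illustration, not the SDK's types.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Span:
    """Hypothetical span record carrying the fields described above."""
    type: str                        # "llm", "tool", "retrieval", or "task"
    name: str
    input: Any = None
    output: Any = None
    latency_ms: Optional[float] = None
    error: Optional[str] = None
    children: list["Span"] = field(default_factory=list)

    def child(self, type: str, name: str, **fields: Any) -> "Span":
        # Create a span nested under this one and return it.
        span = Span(type, name, **fields)
        self.children.append(span)
        return span


def render(span: Span, depth: int = 0) -> list[str]:
    # Depth-first walk: one line per span, indented by nesting depth.
    lines = ["  " * depth + f"{span.type} {span.name}"]
    for c in span.children:
        lines.extend(render(c, depth + 1))
    return lines


llm = Span("llm", "gpt-4.1-mini", latency_ms=840)
llm.child("tool", "lookup_order",
          input={"id": "4421"}, output={"status": "shipped"})
print("\n".join(render(llm)))
```

Here the tool span is a child of the LLM span, matching the way the tool call sits under the model turn in the trace example.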
Message
A message is a turn in the conversation — user, assistant, or system. Messages are the human-readable narrative of the trace; spans are the machine detail underneath. You usually record both: the message the user sent, and the llm span that produced the reply.
Feedback vs signal
These two look similar and mean different things:
- Feedback is explicit — a thumbs-up, a star rating, a correction. A human told you how the run went.
- Signal is implicit — a behavioral or business outcome like an order placed or a ticket reopened. The system observed what happened next.
Tip: Both attach to a trace and feed the same loop. They are how you later tell good runs from bad ones, which is exactly what a verifier formalizes when you optimize.
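To make the distinction concrete, here is a small Python sketch in which both kinds of evidence hang off the same trace record and a simple rule separates good runs from bad ones. The `TraceRecord` shape and the `looks_good` rule are invented for illustration; a real verifier would encode your own criteria.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TraceRecord:
    """Hypothetical stored trace with its attached feedback and signals."""
    trace_id: str
    feedback: dict[str, Any] = field(default_factory=dict)  # explicit, from a human
    signals: dict[str, Any] = field(default_factory=dict)   # implicit, observed later


def looks_good(t: TraceRecord) -> bool:
    # Illustrative rule only: explicit feedback wins when present,
    # otherwise fall back to the observed outcome.
    if "thumbs_up" in t.feedback:
        return bool(t.feedback["thumbs_up"])
    return not t.signals.get("ticket_reopened", False)


runs = [
    TraceRecord("t1", feedback={"thumbs_up": True}),
    TraceRecord("t2", signals={"ticket_reopened": True}),
    TraceRecord("t3", feedback={"thumbs_up": False},
                signals={"order_placed": True}),
]
good = [t.trace_id for t in runs if looks_good(t)]
print(good)  # ['t1']
```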
Everything is an event
Under the hood, every method (opening a trace, recording a span, attaching feedback) emits a typed event to the ingest API. The SDKs are an ergonomic layer over that event stream: events are batched, sampled, and scrubbed before they leave your process. If you would rather ship raw events yourself, every SDK method maps to a documented event.
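The batch, sample, and scrub steps might look roughly like the following Python sketch. Everything here is an assumption for illustration: the event shapes and type names, the email-only scrubber, the per-trace hashing used for sampling, and the batch size. The real pipeline and event schema are whatever the ingest API documents.

```python
import re
import zlib
from typing import Any, Iterable, Iterator

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def scrub(event: dict[str, Any]) -> dict[str, Any]:
    # Redact email addresses in string fields; a real scrubber would cover more.
    return {k: EMAIL.sub("[redacted]", v) if isinstance(v, str) else v
            for k, v in event.items()}


def keep_trace(trace_id: str, rate: float) -> bool:
    # Deterministic per-trace sampling, so a trace is kept or dropped whole.
    return zlib.crc32(trace_id.encode()) % 10_000 < rate * 10_000


def batch(events: Iterable[dict[str, Any]], size: int) -> Iterator[list[dict[str, Any]]]:
    # Group events into lists of at most `size`, one list per upload.
    chunk: list[dict[str, Any]] = []
    for e in events:
        chunk.append(e)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


events = [
    {"type": "trace.start", "trace_id": "t1"},
    {"type": "message", "trace_id": "t1", "text": "Reach me at ana@example.com"},
    {"type": "feedback", "trace_id": "t1", "name": "thumbs_up", "value": True},
]
kept = (scrub(e) for e in events if keep_trace(e["trace_id"], rate=1.0))
batches = list(batch(kept, size=2))
print([len(b) for b in batches])  # [2, 1]
```

Scrubbing runs before batching so that nothing unredacted is ever queued for upload.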
Control plane vs compute plane
Optimization adds one more idea. The work splits in two, and Rollout never calls your models:
- Control plane (Rollout) — stores the run config, dataset, verifier, candidate prompts, scores, traces, and the promotion report.
- Compute plane (your runner) — imports your target, runs your real agent, calls your model provider, scores outputs, and uploads results back.
Your provider keys and agent code stay in your process. This is the same trace model from above — each evaluated example becomes a trace — viewed through the lens of a search. See Optimization overview for the full picture.
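The split can be sketched in Python with a stand-in for each side. The `ControlPlane` class, `run_locally` function, toy agent, and exact-match verifier below are hypothetical and exist only to show where each responsibility lives: the control plane stores inputs and results, while the agent and the scoring run in your process.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ControlPlane:
    """Stand-in for the control plane: it stores things and never runs the agent."""
    dataset: list[dict[str, str]]
    candidates: list[str]
    results: list[dict] = field(default_factory=list)

    def upload(self, result: dict) -> None:
        self.results.append(result)


def run_locally(cp: ControlPlane,
                agent: Callable[[str, str], str],
                verifier: Callable[[str, str], float]) -> None:
    # Compute plane: the agent and verifier execute here, in your process.
    for prompt in cp.candidates:
        for example in cp.dataset:
            output = agent(prompt, example["input"])
            score = verifier(output, example["expected"])
            # Each evaluated example is uploaded as one record.
            cp.upload({"prompt": prompt, "input": example["input"],
                       "output": output, "score": score})


# Toy agent and verifier so the sketch runs without a model provider.
def toy_agent(prompt: str, text: str) -> str:
    return text.upper() if "uppercase" in prompt else text


def exact_match(output: str, expected: str) -> float:
    return 1.0 if output == expected else 0.0


cp = ControlPlane(dataset=[{"input": "hi", "expected": "HI"}],
                  candidates=["echo the input", "uppercase the input"])
run_locally(cp, toy_agent, exact_match)
best = max(cp.results, key=lambda r: r["score"])["prompt"]
print(best)  # uppercase the input
```

Note that `ControlPlane` never receives the agent or a provider key. It only sees the uploaded records.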