Getting started

Core concepts

Both SDKs record the same shapes, and optimization builds on the same split. Learn these five nouns once and the Python docs, the TypeScript docs, and the CLI all read the same way.

Trace

A trace is one agent run — one conversation turn, one request, one task. It is the top-level container everything else hangs off, and it carries identity like a conversationId and a userId. You open a trace, do work inside it, and it flushes when the block or callback exits.

trace.txt
trace  support_agent  (conversation_id=thread_123, user_id=cus_123)├─ message   user       "Where is my order?"├─ span      llm        gpt-4.1-mini · 512→128 tok · 840ms│   ├─ tool.call    lookup_order(id="4421")│   └─ tool.result  { status: "shipped" }├─ message   assistant  "Your order has shipped."└─ feedback  thumbs_up  true

Span

A span is one unit of work inside a trace, recorded with a type, an input, an output, latency, and any error. The common types are:

  • llm — a model call, with model, provider, token usage, and a streamed preview
  • tool — a paired tool.call / tool.result, linked by the model's tool-call id
  • retrieval — a lookup or search step
  • task — a multi-step unit of work you define

Spans nest: a tool call made during an LLM turn sits under that turn.

Message

A message is a turn in the conversation — user, assistant, or system. Messages are the human-readable narrative of the trace; spans are the machine detail underneath. You usually record both: the message the user sent, and the llm span that produced the reply.

Feedback vs signal

These two look similar and mean different things:

  • Feedback is explicit — a thumbs-up, a star rating, a correction. A human told you how the run went.
  • Signal is implicit — a behavioral or business outcome like an order placed or a ticket reopened. The system observed what happened next.

Tip

Both attach to a trace, and both feed the same loop: they are how you later tell good runs from bad ones — which is exactly what a verifier formalizes when you optimize.

Everything is an event

Under the hood, every method — opening a trace, recording a span, attaching feedback — emits a typed event to the ingest API. The SDKs are an ergonomic layer over that event stream: batched, sampled, and scrubbed before they leave your process. If you only need to ship raw events, every method maps to a documented event.

Control plane vs compute plane

Optimization adds one more idea. The work splits in two, and Rollout never calls your models:

  • Control plane (Rollout) — stores the run config, dataset, verifier, candidate prompts, scores, traces, and the promotion report.
  • Compute plane (your runner) — imports your target, runs your real agent, calls your model provider, scores outputs, and uploads results back.

Your provider keys and agent code stay in your process. This is the same trace model from above — each evaluated example becomes a trace — viewed through the lens of a search. See Optimization overview for the full picture.

Where to go next