Python SDK

Traces & spans

A trace is one agent run. Inside it you record messages — the conversation — and spans — the units of work that produced it.

Opening a trace

Open a trace with client.trace(name, ...) as a context manager. The trace stays active for the duration of the block; spans, messages, feedback, and signals you record inside it attach automatically. When the block exits, the trace flushes.

trace.py
with client.trace(
    "support_agent",
    conversation_id="thread_123",
    user_id="cus_123",
) as trace:
    trace.message(role="user", content="Where is my order?")
    ...

The most useful arguments to trace():

| Argument | Type | Purpose |
|---|---|---|
| name | str | Trace name, usually the agent or workflow. |
| user_id | str \| None | Associate the run with an end user. |
| session_id | str \| None | Group several traces into one session. |
| conversation_id | str \| None | Tie turns of a multi-turn conversation together. |
| external_trace_id | str \| None | Correlate with an ID from your own system. |
| attributes | dict \| None | Arbitrary metadata attached to the trace. |
| context | dict \| None | Extra fields merged onto every event in the trace. |

Messages

Record the conversation with trace.message(...). Content can be a plain string or a structured content list (for multimodal turns). Set is_internal=True for messages the user never sees, such as a scratchpad or a system reflection.

messages.py
trace.message(role="user", content="Where is my order?")
trace.message(role="assistant", content="Your order has shipped.")
# link a tool-result message to the tool call that produced it
# (result_json and tool_call come from your own agent loop)
trace.message(role="tool", content=result_json, tool_call_id=tool_call.id)
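The tool_call_id is what lets a tool result be matched back to the call that requested it. The check below illustrates that pairing. It runs over a hypothetical message shape: plain dicts in which assistant messages carry a tool_calls list of {"id": ...} entries. This shape is not a claim about how the SDK stores messages.

```python
from typing import Any


def unmatched_tool_results(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return tool-result messages whose tool_call_id matches no earlier tool call.

    Hypothetical message shape: assistant messages may have a "tool_calls" list
    of {"id": ...} dicts; tool messages have a "tool_call_id".
    """
    seen_ids: set[str] = set()
    orphans: list[dict[str, Any]] = []
    for msg in messages:
        if msg["role"] == "assistant":
            # Remember every tool call the assistant has requested so far.
            seen_ids.update(call["id"] for call in msg.get("tool_calls", []))
        elif msg["role"] == "tool" and msg.get("tool_call_id") not in seen_ids:
            # A result that points at no known call (or at none at all).
            orphans.append(msg)
    return orphans
```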

Spans

A span is a typed unit of work inside a trace. Open one with trace.span(type, ...) as a context manager and record its input and output. The span captures its own latency and marks itself failed if an exception propagates out of the block.

spans.py
with trace.span("retrieval", name="search_docs") as span:
    span.record_input({"query": "refund policy"})
    docs = search("refund policy")
    span.record_output({"hits": len(docs)})
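The latency and failure behavior described above follows the standard context-manager pattern. The clock starts in __enter__ and stops in __exit__, and __exit__ inspects the exception it receives. ToySpan below is a hypothetical stand-in, not the SDK's span class. Its status, error, and latency_ms fields are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToySpan:
    """Illustrative stand-in for a span; not the SDK's class."""

    span_type: str
    name: str
    input: Any = None
    output: Any = None
    status: str = "ok"
    error: Optional[str] = None
    latency_ms: float = 0.0
    _start: float = field(default=0.0, repr=False)

    def record_input(self, value: Any) -> None:
        self.input = value

    def record_output(self, value: Any) -> None:
        self.output = value

    def __enter__(self) -> "ToySpan":
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.latency_ms = (time.perf_counter() - self._start) * 1000
        if exc_type is not None:
            # An exception escaped the block: mark the span failed.
            self.status = "error"
            self.error = repr(exc)
        return False  # re-raise so the caller still sees the exception


try:
    with ToySpan("retrieval", "search_docs") as span:
        span.record_input({"query": "refund policy"})
        raise TimeoutError("search backend timed out")
except TimeoutError:
    pass

print(span.status, span.error)  # error TimeoutError('search backend timed out')
```

Returning False from __exit__ is what lets the span record the failure without hiding it from your own error handling.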

Span types are free-form strings. The SDK and dashboard understand a few conventional ones, and you can use your own for custom steps:

| Span type | Meaning |
|---|---|
| llm | A model call. Use trace.llm(...) as a shorthand. |
| tool | A tool invocation. Usually recorded via trace.tool(...). |
| task | A multi-step unit of work; see the @task decorator. |
| retrieval | A retrieval / RAG step. |
| span | A generic span for anything else. |

LLM spans & usage

trace.llm(name, ...) is a shorthand for an llm span. Pass the model and provider so they are recorded as structured metadata. Then record the call with record_input / record_output, and its token counts and cost with set_usage. record_output accepts pydantic models directly, so you don't have to serialize the response yourself.

llm.py
from mv37.rollout import usage_from_openai

with trace.llm("openai.responses", model="gpt-4.1-mini") as span:
    span.record_input({"messages": messages})
    response = openai_client.responses.create(model="gpt-4.1-mini", input=messages)
    span.record_output(response)
    span.set_usage(**usage_from_openai(response))

set_usage accepts the full set of token and cost fields — pass whichever your provider reports:

| Field | Meaning |
|---|---|
| input_tokens / output_tokens | Prompt and completion tokens. |
| cached_tokens | Prompt tokens served from cache. |
| reasoning_tokens | Reasoning / thinking tokens, when reported. |
| total_tokens | Total tokens for the call. |
| cost_usd | Cost of the call in USD, if you compute it. |
| context_window_tokens / context_used_tokens | Context window size and how much was used. |

Tip

usage_from_openai(response) and usage_from_anthropic(response) read these counts straight off the provider response (including cached and reasoning tokens) and return a dict ready to splat into set_usage.
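If you compute cost_usd yourself, the arithmetic is per-token pricing over these fields. The usage_fields function below is a hypothetical helper, not the SDK's usage_from_openai. It assumes a plain dict shaped like the OpenAI Responses API usage object, with cached tokens under input_tokens_details and reasoning tokens under output_tokens_details. The prices are made-up inputs, not real rates.

```python
from typing import Any, Optional


def usage_fields(
    usage: dict[str, Any],
    *,
    input_usd_per_m: Optional[float] = None,
    cached_input_usd_per_m: Optional[float] = None,
    output_usd_per_m: Optional[float] = None,
) -> dict[str, Any]:
    """Map a Responses-style usage dict onto set_usage field names (hypothetical)."""
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    cached = (usage.get("input_tokens_details") or {}).get("cached_tokens", 0)
    reasoning = (usage.get("output_tokens_details") or {}).get("reasoning_tokens", 0)
    fields: dict[str, Any] = {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cached_tokens": cached,
        "reasoning_tokens": reasoning,
        "total_tokens": usage.get("total_tokens", input_tokens + output_tokens),
    }
    if input_usd_per_m is not None and output_usd_per_m is not None:
        # Cached prompt tokens use the cached rate if given, else the normal input rate.
        cached_rate = input_usd_per_m if cached_input_usd_per_m is None else cached_input_usd_per_m
        fields["cost_usd"] = (
            (input_tokens - cached) * input_usd_per_m
            + cached * cached_rate
            + output_tokens * output_usd_per_m
        ) / 1_000_000
    return fields


example_usage = {
    "input_tokens": 1200,
    "input_tokens_details": {"cached_tokens": 1000},
    "output_tokens": 300,
    "output_tokens_details": {"reasoning_tokens": 0},
    "total_tokens": 1500,
}
# Made-up prices in USD per million tokens, for illustration only.
fields = usage_fields(
    example_usage, input_usd_per_m=2.0, cached_input_usd_per_m=0.5, output_usd_per_m=8.0
)
print(fields)
```

The example has 200 uncached prompt tokens, 1,000 cached prompt tokens, and 300 output tokens, priced at 2.00, 0.50, and 8.00 USD per million. That gives (200 × 2.0 + 1000 × 0.5 + 300 × 8.0) / 1,000,000 = 0.0033 USD. The dict's keys match the set_usage field names, so it could be passed as span.set_usage(**fields).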

Nesting spans

Spans nest naturally: open a span inside the with block of another span and it records its parent automatically. This is how a planning step that makes several model and tool calls shows up as a tree rather than a flat list.

nesting.py
with trace.span("task", name="resolve_ticket") as task:
    task.record_input({"ticket_id": "T-42"})
    with trace.llm("openai.responses", model="gpt-4.1-mini") as span:
        ...  # this llm span is a child of the task span
    with trace.tool("issue_refund", arguments={"order": "4421"}) as call:
        call.record_output(run_refund("4421"))

Async apps

Every context manager has an async form. Use async with for traces and spans in async code; the rest of the API is identical.

async_agent.py
async with client.trace("support_agent") as trace:
    async with trace.llm("openai.responses", model="gpt-4.1-mini") as span:
        span.record_input({"messages": messages})
        # openai_client must be an async client here (e.g. AsyncOpenAI)
        response = await openai_client.responses.create(...)
        span.record_output(response)

Heads up

In long-running async services, also call await client.ashutdown() from your framework's shutdown hook so queued events are flushed. See Lifecycle & shutdown.
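The sketch below shows why the shutdown call matters. It assumes that the client ships events from an in-memory queue on a background task; this page does not confirm that. ToyExporter is hypothetical. Its ashutdown drains every pending event before returning, which is the guarantee you want before the process exits.

```python
from __future__ import annotations

import asyncio
from typing import Any


class ToyExporter:
    """Stand-in for a background event exporter; not the SDK's client."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue[Any] = asyncio.Queue()
        self.sent: list[Any] = []
        self._worker: asyncio.Task | None = None

    def start(self) -> None:
        self._worker = asyncio.create_task(self._run())

    def enqueue(self, event: Any) -> None:
        self.queue.put_nowait(event)

    async def _run(self) -> None:
        while True:
            event = await self.queue.get()
            if event is None:  # shutdown sentinel
                return
            await asyncio.sleep(0)  # stand-in for a network send
            self.sent.append(event)

    async def ashutdown(self) -> None:
        # The sentinel lands behind any pending events, so the worker drains them first.
        self.queue.put_nowait(None)
        if self._worker is not None:
            await self._worker


async def main() -> list[Any]:
    exporter = ToyExporter()
    exporter.start()
    for i in range(5):
        exporter.enqueue({"event": i})
    await exporter.ashutdown()
    return exporter.sent


print(len(asyncio.run(main())))  # 5
```

In a FastAPI app, for example, the natural place for await client.ashutdown() is the code after yield in the app's lifespan handler.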