How it works
By default, per-chunk events are not persisted. As tokens arrive the SDK maintains a rolling preview and sends periodic updates, then records the final output and usage when the stream ends. This keeps a streamed call from flooding your workspace with one event per token while still showing you what was produced.
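The behaviour described above can be pictured with a small buffer that keeps a bounded preview and emits an update only every few chunks. This is a minimal sketch and not the SDK's implementation: the `RollingPreview` class, its `flush_every` setting, and the choice to keep the most recent characters are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RollingPreview:
    """Hypothetical stand-in for a preview buffer; not part of the SDK."""

    max_chars: int = 4096   # assumed to play the role of stream_preview_chars (must be > 0)
    flush_every: int = 16   # assumed cadence: one update per N chunks
    text: str = ""
    updates: list = field(default_factory=list)
    _pending: int = 0

    def record_chunk(self, delta: str) -> None:
        # Assumption: the preview keeps the most recent max_chars characters.
        self.text = (self.text + delta)[-self.max_chars:]
        self._pending += 1
        if self._pending >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        # Emit a single update covering every chunk seen since the last flush.
        if self._pending:
            self.updates.append(self.text)
            self._pending = 0


preview = RollingPreview(max_chars=8, flush_every=2)
for delta in ["Hel", "lo, ", "wor", "ld!"]:
    preview.record_chunk(delta)
preview.flush()  # no-op here: nothing is pending after the fourth chunk

print(len(preview.updates), repr(preview.text))
```

Four chunks produce two updates rather than four events, which is the trade-off the default is making: fewer writes, at the cost of token-level detail.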
Manual streaming
Open the span with stream=True, feed each delta to record_chunk, and call span.end(...) after the stream to record the final output and usage.
with trace.llm("openai.responses", model="gpt-4.1-mini", stream=True) as span:
    span.record_input({"messages": messages})
    full_text = ""
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        full_text += delta
        span.record_chunk(delta)
    span.end(output=full_text, usage={"input_tokens": 512, "output_tokens": 128})
Note
Calling span.end(...) explicitly inside the with block is safe: the block will not finalize the span a second time when it exits. Call it explicitly here so the final output and usage are attached before the span closes.
With a wrapped client
If you wrap the provider client, streaming is handled for you. Streaming calls yield the original chunks unchanged while the wrapper records the rolling preview and the final output and usage — no record_chunk loop required.
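One way to picture "yield the original chunks unchanged while recording" is a pass-through generator. The sketch below is an illustration under assumptions, not the wrapper's actual code: `passthrough` and `on_final` are hypothetical names, and plain strings stand in for the provider's chunk objects.

```python
from typing import Callable, Iterable, Iterator


def passthrough(chunks: Iterable[str], on_final: Callable[[str], None]) -> Iterator[str]:
    """Yield each chunk unchanged while accumulating the full text on the side."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)    # side-channel recording
        yield chunk            # the caller receives the original chunk
    on_final("".join(parts))   # runs once the stream has been exhausted


recorded = []
seen = list(passthrough(iter(["stream", "ed ", "text"]), recorded.append))

print(seen)
print(recorded)
```

In this sketch the final output is recorded only if the caller consumes the stream to the end; how the real wrapper handles an abandoned stream is not covered here. With the wrapped client itself, the call site stays as short as this: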
openai_client = rollout.wrap(OpenAI())
stream = openai_client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages,
    stream=True,
)
for chunk in stream:  # original chunks, untouched
    ...  # the span records preview + final output automatically
Persisting every token
To store individual chunk events — for token-level inspection — set capture_stream_chunks=True on the client. It is off by default. The size of the rolling preview is controlled by stream_preview_chars (default 4096).
client = Rollout(
    api_key="...",
    capture_stream_chunks=True,  # persist every token (off by default)
    stream_preview_chars=4096,   # rolling preview size
)