Python SDK

Streaming

Streamed completions are recorded without storing every token. Instead, the SDK keeps a rolling preview plus the final output and usage.

How it works

By default, per-chunk events are not persisted. As tokens arrive, the SDK maintains a rolling preview and sends periodic updates, then records the final output and usage when the stream ends. This keeps a streamed call from flooding your workspace with one event per token while still showing you what was produced.
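To make the idea of a rolling preview concrete, here is a minimal sketch of a bounded buffer. It is not the SDK's implementation: the `RollingPreview` class is hypothetical, and it assumes "rolling" means keeping the most recent characters up to a fixed limit.

```python
class RollingPreview:
    """Illustrative only: holds at most `max_chars` of the most recent text."""

    def __init__(self, max_chars: int = 4096) -> None:
        self.max_chars = max_chars
        self.text = ""
        self.total_chars = 0  # everything seen, including text already trimmed

    def add(self, delta: str) -> None:
        self.total_chars += len(delta)
        if self.max_chars <= 0:
            self.text = ""
            return
        # Keep only the tail so memory stays bounded on long streams.
        self.text = (self.text + delta)[-self.max_chars:]


preview = RollingPreview(max_chars=8)
for delta in ["Hello", ", ", "world", "!"]:
    preview.add(delta)

print(repr(preview.text), preview.total_chars)
```

The buffer never grows past `max_chars`, which is why a long stream can be previewed without storing one event per token.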

Manual streaming

Open the span with stream=True, feed each delta to record_chunk, and call span.end(...) after the stream to record the final output and usage.

streaming.py
with trace.llm("openai.responses", model="gpt-4.1-mini", stream=True) as span:
    span.record_input({"messages": messages})
    full_text = ""
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        full_text += delta
        span.record_chunk(delta)
    span.end(output=full_text, usage={"input_tokens": 512, "output_tokens": 128})

Note

Call span.end(...) once, inside the with block; the block will not finalize the span a second time when it exits. Calling it explicitly attaches the final output and usage before the span closes.
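The manual example above hardcodes its usage numbers. In practice those counts usually come from the provider. The sketch below assumes OpenAI Chat Completions-style chunks, where requesting `stream_options={"include_usage": True}` adds a final chunk that carries `usage` and an empty `choices` list. The `accumulate` helper and the `_chunk` test double are hypothetical, and mapping `prompt_tokens` and `completion_tokens` onto the `input_tokens` and `output_tokens` keys is an assumption based on the example above.

```python
from types import SimpleNamespace


def accumulate(chunks, on_delta=None):
    """Return (full_text, usage) from Chat Completions-style stream chunks."""
    parts = []
    usage = None
    for chunk in chunks:
        # The usage-only chunk has no choices, so guard before indexing.
        if chunk.choices:
            delta = chunk.choices[0].delta.content or ""
            if delta:
                parts.append(delta)
                if on_delta is not None:
                    on_delta(delta)  # e.g. span.record_chunk
        if getattr(chunk, "usage", None) is not None:
            usage = {
                "input_tokens": chunk.usage.prompt_tokens,
                "output_tokens": chunk.usage.completion_tokens,
            }
    return "".join(parts), usage


def _chunk(content=None, usage=None, empty=False):
    """Build a fake chunk shaped like a streamed completion chunk."""
    delta = SimpleNamespace(content=content)
    choices = [] if empty else [SimpleNamespace(delta=delta)]
    return SimpleNamespace(choices=choices, usage=usage)


fake_stream = [
    _chunk("Hel"),
    _chunk("lo"),
    _chunk(None),  # chunks with no content are skipped
    _chunk(empty=True, usage=SimpleNamespace(prompt_tokens=512, completion_tokens=128)),
]
seen = []
full_text, usage = accumulate(fake_stream, on_delta=seen.append)
```

Inside the with block this could be used as `full_text, usage = accumulate(stream, on_delta=span.record_chunk)` followed by `span.end(output=full_text, usage=usage)`. Note that `usage` stays `None` when the provider sends no usage chunk; whether `span.end` accepts that is not stated here, so check before relying on it.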

With a wrapped client

If you wrap the provider client, streaming is handled for you. Streaming calls yield the original chunks unchanged while the wrapper records the rolling preview and the final output and usage — no record_chunk loop required.

wrapped_stream.py
openai_client = rollout.wrap(OpenAI())
stream = openai_client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages,
    stream=True,
)
for chunk in stream:      # original chunks, untouched
    ...                   # the span records preview + final output automatically
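For intuition about how a wrapper can observe a stream while yielding the original chunks unchanged, here is a minimal pass-through generator. It is a sketch of the general pattern, not the wrapper's actual code; `observe`, `on_chunk`, and `on_done` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def observe(
    chunks: Iterable[T],
    on_chunk: Callable[[T], None],
    on_done: Callable[[], None],
) -> Iterator[T]:
    """Yield each chunk untouched while reporting it to a callback."""
    try:
        for chunk in chunks:
            on_chunk(chunk)  # side effect only, e.g. update a preview
            yield chunk      # hand back the very same object
    finally:
        on_done()            # runs when the stream is exhausted or closed early


original = [{"id": 1}, {"id": 2}]
recorded = []
finished = []
received = list(observe(original, recorded.append, lambda: finished.append(True)))
```

The `finally` clause is what lets a final record be written even if the caller stops iterating early and the generator is closed.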

Persisting every token

To store individual chunk events — for token-level inspection — set capture_stream_chunks=True on the client. It is off by default. The size of the rolling preview is controlled by stream_preview_chars (default 4096).

chunks.py
client = Rollout(
    api_key="...",
    capture_stream_chunks=True,   # persist every token (off by default)
    stream_preview_chars=4096,    # rolling preview size
)