Streaming architecture

This document explains the design decision to use Server-Sent Events (SSE) for LLM response streaming.

Context

Canvas Chat streams LLM responses token-by-token to the frontend, allowing users to see responses as they're generated rather than waiting for the complete response.

Decision

We use SSE to stream LLM responses from the backend to the frontend.

Alternatives considered

Non-streaming (simple POST request)

Wait for the complete LLM response, then display it all at once.

Advantages:

Simplest implementation
No parsing complexity
Guaranteed correct content formatting

Disadvantages:

Higher perceived latency (2-5 seconds of waiting before any content appears)
Poor user experience for long responses

WebSockets

Bidirectional persistent connection.

Advantages:

More reliable than SSE for some edge cases
Better binary data support

Disadvantages:

More complex to implement
Overkill for unidirectional streaming
Doesn't solve the core parsing challenges

NDJSON streaming

Each chunk is a complete JSON object on its own line.

Advantages:

Self-describing format
Clear content boundaries

Disadvantages:

Similar complexity to SSE
No browser-native API (the client must read the response stream and split it into lines itself)
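
For comparison, the NDJSON framing described above can be sketched in a few lines of Python. This is an illustration only, not code from Canvas Chat: the function names and the "content" field are assumptions made for the example.

```python
import json
from typing import Iterable, Iterator, List


def encode_ndjson(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one JSON object per line (the "content" key is hypothetical)."""
    for chunk in chunks:
        # json.dumps escapes embedded newlines, so each object stays on one line.
        yield json.dumps({"content": chunk}) + "\n"


def decode_ndjson(stream: str) -> List[str]:
    """Parse each non-empty line as a standalone JSON object."""
    return [json.loads(line)["content"] for line in stream.split("\n") if line]


wire = "".join(encode_ndjson(["Hello", " world\n", "second line"]))
chunks = decode_ndjson(wire)
```

Because newlines inside a chunk are escaped by the JSON encoder, the line break is an unambiguous boundary between chunks, which is the "clear content boundaries" advantage listed above.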

Why SSE

SSE provides the best balance of:

Native browser support - No additional libraries needed
Unidirectional simplicity - We only need server-to-client streaming
Automatic reconnection - Built into the EventSource API
Text-based - Natural fit for LLM token streams

Implementation notes

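The SSE specification allows lines to end with CRLF (\r\n), LF (\n), or a lone CR (\r), and some servers emit CRLF. Our client normalizes all line endings to LF (\n) before parsing so that events are handled consistently regardless of which form the server sends.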
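
The normalization step can be sketched as follows. This is a minimal Python illustration of the idea, not the client's actual code, and the function name normalize_line_endings is an assumption.

```python
def normalize_line_endings(raw: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so that a CRLF pair does not turn into two LFs.
    return raw.replace("\r\n", "\n").replace("\r", "\n")


normalized = normalize_line_endings("data: a\r\n\r\ndata: b\r\rdata: c\n\n")
```

The sketch assumes the whole buffer is available at once. A client that processes network chunks as they arrive would also need to hold back a trailing \r until the next chunk, since a CRLF pair can be split across two chunks.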

Multi-line content in SSE is sent as multiple data: lines within a single event. Per the SSE specification, the client joins these lines with newlines when reconstructing the content.