Streaming architecture¶
This document explains the design decision to use Server-Sent Events (SSE) for LLM response streaming.
Context¶
Canvas Chat streams LLM responses token-by-token to the frontend, allowing users to see responses as they're generated rather than waiting for the complete response.
Decision¶
We use SSE (Server-Sent Events) for streaming LLM responses from the backend to the frontend.
Alternatives considered¶
Non-streaming (simple POST request)¶
Wait for the complete LLM response, then display it all at once.
Advantages:
- Simplest implementation
- No parsing complexity
- Guaranteed correct content formatting
Disadvantages:
- Higher perceived latency (2-5 seconds of waiting before any content appears)
- Poor user experience for long responses
WebSockets¶
Bidirectional persistent connection.
Advantages:
- More reliable than SSE for some edge cases
- Better binary data support
Disadvantages:
- More complex to implement
- Overkill for unidirectional streaming
- Doesn't solve the core parsing challenges
NDJSON streaming¶
Each chunk is a complete JSON object on its own line.
Advantages:
- Self-describing format
- Clear content boundaries
Disadvantages:
- Similar complexity to SSE
- Less browser-native support
Why SSE¶
SSE provides the best balance of:
- Native browser support - No additional libraries needed
- Unidirectional simplicity - We only need server-to-client streaming
- Automatic reconnection - Built into the EventSource API
- Text-based - Natural fit for LLM token streams
Implementation notes¶
SSE uses CRLF (\r\n) line endings per HTTP specification. Our client normalizes these to LF (\n) before parsing to ensure consistent handling across platforms.
Multi-line content in SSE is sent as multiple data: lines within a single event. Per the SSE specification, the client joins these lines with newlines when reconstructing the content.