WebRTC signaling architecture
Canvas Chat uses a WebRTC-based peer-to-peer sync system built on top of Yjs CRDTs. This document explains how the signaling server works and why we chose this approach.
The signaling problem
WebRTC enables direct peer-to-peer connections between browsers, but peers must first discover each other and exchange connection details. This is the "signaling" problem: how do two browsers find each other on the internet?
The solution is a lightweight relay server that:
- Accepts WebSocket connections from peers
- Groups peers by "room" (session ID in our case)
- Relays connection metadata (SDP offers/answers, ICE candidates)
- Never sees or stores the actual sync data
Once peers exchange connection metadata, they establish a direct WebRTC connection and sync data peer-to-peer without the server.
Why build our own signaling server?
The y-webrtc package includes a Node.js signaling server, but we chose to implement a compatible server in FastAPI/Python for several reasons:
- Single deployment: We already deploy a FastAPI app to Modal. Adding a separate Node.js service would complicate deployment and monitoring.
- Protocol simplicity: The y-webrtc signaling protocol is just four message types (subscribe, unsubscribe, publish, ping). Implementing it in Python is straightforward.
- Consistent stack: Using Python/FastAPI for the entire backend makes the codebase easier to maintain.
Protocol details
The signaling protocol uses JSON messages over WebSocket:
subscribe { "type": "subscribe", "topics": ["room-id-1", "room-id-2"] }
unsubscribe { "type": "unsubscribe", "topics": ["room-id-1"] }
publish { "type": "publish", "topic": "room-id", ...payload... }
ping { "type": "ping" } -> { "type": "pong" }
When a peer sends a publish message, the server broadcasts it to all other peers
subscribed to that topic. The server adds a clients field indicating how many
peers received the message.
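The four message types can be handled with a small in-memory dispatcher. The following is a minimal sketch, not the actual Canvas Chat implementation: the `SignalingManager` class shape, the `handle` method, the `FakeConn` stand-in for a WebSocket, and the `data` payload field are all assumptions made for illustration.

```python
import json
from collections import defaultdict


class SignalingManager:
    """In-memory topic registry. `conn` is any hashable object with send(str)."""

    def __init__(self):
        self.topics = defaultdict(set)         # topic -> subscribed connections
        self.subscriptions = defaultdict(set)  # connection -> its topics

    def handle(self, conn, raw):
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "subscribe":
            for topic in msg.get("topics", []):
                self.topics[topic].add(conn)
                self.subscriptions[conn].add(topic)
        elif kind == "unsubscribe":
            for topic in msg.get("topics", []):
                self.topics[topic].discard(conn)
                self.subscriptions[conn].discard(topic)
        elif kind == "publish":
            # Relay to every other subscriber and report how many received it.
            receivers = self.topics.get(msg.get("topic"), set()) - {conn}
            msg["clients"] = len(receivers)
            for peer in receivers:
                peer.send(json.dumps(msg))
        elif kind == "ping":
            conn.send(json.dumps({"type": "pong"}))


class FakeConn:
    """Hypothetical stand-in for a WebSocket; records what it is sent."""

    def __init__(self):
        self.sent = []

    def send(self, text):
        self.sent.append(json.loads(text))


manager = SignalingManager()
a, b = FakeConn(), FakeConn()
for conn in (a, b):
    manager.handle(conn, json.dumps({"type": "subscribe", "topics": ["room-id"]}))
manager.handle(a, json.dumps({"type": "publish", "topic": "room-id", "data": "opaque"}))
print(b.sent)  # [{'type': 'publish', 'topic': 'room-id', 'data': 'opaque', 'clients': 1}]
```

The server never inspects the payload: it only copies the message to the other subscribers, which is what lets the payload stay opaque.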
Statelessness
The signaling server keeps no persistent state:
- No database: All state lives in memory
- No user data: Only relays opaque connection metadata
- Restart-safe: Peers automatically reconnect and re-subscribe
- Horizontally scalable: Each server instance is independent, although peers in the same room must connect to the same instance to discover each other
If the server restarts, peers will reconnect within seconds. The CRDTs ensure eventual consistency even if some sync messages are lost during reconnection.
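Because all state lives in memory and is rebuilt when peers re-subscribe, a dropped connection only needs to be forgotten. The sketch below shows one way to do that cleanup; the `drop_connection` helper and the two dictionaries are hypothetical names, and plain strings stand in for WebSocket connections.

```python
def drop_connection(topics, subscriptions, conn):
    """Forget a closed connection and return the topics it had joined."""
    left = subscriptions.pop(conn, set())
    for topic in left:
        peers = topics.get(topic)
        if peers is None:
            continue
        peers.discard(conn)
        if not peers:
            del topics[topic]  # drop empty rooms so memory does not grow
    return left


# Strings stand in for WebSocket connections in this example.
topics = {"room-1": {"a", "b"}, "room-2": {"a"}}
subscriptions = {"a": {"room-1", "room-2"}, "b": {"room-1"}}

drop_connection(topics, subscriptions, "a")
print(topics)         # {'room-1': {'b'}}
print(subscriptions)  # {'b': {'room-1'}}
```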
Privacy guarantees
The signaling server provides strong privacy guarantees:
- Encrypted signaling: y-webrtc supports optional password-based encryption for signaling messages, which prevents man-in-the-middle attacks when a password is set.
- No content visibility: The server only sees room IDs (random UUIDs) and connection metadata, which is encrypted when a password is set. It never sees node content, chat messages, or any other user data.
- Self-hosting option: Users can run their own signaling server for maximum privacy.
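To illustrate why the server never needs the password, the sketch below generates a random room ID and derives a symmetric key from a password on the peer side only. The function names are hypothetical, and the derivation parameters (PBKDF2 with SHA-256, the room ID as salt, 100,000 iterations, a 32-byte key) are assumptions for this example rather than a statement of what y-webrtc or Canvas Chat does.

```python
import hashlib
import uuid


def new_room_id() -> str:
    """Random, unguessable room ID (a version 4 UUID)."""
    return str(uuid.uuid4())


def derive_room_key(password: str, room_id: str, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key from the password, salted with the room ID.

    Parameters are illustrative assumptions, not a documented protocol.
    """
    return hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        room_id.encode("utf-8"),
        iterations,
        dklen=32,
    )


room_id = new_room_id()
key = derive_room_key("correct horse battery staple", room_id)
print(len(key))  # 32
```

Both peers can compute the same key from the shared password and room ID, while the signaling server only ever sees the room ID and the encrypted payload.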
Architecture diagram
┌─────────────────────────────────────────────────────────────────────────┐
│                           Canvas Chat Server                            │
│                                                                         │
│  ┌─────────────────┐                     ┌─────────────────────────┐    │
│  │  FastAPI App    │                     │  Signaling Manager      │    │
│  │  (HTTP/REST)    │                     │  (WebSocket)            │    │
│  │                 │                     │                         │    │
│  │  /api/chat      │                     │  /signal                │    │
│  │  /api/models    │                     │  ├── topics: Map        │    │
│  │  /api/...       │                     │  └── subscriptions: Map │    │
│  └─────────────────┘                     └────────────┬────────────┘    │
│                                                       │                 │
└───────────────────────────────────────────────────────┼─────────────────┘
                                                        │
            ┌───────────────────────────────────────────┴──────┐
            │                                                  │
            ▼                                                  ▼
  ┌───────────────────┐                              ┌───────────────────┐
  │      Browser      │◄──────── WebRTC P2P ────────►│      Browser      │
  │     (Peer A)      │                              │     (Peer B)      │
  │                   │                              │                   │
  │  ┌─────────────┐  │                              │  ┌─────────────┐  │
  │  │ CRDTGraph   │  │                              │  │ CRDTGraph   │  │
  │  │ + WebRTC    │  │                              │  │ + WebRTC    │  │
  │  │ + IndexedDB │  │                              │  │ + IndexedDB │  │
  │  └─────────────┘  │                              │  └─────────────┘  │
  └───────────────────┘                              └───────────────────┘
Connection flow
- User A opens a session, and CRDTGraph creates a WebrtcProvider
- The provider connects to /signal and subscribes to the session's room ID
- User B opens the same session (via a shared link or another browser tab)
- User B's provider connects and subscribes to the same room ID
- Signaling server relays SDP offer from A to B
- B responds with SDP answer, relayed back to A
- Peers exchange ICE candidates through signaling
- Direct WebRTC connection established
- Yjs syncs CRDT state over WebRTC
- Both browsers now see the same canvas in real-time
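The signaling part of this flow (subscribe, offer, answer, ICE candidates) can be traced with a toy in-memory relay. This is only a simulation under stated assumptions: the peer names, the room ID, the `kind` field, and the placeholder SDP and candidate strings are all hypothetical, and no real WebRTC connection is created.

```python
from collections import defaultdict

rooms = defaultdict(list)    # room ID -> subscribed peer names
inboxes = defaultdict(list)  # peer name -> messages received via signaling


def subscribe(peer, room):
    rooms[room].append(peer)


def publish(sender, room, payload):
    # Relay to every other peer in the room, as the signaling server would.
    for peer in rooms[room]:
        if peer != sender:
            inboxes[peer].append({"from": sender, **payload})


room = "session-123"  # hypothetical room ID
subscribe("A", room)
subscribe("B", room)
publish("A", room, {"kind": "offer", "sdp": "<offer>"})
publish("B", room, {"kind": "answer", "sdp": "<answer>"})
publish("A", room, {"kind": "ice", "candidate": "<candidate-a>"})
publish("B", room, {"kind": "ice", "candidate": "<candidate-b>"})

print([m["kind"] for m in inboxes["A"]])  # ['answer', 'ice']
print([m["kind"] for m in inboxes["B"]])  # ['offer', 'ice']
```

After this exchange each peer holds the other's session description and candidates, which is the point at which the direct connection can be established and the relay is no longer involved.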
Failure handling
The system handles various failure modes gracefully:
| Failure | Recovery |
|---|---|
| Signaling server down | Peers retry connection automatically |
| WebRTC connection lost | Yjs awareness triggers reconnection |
| NAT traversal fails | Falls back to TURN relay (future) |
| Browser tab closed | Other peers continue, state persists locally |
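
Automatic retry when the signaling server is down is commonly implemented with capped exponential backoff. The sketch below shows that generic pattern; it is not taken from y-webrtc or Canvas Chat, and the `backoff_delays` name and its default values are assumptions chosen for the example.

```python
import random


def backoff_delays(base=0.5, cap=30.0, attempts=6, jitter=0.0, rng=random.random):
    """Yield wait times in seconds between reconnection attempts.

    The delay doubles on each attempt up to `cap`. `jitter` adds up to that
    fraction of the delay at random so that many peers do not retry in lockstep.
    """
    for attempt in range(attempts):
        delay = min(cap, base * 2 ** attempt)
        yield delay + delay * jitter * rng()


print(list(backoff_delays()))  # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
```

A client would sleep for each yielded delay before trying to reopen the WebSocket and re-send its subscribe message.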
Future enhancements
- TURN server support: For peers behind restrictive NATs, we may add TURN relay support for guaranteed connectivity.
- Presence awareness: Show which users are viewing/editing the canvas using Yjs awareness protocol.
- Selective sync: Only sync visible portions of large canvases.