Server-Sent Events (SSE)
Table of contents
ColdBrew exposes every server-streaming gRPC method as Server-Sent Events for free. Any rpc Foo(Req) returns (stream Resp) endpoint is SSE-consumable when the client sends Accept: text/event-stream — no per-service wiring, no proto changes, no custom HTTP handler. This is the path of least resistance for AI/LLM token streams, progress feeds, change notifications, and any other server → client push that benefits from staying on plain HTTP.
The marshaler is registered by default. There is nothing to import in your service code. Clients pick SSE by sending Accept: text/event-stream; everything else continues to receive newline-delimited JSON as before.
Browser EventSource(...) is the simplest consumer but only does GET requests, so it works directly only for streams mapped to HTTP GET. For POST-mapped streams (most non-trivial endpoints), use fetch with a streaming response reader, or a small library like microsoft/fetch-event-source — the wire format is identical, only the request side changes.
When to use SSE
| Use SSE for | Use something else for |
|---|---|
| AI/LLM token-by-token streaming | One-shot responses (unary RPC) |
| Server → client push (notifications, live counters, progress) | Bidirectional or high-frequency client → server messaging (WebSocket) |
| Browser clients you don’t want to ship a gRPC-web library for | Server-to-server streams (use native gRPC) |
Anything where a curl -N or EventSource(...) consumer is good enough | Binary streams (use the proto/protobuf gateway marshaler) |
If you need true bidi over HTTP, SSE is the wrong primitive — register a WebSocket handler via HTTP Gateway Extensions. For server-only push, prefer SSE: it reuses your existing gRPC stream + gateway plumbing instead of doubling the surface area.
Wire format
Each streamed gRPC message becomes one SSE frame:
data: {"result":{"token":"hello","index":0}}
data: {"result":{"token":"world","index":1}}
Two newlines (\n\n) terminate a frame, matching the SSE spec. The JSON payload uses protojson (same field naming and well-known-type handling as the gateway’s default application/json responses).
grpc-gateway wraps server-streaming responses in {"result": <message>} over HTTP — this is the documented gateway convention, not an SSE artifact. Native gRPC clients still see the unwrapped message. If you need full control over the wire bytes (no "result" wrapper, custom event:/id: fields), use google.api.HttpBody as the stream’s response type and marshal the SSE frame yourself in the handler.
Defining a streaming endpoint
A streaming method is just a stream response in your .proto. Nothing changes for SSE specifically. Map to HTTP GET if you want browser EventSource(...) to consume it directly; use POST (and a streaming fetch client) when the request needs a body:
rpc StreamTokens(StreamTokensRequest) returns (stream Token) {
option (google.api.http) = {
// GET keeps the EventSource example below working as-is.
// For POST, swap to: post: "/api/v1/stream/tokens" body: "*"
get: "/api/v1/stream/tokens"
};
}
message Token {
string text = 1;
int32 index = 2;
}
Implement using grpc.ServerStreamingServer[Token]:
func (s *svc) StreamTokens(req *proto.StreamTokensRequest, stream grpc.ServerStreamingServer[proto.Token]) error {
ctx := stream.Context()
start := time.Now()
for i, tok := range produce(ctx, req) {
// Stop generating (and stop paying for) tokens when the client disconnects.
// ctx.Err() goes non-nil when the HTTP connection drops — this is the
// load-bearing safety property for AI/LLM workloads.
if err := ctx.Err(); err != nil {
return errors.Wrap(err, "stream canceled")
}
if err := stream.Send(&proto.Token{Text: tok, Index: int32(i)}); err != nil {
return errors.Wrap(err, "stream send")
}
if i == 0 {
metrics.ObserveTTFT(time.Since(start))
}
}
return nil
}
Two things matter for production:
-
Check
stream.Context().Err()before everySend. A browser tab closing cancels the HTTP context, which cancels the gRPC stream context. Pass the same context into your LLM SDK call so cancellation propagates to the upstream provider — otherwise the model keeps generating (and billing) after the user is gone. -
Record time-to-first-token (TTFT) as a distinct metric. Total stream duration mixes upstream latency with generation throughput. Separating TTFT surfaces which one is degrading. Record it once per stream, on the first successful
Send.
Calling a streaming endpoint
Browser (EventSource)
const events = new EventSource("/api/v1/stream/tokens");
events.onmessage = (e) => {
const frame = JSON.parse(e.data);
console.log(frame.result.text); // unwrap the gateway envelope
};
events.onerror = () => events.close();
EventSource is the standard browser API — auto-reconnects on transient network failures, available in every modern browser, no dependencies. As noted in the intro, it only does GET; for POST-mapped streams use fetch(..., { method: "POST" }) with a streaming response reader, or a library like @microsoft/fetch-event-source that handles POST + SSE parsing.
curl
$ # Default — newline-delimited JSON:
$ curl -N 'http://localhost:9091/api/v1/stream/tokens?msg=hello+world'
$ # SSE — request text/event-stream:
$ curl -N -H 'Accept: text/event-stream' \
'http://localhost:9091/api/v1/stream/tokens?msg=hello+world'
-N (no-buffer) is required — without it, curl will hold the response until the stream completes.
For POST-mapped streams, the same flags apply; add -X POST -H 'Content-Type: application/json' -d '{...}' and replace the query string with a body. The wire format is identical.
Native gRPC
The same method is reachable as a native gRPC server-streaming call. SSE is purely a gateway concern; gRPC clients see plain proto-encoded Token messages with no {"result": ...} wrapping.
Disabling or replacing the default marshaler
Two opt-out paths:
| Goal | How |
|---|---|
| Turn off SSE entirely (force JSON for all streams) | Set DISABLE_SSE_MARSHALER=true
|
| Keep SSE but customize the framing | Register your own text/event-stream marshaler from PreStart — see HTTP Gateway Extensions. Service-registered marshalers win over ColdBrew’s defaults on the same MIME |
A custom marshaler is the right answer when you need richer SSE features (named events via event:, IDs for client-side dedup via id:, multi-line data: fields). Embed core.SSEMarshaler and override Marshal to add the extra fields.
Common pitfalls
-
Compression buffers SSE. gzip/zstd over an event stream stalls frame delivery because compressors hold bytes until they have enough to flush. ColdBrew’s HTTP compression wrapper automatically excludes
text/event-stream, so this Just Works — but if you put a reverse proxy in front (Nginx, Cloudflare, an in-house CDN) and it re-applies compression, you’ll see stalls. SendContent-Encoding: identityor configure the proxy to skip SSE. -
Reverse proxies also buffer responses. Nginx in particular holds chunks until ~4KB by default. Set
X-Accel-Buffering: noon the response, orproxy_buffering off;for the upstream block. Cloudflare typically passes SSE through but check your zone settings. -
EventSourcecan onlyGET. If your RPC’s HTTP annotation ispost, usefetchwith a streaming reader, or microsoft/fetch-event-source. The streaming format on the wire is identical. -
Mid-stream errors render in the trailers. gRPC stream errors arrive after the last
Send, encoded as a final SSE frame (or as HTTP trailers, depending on gateway config). For nicer client behavior — explicit error events the JS side can handle — useruntime.WithStreamErrorHandlerto control the format. -
The
{"result": ...}wrapper is gateway-imposed. It applies to every streaming RPC over HTTP, not just SSE. Either unwrap on the client (JSON.parse(e.data).result) or usegoogle.api.HttpBodyas the response type for full control.
Related
- Streaming RPCs — Proto definitions, handler patterns, deadline propagation, and the gateway’s behavior for every gRPC method shape.
- HTTP Gateway Extensions — Registering custom marshalers, error handlers, middleware, and additional routes on the gateway.
-
Configuration Reference —
DISABLE_SSE_MARSHALERand related HTTP gateway options. - Metrics — Where to surface TTFT and per-stream counters alongside ColdBrew’s default Prometheus metrics.