Streaming LLM Output to React: A Practical Guide to Server-Sent Events
Users expect the typewriter effect. When someone hits “send” on a prompt, they want to watch the model think, token by token, not stare at a spinner while a backend assembles a full response. That experience has become table stakes for any AI feature. Streaming LLM output to React isn’t just a nice-to-have; it’s the difference between a product that feels alive and one that feels like a form submission from 2005.
The problem is that many frontend developers still reach for polling or WebSockets out of habit, when the job calls for something simpler. This guide walks through a complete implementation of LLM output streaming in React, from a Node.js backend to a production-ready component, with real code you can steal.
Why Server-Sent Events Beat the Alternatives
When you stream an LLM response, data flows in one direction: server to client. You don’t need the client to send anything after the initial request. That makes Server-Sent Events (SSE) a perfect fit. SSE is a lightweight, HTTP-based protocol that lets the server push multiple text events to the browser over a single long-lived connection. The browser handles it with the built-in `EventSource` API, which means no extra libraries and no heavy protocol negotiation.
Compare that to WebSockets. WebSockets give you a full-duplex connection, which is overkill if you just want to pipe tokens to a UI. You’ll write more code to manage the socket lifecycle, handle reconnections yourself, and deal with potential proxy or load-balancer issues. Polling, meanwhile, is the UX equivalent of waiting for a fax to arrive: it wastes bandwidth and adds latency you can’t afford when you’re chasing a sub-second time-to-first-token.
SSE gives you automatic reconnection out of the box. If the connection drops, `EventSource` will try to reconnect after a configurable interval. You get a clean event-based interface on the frontend, and on the backend it’s just a matter of setting the right headers and writing data to the response. The protocol is text-only, so you can inspect traffic in any browser’s network tab. For a React component that displays LLM tokens as they arrive, SSE is the obvious choice.
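To make the wire format concrete, here is a minimal sketch of a serializer for SSE events. The helper names `formatSseEvent` and `formatSseComment` are illustrative assumptions, not part of any library; the field names (`event`, `data`, `id`, `retry`) and the blank-line terminator come from the SSE format itself.

```javascript
// Hypothetical helper: serialize one event into SSE wire format.
function formatSseEvent({ data, event, id, retry } = {}) {
  const lines = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  if (retry !== undefined) lines.push(`retry: ${retry}`);
  if (data !== undefined) {
    // A payload containing newlines is split into one "data:" line per line;
    // the browser joins them back together with "\n".
    for (const line of String(data).split(/\r\n|\r|\n/)) {
      lines.push(`data: ${line}`);
    }
  }
  // A blank line terminates the event.
  return lines.join('\n') + '\n\n';
}

// Hypothetical helper: comment lines start with ":" and are ignored by EventSource.
function formatSseComment(text) {
  return `: ${text}\n\n`;
}

// JSON.stringify makes the newlines visible when printed.
console.log(JSON.stringify(formatSseEvent({ data: JSON.stringify({ token: 'Hi' }) })));
console.log(JSON.stringify(formatSseEvent({ event: 'error', data: 'line one\nline two' })));
console.log(JSON.stringify(formatSseComment('heartbeat')));
```

Encoding each token as JSON, as the examples in this guide do, sidesteps the multi-line case entirely, because `JSON.stringify` escapes newlines inside the payload.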
Backend: Streaming from OpenAI (Node.js/Express)
Let’s build a backend endpoint that streams completions from OpenAI’s chat API. We’ll use Express and the OpenAI Node.js SDK, but the same pattern works with Anthropic’s streaming API or any library that returns an async iterable of chunks.
The Naive Approach (That You Shouldn’t Deploy)
A first attempt often looks like this: set the SSE headers, call the OpenAI stream, and pipe each chunk directly to the client.
import express from 'express';
import OpenAI from 'openai';

const app = express();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.get('/stream', async (req, res) => {
  // Headers required for SSE
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const stream = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: req.query.prompt || 'Hello' }],
    stream: true,
  });

  // Write each chunk as an SSE event
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    if (content) {
      res.write(`data: ${JSON.stringify({ token: content })}\n\n`);
    }
  }

  res.write('data: [DONE]\n\n');
  res.end();
});
This works on localhost, but it falls apart quickly in production. There’s no handling of client disconnects: if the user closes the tab, the OpenAI stream keeps running and burning credits. There’s no heartbeat, so middleware or proxies might close the connection because they think it’s idle. Error handling is absent too: if the OpenAI call fails halfway through, you’ve already sent headers and can’t return a proper HTTP error status.
The Production Version
A hardened SSE endpoint needs to:
- Detect client disconnection and abort the upstream stream.
- Send periodic heartbeats to keep the connection alive.
- Handle errors gracefully by closing the stream with an error event.
- Optionally include a reconnection hint (the `retry` field) so the frontend knows how long to wait before retrying.
Here’s an Express route that does all that. It uses an AbortController to cancel the OpenAI request when the client drops.
app.get('/stream', async (req, res) => {
  // SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  // Allow any origin for dev; lock this down in production
  res.setHeader('Access-Control-Allow-Origin', '*');

  // Tell the frontend how long to wait before reconnecting (3 seconds)
  res.write('retry: 3000\n\n');

  const prompt = req.query.prompt || 'Tell me a short joke.';
  const controller = new AbortController();
  let stream;

  // Abort the upstream request if the client disconnects
  req.on('close', () => {
    console.log('Client disconnected, aborting upstream stream');
    controller.abort();
  });

  // Heartbeat interval to prevent proxy timeouts (every 15 seconds)
  const heartbeatInterval = setInterval(() => {
    res.write(': heartbeat\n\n');
  }, 15000);

  try {
    stream = await openai.chat.completions.create(
      {
        model: 'gpt-4o-mini',
        messages: [{ role: 'user', content: prompt }],
        stream: true,
      },
      { signal: controller.signal }
    );

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content || '';
      if (content) {
        // SSE lines must start with "data: " and end with two newlines
        res.write(`data: ${JSON.stringify({ token: content })}\n\n`);
      }
    }

    // Signal normal completion
    res.write('data: [DONE]\n\n');
  } catch (err) {
    // If the stream aborted due to client disconnect, don't try to write
    if (controller.signal.aborted) {
      console.log('Upstream stream aborted because client disconnected.');
    } else {
      console.error('OpenAI stream error:', err);
      // Send an error event the client can recognise
      res.write(`event: error\ndata: ${JSON.stringify({ message: 'Stream failed. Please try again.' })}\n\n`);
    }
  } finally {
    clearInterval(heartbeatInterval);
    res.end();
  }
});
Key details:
- The heartbeat comment line (`: heartbeat`) is ignored by `EventSource`, so it doesn’t mess with your token stream. It just keeps the TCP connection from looking idle.
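- The `retry: 3000` line sets the delay before the browser’s `EventSource` reconnects. Reconnection itself is automatic; the field only tells the browser to wait 3 seconds after a dropped connection instead of using its default delay.
- The `close` event on `req` is how we detect that the client has gone away (tab closed, navigation, network blip) so we can abort the OpenAI stream and avoid wasting compute. The exact timing of this event has changed across Node.js versions, so verify it on your runtime; listening for `close` on `res` is a common alternative.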
- We send an error event with a custom `event: error` so the frontend can distinguish between a data token and a failure.
Now you have an SSE endpoint for your React app that handles the failure modes you will actually meet in production.
---
Frontend: Consuming the Stream in React
There are two main ways to consume an SSE stream in a React app: the built-in `EventSource` API and a manual `fetch` with a readable stream. The naive approach uses `fetch` because it gives you more control over request headers and the HTTP method. The production approach uses `EventSource` because it’s simpler, auto-reconnects, and handles heartbeats for you.
The Naive Fetch-Based Component
This version manually parses the SSE text stream. It’s tempting when you need to POST a request body (you can’t send a POST with `EventSource`), but it puts reconnection logic and stream parsing on your shoulders.
import { useState, useEffect, useRef } from 'react';

function NaiveStreamingComponent() {
  const [tokens, setTokens] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const abortControllerRef = useRef(null);

  const startStream = async () => {
    setIsStreaming(true);
    setTokens('');
    abortControllerRef.current = new AbortController();

    try {
      const response = await fetch('http://localhost:3001/stream?prompt=Tell me a joke', {
        signal: abortControllerRef.current.signal,
      });
      if (!response.ok) throw new Error(`HTTP error ${response.status}`);

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        // Keep the last partial line in the buffer
        buffer = lines.pop() || '';

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = line.slice(6);
            if (data === '[DONE]') {
              setIsStreaming(false);
              return;
            }
            try {
              const parsed = JSON.parse(data);
              setTokens(prev => prev + parsed.token);
            } catch {
              // Ignore malformed JSON (could be a heartbeat comment)
            }
          }
        }
      }
    } catch (err) {
      if (err.name !== 'AbortError') {
        console.error('Stream error:', err);
        setTokens(prev => prev + '\n[Stream interrupted]');
      }
    } finally {
      setIsStreaming(false);
    }
  };

  const stopStream = () => {
    abortControllerRef.current?.abort();
    setIsStreaming(false);
  };

  useEffect(() => {
    return () => abortControllerRef.current?.abort(); // cleanup on unmount
  }, []);

  return (
    <div>
      <button onClick={startStream} disabled={isStreaming}>Start</button>
      <button onClick={stopStream} disabled={!isStreaming}>Stop</button>
      <pre>{tokens}</pre>
    </div>
  );
}
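This works, but it’s fragile. You wrote a mini SSE parser, and an incomplete one. It silently drops the `retry` field and every `event:` line, so the payload of a server-sent `event: error` is treated like a token; because that payload has no `token` property, the literal text “undefined” gets appended to the output. Reconnection? That’s on you to implement. The code is already 60 lines and barely handles the happy path.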
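For comparison, here is a hedged sketch of what a somewhat more complete parser for the `fetch` path might look like. The name `createSseParser` and its callback signatures are assumptions made for illustration. It handles comments, `event:`, multi-line `data:`, and `retry:`, and it survives chunks that split a line in half; it still ignores `id:` and does not handle bare carriage-return line endings.

```javascript
// Hypothetical incremental SSE parser: feed it decoded text chunks.
function createSseParser(onEvent, onRetry = () => {}) {
  let buffer = '';
  let eventType = '';
  let dataLines = [];

  function dispatch() {
    // A blank line ends the event; events without data are discarded.
    if (dataLines.length > 0) {
      onEvent({ type: eventType || 'message', data: dataLines.join('\n') });
    }
    eventType = '';
    dataLines = [];
  }

  function handleLine(rawLine) {
    // Tolerate CRLF line endings by dropping a trailing "\r".
    const line = rawLine.endsWith('\r') ? rawLine.slice(0, -1) : rawLine;
    if (line === '') return dispatch();
    if (line.startsWith(':')) return; // comment, e.g. a heartbeat

    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // strip one leading space

    if (field === 'event') eventType = value;
    else if (field === 'data') dataLines.push(value);
    else if (field === 'retry' && /^\d+$/.test(value)) onRetry(Number(value));
    // Other fields (such as "id") are ignored in this sketch.
  }

  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    lines.forEach(handleLine);
  };
}

// Usage: the second chunk completes a JSON payload that the first one started.
const events = [];
const feed = createSseParser((e) => events.push(e), (ms) => console.log('retry hint:', ms));
feed('retry: 3000\n\n: heartbeat\n\ndata: {"tok');
feed('en":"Hi"}\n\nevent: error\ndata: {"message":"boom"}\n\n');
console.log(events);
```

In the component above, `feed` would be called with `decoder.decode(value, { stream: true })`, and the `onEvent` callback would branch on `type` before touching state.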
The Production Component Using EventSource
`EventSource` handles parsing, reconnection, and custom event types. Its main limitation is that it only supports GET requests: you can’t attach a request body or set custom headers, although the browser still sends cookies where the credentials settings allow it. If your LLM API needs to accept a large prompt or conversation history, you have two clean options: put the prompt in a query parameter (watch out for URL length limits), or create a short-lived resource via a separate POST and stream from its GET endpoint. For this example, we’ll stick with a GET-based endpoint.
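The second option can be sketched with a small in-memory store. Everything here is an assumption for illustration: the name `createPromptStore`, the `put`/`get` methods, the 30-second default lifetime, and the idea that a POST handler returns the id while the GET `/stream` handler looks it up. A multi-instance deployment would need a shared store instead of a `Map`.

```javascript
// Hypothetical in-memory store for the "POST first, then stream via GET" pattern.
function createPromptStore({ ttlMs = 30000, now = Date.now, generateId } = {}) {
  const entries = new Map();
  let counter = 0;
  // Placeholder id generator (assumption); prefer an unguessable id such as a
  // UUID in a real service, and pass it in via `generateId`.
  const nextId =
    generateId ||
    (() => `${now().toString(36)}-${(counter++).toString(36)}-${Math.random().toString(36).slice(2)}`);

  return {
    // Called from the POST handler: stash the payload, return an id for the GET URL.
    put(payload) {
      const id = nextId();
      entries.set(id, { payload, expiresAt: now() + ttlMs });
      return id;
    },
    // Called from the GET handler: returns the payload until it expires, so an
    // EventSource reconnect within the lifetime still finds it.
    get(id) {
      const entry = entries.get(id);
      if (!entry) return undefined;
      if (now() > entry.expiresAt) {
        entries.delete(id); // purge lazily on access
        return undefined;
      }
      return entry.payload;
    },
  };
}

// Usage with a fake clock so the expiry is visible without waiting.
let fakeTime = 0;
const store = createPromptStore({ ttlMs: 1000, now: () => fakeTime });
const id = store.put({ messages: [{ role: 'user', content: 'Tell me a joke' }] });
console.log(store.get(id) !== undefined); // still valid
fakeTime = 1001;
console.log(store.get(id)); // expired
```

Entries that are never read are never purged in this sketch, so a real implementation would also want a periodic sweep or a store with native expiry.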
Here’s a React component built for real-time AI responses that doesn’t fall over.
import { useState, useEffect, useRef, useCallback } from 'react';

function LLMStreamingComponent() {
  const [tokens, setTokens] = useState('');
  const [status, setStatus] = useState('idle'); // idle | streaming | reconnecting | error
  const eventSourceRef = useRef(null);

  const startStream = useCallback(() => {
    setTokens('');
    setStatus('streaming');

    const url = `http://localhost:3001/stream?prompt=${encodeURIComponent('Tell me a joke')}`;
    const es = new EventSource(url);
    eventSourceRef.current = es;

    // Fires on the first connection and after every successful reconnect.
    // Each connection to this endpoint starts a fresh completion, so clear
    // any partial text left over from a dropped connection.
    es.addEventListener('open', () => {
      setTokens('');
      setStatus('streaming');
    });

    // Default event: `message` fires when no event type is specified
    es.addEventListener('message', (event) => {
      const data = event.data;
      if (data === '[DONE]') {
        es.close();
        setStatus('idle');
        return;
      }
      try {
        const parsed = JSON.parse(data);
        // Append the new token, using functional update to avoid stale state
        setTokens(prev => prev + parsed.token);
      } catch {
        // Ignore unparseable data
      }
    });

    // `error` listeners receive both the server's custom `event: error` message
    // and browser-level connection errors, so one listener tells them apart.
    es.addEventListener('error', (event) => {
      if (typeof event.data === 'string') {
        // Server-sent error event: log it and stop instead of reconnecting
        try {
          const err = JSON.parse(event.data);
          console.error('Server error:', err.message);
        } catch {
          // Ignore a malformed error payload
        }
        es.close();
        setStatus('error');
      } else if (es.readyState === EventSource.CLOSED) {
        // readyState 2 (CLOSED): the browser has given up and won't retry
        setStatus('error');
      } else {
        // readyState 0 (CONNECTING): EventSource is retrying on its own,
        // using the server's `retry` field for the delay
        setStatus('reconnecting');
      }
    });
  }, []);

  const stopStream = useCallback(() => {
    eventSourceRef.current?.close();
    setStatus('idle');
  }, []);

  // Cleanup on unmount or when component is removed
  useEffect(() => {
    return () => {
      eventSourceRef.current?.close();
    };
  }, []);

  return (
    <div>
      <button onClick={startStream} disabled={status === 'streaming' || status === 'reconnecting'}>
        Start
      </button>
      <button onClick={stopStream} disabled={status !== 'streaming' && status !== 'reconnecting'}>
        Stop
      </button>
      <div style={{ marginTop: '1em', whiteSpace: 'pre-wrap' }}>
        {tokens}
      </div>
      {status === 'reconnecting' && <p>Connection lost. Reconnecting...</p>}
      {status === 'error' && <p>Stream failed. <button onClick={startStream}>Retry</button></p>}
    </div>
  );
}
This component gives you:
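- Auto-reconnection: Because the server sends `retry: 3000`, the browser waits 3 seconds after a dropped connection before reconnecting, and we show a status message in the meantime. Each reconnect is a new GET, so this endpoint starts a fresh completion rather than resuming the old one. If the browser gives up (for example, when the server answers with a non-200 status or the wrong content type), `readyState` becomes `CLOSED` and we show an error UI with a manual retry button.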
- Proper state management: We use functional updates in `setTokens` so we never capture a stale value when the stream is running fast.
- Controlled lifecycle: The `EventSource` is cleaned up when the component unmounts. The stop button explicitly closes the connection, preventing reconnect attempts.
- Status tracking: Instead of a binary flag, we have a state machine (`idle`, `streaming`, `reconnecting`, `error`) that drives the UI correctly.
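The state machine in the last bullet can also be written down as a pure reducer, which makes the allowed transitions explicit and easy to test outside React. This is a sketch under assumptions: the action names (`start`, `token`, `connection-lost`, `reopened`, `done`, `stop`, `failed`) are invented here, and the component above uses two `useState` calls rather than this reducer.

```javascript
// Hypothetical reducer for the stream lifecycle; action names are illustrative.
const initialStreamState = { status: 'idle', text: '' };

function streamReducer(state, action) {
  switch (action.type) {
    case 'start':
      return { status: 'streaming', text: '' };
    case 'token':
      // Ignore late tokens that arrive after a stop, a failure, or a drop.
      if (state.status !== 'streaming') return state;
      return { ...state, text: state.text + action.token };
    case 'connection-lost':
      return state.status === 'streaming' ? { ...state, status: 'reconnecting' } : state;
    case 'reopened':
      return state.status === 'reconnecting' ? { ...state, status: 'streaming' } : state;
    case 'done':
    case 'stop':
      return { ...state, status: 'idle' };
    case 'failed':
      return { ...state, status: 'error' };
    default:
      return state;
  }
}

// Replay a sequence of actions, as the EventSource listeners might dispatch them.
const actions = [
  { type: 'start' },
  { type: 'token', token: 'Hi' },
  { type: 'connection-lost' },
  { type: 'reopened' },
  { type: 'token', token: ' there' },
  { type: 'done' },
];
const finalState = actions.reduce(streamReducer, initialStreamState);
console.log(finalState);
```

Inside a component this would plug into React’s `useReducer(streamReducer, initialStreamState)`, with each listener calling `dispatch` instead of a pair of setters.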
This is how a React component that streams LLM output should behave in production.
Conclusion: Best Practices for Streaming LLM Output in React
Streaming LLM output in React apps feels magical when you get the details right. Here’s what matters most:
- Always handle disconnections - both on the server (abort upstream streams) and on the client (show reconnection state, don’t leave stale connections open).
- Use `EventSource` in production. It’s fewer lines, fewer bugs, and the browser’s reconnection logic has been battle-tested. Fall back to a `fetch`-based stream only when you absolutely need custom headers or a POST body.
- Send heartbeats and a `retry` field from your SSE endpoint. Without them, some network middleboxes will kill your stream after 30–60 seconds of inactivity, and the client won’t know how fast to reconnect.
- Keep your UI state updates functional. When tokens arrive every few milliseconds, class component `setState` or stale closures can drop characters. Always use the callback form (`setTokens(prev => prev + newToken)`).
- Clean up in `useEffect` return. Unmounted components should not try to process SSE events.
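The stale-closure problem behind the functional-update advice can be reproduced without React. The sketch below uses a tiny stand-in state cell (`createStateCell` is a made-up helper, not a React API) to contrast a handler that captured the value once at subscription time with one that receives the latest value on every call.

```javascript
// Tiny stand-in for a piece of state (NOT React itself): `set` accepts either
// a new value or an updater function, like a useState setter does.
function createStateCell(initial) {
  let value = initial;
  return {
    get: () => value,
    set(update) {
      value = typeof update === 'function' ? update(value) : update;
    },
  };
}

const incoming = ['Hel', 'lo', '!'];

// Stale closure: the handler captured the text once, when the stream started,
// so every update is computed from that old snapshot.
const stale = createStateCell('');
const captured = stale.get();
for (const token of incoming) stale.set(captured + token);

// Functional update: each call is handed the latest value.
const fresh = createStateCell('');
for (const token of incoming) fresh.set((prev) => prev + token);

console.log(stale.get()); // "!"
console.log(fresh.get()); // "Hello!"
```

An `EventSource` listener registered once in `startStream` is in exactly the first situation if it reads `tokens` directly, which is why the components in this guide only ever call `setTokens(prev => prev + parsed.token)`.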
Wiring Server-Sent Events into React is rarely the bottleneck; it’s the edge cases that bite you. Build your streaming pipeline with the assumption that connections will drop, users will navigate away, and middleboxes will interfere. If you do that, real-time AI responses feel instant and trustworthy.