Streaming LLM Output to React: A Practical Guide to Server-Sent Events

If you're building a product powered by large language models (LLMs), you already know this challenge: your app sends a prompt to the backend, and nothing happens for two… maybe three… seconds. Then, the full response appears in one go. For the user, that delay feels like your app is frozen.

That’s not how modern AI experiences work.

Users expect the first token almost instantly and the rest to arrive like someone typing. It’s not just smoother UX; it’s the standard. Apps like ChatGPT and Claude raised the bar with real-time token streaming.

Thankfully, there’s a simple way to achieve this in React: Server-Sent Events (SSE). This article shows you how to stream tokens from your backend to your React frontend in real time without reaching for WebSockets or over-engineered infra.

Why Server-Sent Events (SSE)?

SSE is a lightweight, native browser protocol that allows the server to push data over an HTTP connection. It’s ideal for scenarios like LLM token streaming, where the client only needs to listen and display updates as they arrive.

Benefits of SSE:

  • No WebSocket overhead
  • Simple to implement and scale
  • Built into all modern browsers
  • Easy to integrate with any LLM backend that supports streaming (like OpenAI, Anthropic, or Mistral)

While alternatives like WebSockets or gRPC offer two-way communication, most frontend use cases, like streaming a chat reply, only need one-way communication.
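On the wire, SSE is just plain text over a long-lived HTTP response: the server sets Content-Type: text/event-stream and writes each message as a data: line followed by a blank line. A stream of the four tokens used later in this article would look roughly like this:

data: Hello

data: ,

data:  world

data: !

The blank line marks the end of one event; the browser’s EventSource API handles the parsing and hands each payload to your onmessage handler.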

The Streaming Lifecycle

Here’s what streaming looks like under the hood:

  1. The frontend sends a prompt to the backend.
  2. The backend connects to the LLM and starts streaming the output tokens.
  3. Each new token is sent immediately over an open SSE connection.
  4. The frontend listens for those events and updates the UI in real time.

This approach gives users immediate feedback and lowers perceived latency.

Code Example: Backend with Express + SSE

Let’s say your backend uses Node.js and Express. Here’s a simplified server that streams tokens (you can replace this with your LLM provider’s stream):

// server.js
const express = require('express');
const app = express();

app.get('/api/stream', async (req, res) => {
  // These headers tell the browser to treat the response as an event stream
  res.set({
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  res.flushHeaders(); // send headers now so the client can start listening

  const prompt = req.query.prompt;
  const fakeTokens = ['Hello', ',', ' world', '!'];

  for (const token of fakeTokens) {
    res.write(`data: ${token}\n\n`);
    await new Promise(r => setTimeout(r, 300)); // simulate streaming
  }

  res.end();
});

app.listen(3000); // pick whatever port fits your setup

For real use cases, replace fakeTokens with actual streamed output from a model provider like OpenAI, Mistral, or AnyAPI.
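As a rough sketch of what that swap looks like with the official OpenAI Node SDK (the model name is a placeholder and error handling is omitted; other providers’ clients differ in the details):

// Sketch: streaming a real completion into the same SSE response.
// Assumes `npm install openai` (v4+) and OPENAI_API_KEY in the environment.
const OpenAI = require('openai');
const openai = new OpenAI();

app.get('/api/stream', async (req, res) => {
  res.set({
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });

  const stream = await openai.chat.completions.create({
    model: 'gpt-4o-mini', // placeholder; use whatever model you're targeting
    messages: [{ role: 'user', content: req.query.prompt }],
    stream: true,
  });

  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content || '';
    if (token) res.write(`data: ${token}\n\n`);
  }

  res.end();
});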

Frontend: React with EventSource

Now, let’s wire up the React side using the native EventSource API:

import { useEffect, useState } from 'react';

function TokenStream({ prompt }) {
  const [text, setText] = useState('');

  useEffect(() => {
    setText(''); // reset the output whenever the prompt changes
    const eventSource = new EventSource(`/api/stream?prompt=${encodeURIComponent(prompt)}`);
    
    eventSource.onmessage = (e) => {
      setText(prev => prev + e.data);
    };
    
    // The server ends the stream when it's done; closing here prevents
    // EventSource's automatic reconnect from replaying the request.
    eventSource.onerror = () => {
      eventSource.close();
    };
    
    return () => eventSource.close();
  }, [prompt]);

  return <div className="streaming-output">{text}</div>;
}

This minimal React component gives you a smooth typewriter-like output stream. No polling, no page reloads.
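To use it, a hypothetical parent component (assumed to sit in the same file, so useState is already imported) can hold the prompt and render TokenStream once the user submits:

function Chat() {
  const [draft, setDraft] = useState('');
  const [prompt, setPrompt] = useState('');

  return (
    <div>
      <input value={draft} onChange={e => setDraft(e.target.value)} />
      <button onClick={() => setPrompt(draft)}>Send</button>
      {prompt && <TokenStream prompt={prompt} />}
    </div>
  );
}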

What About Production-Scale Apps?

In production, you may want to:

  • Add authentication (e.g. JWT verification) before starting SSE
  • Buffer and throttle tokens to reduce frontend re-renders (see the sketch after this list)
  • Show a typing cursor (▌) while tokens stream
  • Handle disconnects gracefully (SSE supports automatic reconnects)
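
The buffering point is worth a sketch. A fast model can emit dozens of tokens per second, and calling setState for every one forces a re-render each time; batching tokens into one update per animation frame keeps the UI smooth. The hook name and shape below are illustrative, not part of the component above:

import { useEffect, useRef, useState } from 'react';

function useBufferedStream(url) {
  const [text, setText] = useState('');
  const buffer = useRef('');

  useEffect(() => {
    setText('');
    buffer.current = '';
    const eventSource = new EventSource(url);

    // Accumulate tokens in a ref instead of setting state per token...
    eventSource.onmessage = (e) => {
      buffer.current += e.data;
    };
    eventSource.onerror = () => eventSource.close();

    // ...and flush the buffer to state at most once per animation frame.
    let frame = requestAnimationFrame(function flush() {
      if (buffer.current) {
        setText(prev => prev + buffer.current);
        buffer.current = '';
      }
      frame = requestAnimationFrame(flush);
    });

    return () => {
      cancelAnimationFrame(frame);
      eventSource.close();
    };
  }, [url]);

  return text;
}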

Also consider fallback behavior for older browsers or mobile devices where SSE may be limited. For full cross-browser support, you could polyfill SSE or build a hybrid system.

What Providers Support Streaming?

Most modern LLM APIs support token-level streaming, but their implementations vary.

  • OpenAI: pass the stream: true flag; tokens arrive as server-sent event chunks over HTTP.
  • Anthropic (Claude): also streams server-sent events when stream: true is set.
  • Mistral, Cohere, and Groq: All support streaming with slight variations.
  • AnyAPI: Unifies multiple providers under one API with standardized streaming.

Why This Matters in 2025

LLMs are faster than ever, but expectations are even faster.

When your app responds like it’s thinking out loud, users trust it more. And when you can deliver that with a few lines of JavaScript and an open HTTP connection, there's no reason not to.

Streaming also lets you:

  • Preemptively cancel long outputs to save costs (see the sketch after this list)
  • Display “typing” animations or inline loading states
  • Handle hallucination or errors earlier in the pipeline
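
Cancellation falls out of the transport: when the user hits a stop button, the client just closes its EventSource (eventSource.close()), the server sees the socket drop, and it can stop pulling tokens from the provider. A rough sketch of the server side, where llmStream stands in for whatever async iterator your provider returns and the SSE headers are the same as in the earlier example:

app.get('/api/stream', async (req, res) => {
  // ...same SSE headers as before...

  let clientGone = false;
  req.on('close', () => { clientGone = true; });

  for await (const token of llmStream(req.query.prompt)) {
    if (clientGone) break; // the user hit stop; don't keep paying for unseen tokens
    res.write(`data: ${token}\n\n`);
  }

  res.end();
});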

In the age of real-time AI, response streaming is no longer optional UX polish; it’s part of the product.

Stream Smarter with AnyAPI

Integrating SSE into your React frontend is one of the easiest, highest-impact improvements you can make to your AI product’s user experience. And if you're using multiple LLM providers, juggling streaming implementations becomes painful fast.

AnyAPI offers a unified interface for streaming across providers – OpenAI, Claude, Gemini, Mistral, and more – so your frontend code stays simple no matter what’s powering it.

