Streaming LLM Output to React: A Practical Guide to Server-Sent Events

If you're building a product powered by large language models (LLMs), you already know this challenge: your app sends a prompt to the backend, and nothing happens for two… maybe three… seconds. Then, the full response appears in one go. For the user, that delay feels like your app is frozen.

That’s not how modern AI experiences work.

Users expect the first token almost instantly and the rest to arrive like someone typing. It’s not just smoother UX; it’s the standard. Apps like ChatGPT and Claude raised the bar with real-time token streaming.

Thankfully, there’s a simple way to achieve this in React: Server-Sent Events (SSE). This article shows you how to stream tokens from your backend to your React frontend in real time without reaching for WebSockets or over-engineered infra.

Why Server-Sent Events (SSE)?

SSE is a lightweight, native browser protocol that allows the server to push data over an HTTP connection. It’s ideal for scenarios like LLM token streaming, where the client only needs to listen and display updates as they arrive.

Benefits of SSE:

  • No WebSocket overhead
  • Simple to implement and scale
  • Built into all modern browsers
  • Easy to integrate with any LLM backend that supports streaming (like OpenAI, Anthropic, or Mistral)

While alternatives like WebSockets or gRPC offer two-way communication, most frontend use cases, like streaming a chat reply, only need one-way communication.
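On the wire, SSE is just plain text over a long-lived HTTP response: the server sets Content-Type: text/event-stream and writes each message as a data: line followed by a blank line. A stream of the four tokens used later in this article would look roughly like this:

data: Hello

data: ,

data:  world

data: !

The blank line marks the end of one event; the browser’s EventSource API handles the parsing and hands each payload to your onmessage handler.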

The Streaming Lifecycle

Here’s what streaming looks like under the hood:

  1. The frontend sends a prompt to the backend.
  2. The backend connects to the LLM and starts streaming the output tokens.
  3. Each new token is sent immediately over an open SSE connection.
  4. The frontend listens for those events and updates the UI in real time.

This approach gives users immediate feedback and lowers perceived latency.

Code Example: Backend with Express + SSE

Let’s say your backend uses Node.js and Express. Here’s a simplified server that streams tokens (you can replace this with your LLM provider’s stream):

// server.js
const express = require('express');
const app = express();

app.get('/api/stream', async (req, res) => {
  // These headers tell the browser to treat the response as an event stream
  res.set({
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  res.flushHeaders(); // send headers now so the client can start listening

  const prompt = req.query.prompt;
  const fakeTokens = ['Hello', ',', ' world', '!'];

  for (const token of fakeTokens) {
    res.write(`data: ${token}\n\n`);
    await new Promise(r => setTimeout(r, 300)); // simulate streaming
  }

  res.end();
});

app.listen(3000); // pick whatever port fits your setup

For real use cases, replace fakeTokens with actual streamed output from a model provider like OpenAI, Mistral, or AnyAPI.
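As a rough sketch of what that swap looks like with the official OpenAI Node SDK (the model name is a placeholder and error handling is omitted; other providers’ clients differ in the details):

// Sketch: streaming a real completion into the same SSE response.
// Assumes `npm install openai` (v4+) and OPENAI_API_KEY in the environment.
const OpenAI = require('openai');
const openai = new OpenAI();

app.get('/api/stream', async (req, res) => {
  res.set({
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });

  const stream = await openai.chat.completions.create({
    model: 'gpt-4o-mini', // placeholder; use whatever model you're targeting
    messages: [{ role: 'user', content: req.query.prompt }],
    stream: true,
  });

  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content || '';
    if (token) res.write(`data: ${token}\n\n`);
  }

  res.end();
});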

Frontend: React with EventSource

Now, let’s wire up the React side using the native EventSource API:

import { useEffect, useState } from 'react';

function TokenStream({ prompt }) {
  const [text, setText] = useState('');

  useEffect(() => {
    setText(''); // reset the output whenever the prompt changes
    const eventSource = new EventSource(`/api/stream?prompt=${encodeURIComponent(prompt)}`);
    
    eventSource.onmessage = (e) => {
      setText(prev => prev + e.data);
    };
    
    // The server ends the stream when it's done; closing here prevents
    // EventSource's automatic reconnect from replaying the request.
    eventSource.onerror = () => {
      eventSource.close();
    };
    
    return () => eventSource.close();
  }, [prompt]);

  return <div className="streaming-output">{text}</div>;
}

This minimal React component gives you a smooth typewriter-like output stream. No polling, no page reloads.
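To use it, a hypothetical parent component (assumed to sit in the same file, so useState is already imported) can hold the prompt and render TokenStream once the user submits:

function Chat() {
  const [draft, setDraft] = useState('');
  const [prompt, setPrompt] = useState('');

  return (
    <div>
      <input value={draft} onChange={e => setDraft(e.target.value)} />
      <button onClick={() => setPrompt(draft)}>Send</button>
      {prompt && <TokenStream prompt={prompt} />}
    </div>
  );
}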

What About Production-Scale Apps?

In production, you may want to:

  • Add authentication (e.g. JWT verification) before starting SSE
  • Buffer and throttle tokens to reduce frontend re-renders (see the sketch after this list)
  • Show a typing cursor (▌) while tokens stream
  • Handle disconnects gracefully (SSE supports automatic reconnects)
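
The buffering point is worth a sketch. A fast model can emit dozens of tokens per second, and calling setState for every one forces a re-render each time; batching tokens into one update per animation frame keeps the UI smooth. The hook name and shape below are illustrative, not part of the component above:

import { useEffect, useRef, useState } from 'react';

function useBufferedStream(url) {
  const [text, setText] = useState('');
  const buffer = useRef('');

  useEffect(() => {
    setText('');
    buffer.current = '';
    const eventSource = new EventSource(url);

    // Accumulate tokens in a ref instead of setting state per token...
    eventSource.onmessage = (e) => {
      buffer.current += e.data;
    };
    eventSource.onerror = () => eventSource.close();

    // ...and flush the buffer to state at most once per animation frame.
    let frame = requestAnimationFrame(function flush() {
      if (buffer.current) {
        setText(prev => prev + buffer.current);
        buffer.current = '';
      }
      frame = requestAnimationFrame(flush);
    });

    return () => {
      cancelAnimationFrame(frame);
      eventSource.close();
    };
  }, [url]);

  return text;
}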

Also consider fallback behavior for older browsers or mobile devices where SSE may be limited. For full cross-browser support, you could polyfill SSE or build a hybrid system.

What Providers Support Streaming?

Most modern LLM APIs support token-level streaming, but their implementations vary.

  • OpenAI: pass the stream: true flag; tokens arrive as server-sent event chunks over HTTP.
  • Anthropic (Claude): also streams server-sent events when stream: true is set.
  • Mistral, Cohere, and Groq: All support streaming with slight variations.
  • AnyAPI: Unifies multiple providers under one API with standardized streaming.

Why This Matters in 2025

LLMs are faster than ever, but expectations are even faster.

When your app responds like it’s thinking out loud, users trust it more. And when you can deliver that with a few lines of JavaScript and an open HTTP connection, there's no reason not to.

Streaming also lets you:

  • Preemptively cancel long outputs to save costs (see the sketch after this list)
  • Display “typing” animations or inline loading states
  • Handle hallucination or errors earlier in the pipeline
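
Cancellation falls out of the transport: when the user hits a stop button, the client just closes its EventSource (eventSource.close()), the server sees the socket drop, and it can stop pulling tokens from the provider. A rough sketch of the server side, where llmStream stands in for whatever async iterator your provider returns and the SSE headers are the same as in the earlier example:

app.get('/api/stream', async (req, res) => {
  // ...same SSE headers as before...

  let clientGone = false;
  req.on('close', () => { clientGone = true; });

  for await (const token of llmStream(req.query.prompt)) {
    if (clientGone) break; // the user hit stop; don't keep paying for unseen tokens
    res.write(`data: ${token}\n\n`);
  }

  res.end();
});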

In the age of real-time AI, response streaming is no longer optional UX polish; it’s part of the product.

Stream Smarter with AnyAPI

Integrating SSE into your React frontend is one of the easiest, highest-impact improvements you can make to your AI product’s user experience. And if you're using multiple LLM providers, juggling streaming implementations becomes painful fast.

AnyAPI offers a unified interface for streaming across providers – OpenAI, Claude, Gemini, Mistral, and more – so your frontend code stays simple no matter what’s powering it.

