Layercode

Build production-ready voice AI agents with TypeScript and Next.js

Layercode is voice AI infrastructure for developers. We handle WebSockets, voice activity detection & global edge deployment. You focus on your agent's logic.

$ npx @layercode/cli init
Initializing project configuration...
Created pipeline configuration
Generated API route template
Installed dependencies
Your voice agent is ready. Run npm run dev to start.
Start Building

$100 free credits. No credit card required.

Read the docs

"Voice AI has unique infrastructure demands that traditional cloud architectures aren't built for. By leveraging Cloudflare, Layercode delivers the most performant and low-latency voice AI platform that scales."

Dane Knecht
CTO at Cloudflare

"Layercode makes it very easy to build and prototype low-latency voice features for our text-based agents built with NextJS and React."

Lance Jones
AI Agent Developer

Voice AI demos are easy.

Production is hard.

You built a working prototype in a weekend. But when real users start talking to your agent, everything breaks...

The agent lags and users talk over it
Turn-taking feels robotic
It mispronounces your customer's brand
Calls fail and you have no idea why
Scaling means rewriting your entire infrastructure

The gap between "cool demo" and production-ready voice AI can be months of work: wrangling WebSocket connections, tuning voice activity detection, global deployment, session recording, observability tooling, and more.

Layercode closes that gap.

Not a visual builder. Not a framework. Just infrastructure.

More control than Vapi or Retell

Visual workflow builders work until your logic gets complex. Then you're fighting the platform instead of building your product. Layercode gives you a webhook. Write TypeScript. Ship.

Simpler than LiveKit or Pipecat

Open-source frameworks give you control, but you're signing up for months of WebRTC, TURN servers, and audio pipeline debugging. Layercode handles the infrastructure. You handle the intelligence.

More flexible than OpenAI Realtime

Realtime LLM APIs are black boxes. You can't swap models mid-conversation, control prompts dynamically, or use your own fine-tuned LLM. Layercode calls YOUR backend. You control everything.

Add voice to your Next.js app in under 50 lines

Layercode's Node.js SDK integrates with the tools you already use. Here's a complete voice agent backend using the Vercel AI SDK:

import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";
import { streamResponse } from "@layercode/node-server-sdk";

const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY! });

// Layercode POSTs each webhook event to this route.
export const POST = async (request: Request) => {
  const body = await request.json();

  return streamResponse(body, async ({ stream }) => {
    if (body.type === "message") {
      // Stream the LLM completion as it is generated.
      const { textStream } = streamText({
        model: openai("gpt-4o-mini"),
        system: "You are a helpful voice assistant.",
        messages: [{ role: "user", content: body.text }],
        onFinish: () => stream.end(), // close the response once the LLM finishes
      });

      // Forward the text stream to Layercode for text-to-speech.
      await stream.ttsTextStream(textStream);
    }
  });
};

Works with the LLM libraries you already use:

Vercel AI SDK
OpenAI
Anthropic
LangChain
Ollama
Mastra
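Because the handler above only touches the Vercel AI SDK's `streamText` interface, switching LLM providers comes down to changing the `model` argument. A minimal sketch of that selection logic — the provider names and model IDs here are illustrative; in a real handler the model would come from the matching `@ai-sdk/*` package (e.g. `createAnthropic` from `@ai-sdk/anthropic`) rather than a string:

```typescript
// Illustrative only: resolve a model ID per provider so the rest of the
// webhook handler never changes when you swap LLMs. In production you
// would return a model factory from the matching @ai-sdk package.
type Provider = "openai" | "anthropic" | "ollama";

function resolveModelId(provider: Provider): string {
  switch (provider) {
    case "openai":
      return "gpt-4o-mini"; // model used in the example above
    case "anthropic":
      return "claude-3-5-haiku-latest"; // hypothetical alternative
    case "ollama":
      return "llama3.1"; // hypothetical local model via Ollama
  }
}
```

Keeping the provider choice in one place means swapping models mid-project (or mid-conversation) touches a single line, not your streaming logic.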

Hot-swap leading voice model providers:

ElevenLabs
Rime
Cartesia
Deepgram
Inworld

Your backend. Our infrastructure.

Layercode handles real-time audio streaming. You handle the conversation.

1

User speaks

Your user talks into their browser, phone, or mobile app. Layercode captures the audio stream at the nearest edge location and runs speech-to-text in real-time.

2

Your backend responds

We send transcribed text to your webhook. You process it with any LLM: OpenAI, Claude, Gemini, etc. Stream your response back via our SDK.

3

User hears response

Layercode converts your text to speech and streams audio back to the user. The entire round-trip happens in under a second.

You receive text, you send text. No audio processing, WebSocket management or VAD tuning.
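Judging from the handler shown earlier, the webhook payload is plain JSON with at least a `type` field and, for user turns, a `text` field. The real payload carries more than this (consult the Layercode docs); a sketch of a narrowing guard covering only what that snippet relies on:

```typescript
// Minimal shape inferred from the example handler: this is only what
// the snippet reads, not the full webhook schema.
type WebhookEvent = { type: string; text?: string };

// Only "message" events carry user text; other event types are ignored
// by the example handler.
function isUserMessage(
  body: WebhookEvent
): body is { type: "message"; text: string } {
  return body.type === "message" && typeof body.text === "string";
}
```

With a guard like this, the `if (body.type === "message")` branch narrows cleanly and your handler stays type-safe as you add event handling.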

OpenAI, Claude, Gemini, Llama, Mistral, etc. Use whatever model fits your use case.

Vercel, AWS, Railway, your own servers. Layercode connects to it via webhook.

Everything you need to ship production-ready voice AI agents

Hot-swap voice providers

Avoid vendor lock-in: switch between Deepgram, ElevenLabs, Cartesia, and Rime with a single config change. Test different models and optimize for cost or quality.

Analytics & Observability

Replay any conversation, inspect latency breakdowns, and view transcripts to debug production issues.

Session recording

Every call is recorded automatically. Download audio files, export transcripts, build training datasets. All stored securely.

Per-second billing

Pay only for active conversation time. Silence is always free. No minimum commitments.

Web, mobile, and phone

Connect users via browser, iOS, Android, or phone. Same backend, same pipeline, multiple channels.

Unified billing

One invoice for speech-to-text, text-to-speech, and infrastructure.

Infrastructure

The first voice AI infrastructure built for low-latency conversations at global scale.

330+
Edge locations worldwide
<50ms
Audio processing
Zero
Cold starts
100%
Session isolation

Other voice AI platforms run on centralized cloud infrastructure. When your user is in Tokyo and your servers are in Virginia, latency kills the conversation. Pauses feel unnatural. Users talk over the agent. The experience falls apart.

Layercode is powered by Cloudflare's global edge network. We process audio at the location nearest to your user, not in a distant data center.

Users connect to the nearest edge location. Speech-to-text, voice activity detection, and audio streaming happen locally in milliseconds rather than hundreds of milliseconds.

No capacity planning. No provisioning. Every conversation runs in its own isolated environment that scales automatically with demand.

Platform traffic spikes don't affect your users. Each session runs in complete isolation with dedicated resources.

Deploy once, serve users everywhere. No multi-region setup, no latency-based routing rules, no infrastructure headaches.

Enterprise-ready security

Layercode is built for production workloads with enterprise security requirements. Your data is encrypted in transit and at rest. Session recordings are stored securely in SOC 2 compliant infrastructure.

SOC 2 Type II*
GDPR Compliant
TLS 1.3
AES-256

Simple, predictable pricing

Per-second billing for active conversation time. Silence is free. STT, TTS, and infrastructure costs consolidated into one simple rate. Start with $100 in free credits, no credit card required.

View pricing details

Ship your first voice agent today

From zero to production-ready in minutes, not days. $100 in free credits to get started.