
Headless mode

Headless mode is the way to build a voice experience that feels native to your product. Roxels still owns the call — microphone capture, audio playback, the LiveKit connection, the language models, the agent. You own everything visible: the orb, the transcript, the controls, the layout, the brand.

This page is the single source of truth for headless. If you're integrating Roxels into a product, share this page with your engineers.

What you control vs what Roxels controls

| You | Roxels |
| --- | --- |
| Every pixel of UI | Microphone capture |
| When the call starts and ends | Audio playback (the agent's voice) |
| The orb / activity indicator | The LiveKit room and reconnection |
| The transcript view | The conversation itself (LLM, prompts, goals) |
| Mute, pause, screen-share buttons | Screen-share track capture |
| Errors shown to the user | Tool calls, extraction, outputs |

You never write WebRTC code. You never handle audio buffers. You don't manage tokens or reconnection. You drive the call through a clean controller API and subscribe to events.

You'll need

  • A Roxels account at app.roxels.ai.
  • A template, created in the dashboard.
  • An embed key (rk_…), domain-allowlisted for the origins where headless will run.
  • A page on your site where you can add a <script> tag.
  • Microphone permissions configured on your host — see Microphone permissions.

The 30-second example

<script src="https://app.roxels.ai/embed.js"></script>
<button id="start">Start conversation</button>
<button id="mute">Mute</button>
<button id="end">End</button>
<div id="status">Idle</div>
 
<script>
  let call = null;
 
  document.querySelector("#start").addEventListener("click", () => {
    call = Roxels.start({
      templateKey: "rk-your-embed-key",
      display: "none",
      personName: "Ada Lovelace",
      externalId: "user_123",
    });
 
    call.on("status", ({ status }) => {
      document.querySelector("#status").textContent = status;
    });
    call.on("chat", (msg) => console.log("chat:", msg));
    call.on("understanding", (data) => console.log("captured:", data));
    call.on("complete", (results) => console.log("done:", results));
    call.on("error", (err) => console.error("error:", err));
  });
 
  document.querySelector("#mute").addEventListener("click", () => call?.toggleMic());
  document.querySelector("#end").addEventListener("click", () => call?.hangUp());
</script>

That's a fully working headless voice call: start, mute, and end controls, plus live status, transcript, and extraction output. The rest of this page covers every method, every event, and patterns for production use.

The controller

Roxels.start({ display: "none", ... }) returns a controller synchronously. The call hasn't connected yet — the controller is a handle you bind to immediately.

const call = Roxels.start({ display: "none", templateKey: "rk-..." });
// call is the controller — bind event handlers now, even before connection.

The controller exposes:

  • State: call.state, call.sessionId, call.ready (a Promise that resolves when connected).
  • Subscriptions: call.on(event, cb), call.off(event, cb).
  • Commands: pause, resume, togglePause, muteMic, unmuteMic, setMicEnabled, toggleMic, startScreenShare, stopScreenShare, toggleScreenShare, sendMessage, uploadFile, hangUp.
  • Lifecycle: call.close() to destroy and clean up the iframe.

Awaiting connection

const call = Roxels.start({ display: "none", templateKey: "rk-..." });
try {
  await call.ready;
  // call is now connected; commands are guaranteed to land.
} catch (err) {
  // failed to connect (e.g. microphone_blocked, invalid template key)
}

If you call a command before the connection lands (e.g. call.muteMic() immediately after start()), the command is queued and dispatched once the call is connected. You don't need to gate every command on await call.ready — only do it when you want to know connection succeeded.
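
The queue-then-replay behavior described above can be pictured with a small stand-alone sketch. This is not Roxels' actual implementation; `createCommandQueue` and `markConnected` are hypothetical names used only to illustrate why commands issued before the connection lands are not lost.

```javascript
// Hypothetical illustration of queue-then-replay; not the Roxels internals.
function createCommandQueue(dispatch) {
  let connected = false;
  const pending = [];

  return {
    // Called for every command. Before connection, the command is held.
    send(command) {
      if (connected) {
        dispatch(command);
      } else {
        pending.push(command);
      }
    },
    // Called once when the connection lands: replay held commands in order.
    markConnected() {
      connected = true;
      while (pending.length > 0) {
        dispatch(pending.shift());
      }
    },
  };
}

// Usage: commands sent early are delivered, in order, after connection.
const delivered = [];
const queue = createCommandQueue((cmd) => delivered.push(cmd));
queue.send("muteMic");
queue.send("startScreenShare");
queue.markConnected();
queue.send("unmuteMic");
```

The practical consequence for your code is the same as stated above: you can wire buttons to commands immediately and only await call.ready where you need to know the outcome.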

Commands — every method

All command methods return synchronously and are safe to call before the connection is live. They're queued and replayed once connected.

Conversation control

call.pause(); // pause the call (mic muted, agent silent)
call.resume(); // resume from paused
call.togglePause();
call.hangUp(); // end the conversation; emits `ended` then `complete`

Microphone

call.muteMic();
call.unmuteMic();
call.setMicEnabled(true); // pass false to disable the microphone
call.toggleMic();

The microphone state is reflected by the mic_state event so multiple parts of your UI can stay in sync.
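
One way to keep several UI elements in sync is to render them all from the mic_state event instead of from your own click handlers. The sketch below assumes only the on/off subscription methods and the { enabled } payload documented on this page; `bindMicState` and the renderer functions are hypothetical helpers.

```javascript
// Keep any number of UI elements in sync with the real microphone state.
// `call` is any object with on(event, cb) and off(event, cb), such as the controller.
function bindMicState(call, renderers) {
  const handler = ({ enabled }) => {
    for (const render of renderers) {
      render(enabled);
    }
  };
  call.on("mic_state", handler);
  // Return an unbind function so callers can clean up later.
  return () => call.off("mic_state", handler);
}

// Example usage in a page (element ids are placeholders):
// const unbind = bindMicState(call, [
//   (enabled) => { document.querySelector("#mute").textContent = enabled ? "Mute" : "Unmute"; },
//   (enabled) => { document.querySelector("#mic-icon").hidden = !enabled; },
// ]);
```

Because the event fires regardless of who changed the state, the button label stays correct even when the change did not originate from your own code.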

Screen-share

call.startScreenShare();
call.stopScreenShare();
call.toggleScreenShare();

The screen-share state is reflected by the screenshare_state event. The agent receives frames in real time and can reference what it sees. See Conversation lifecycle for what the agent does with screen-share.

Chat

call.sendMessage("Skip that for now");

Sends a text message into the conversation. The agent treats it as if the user had said it aloud. This is useful for accessibility, for users in environments where they can't talk, or as a hidden control channel for your UI.

File upload

call.uploadFile(file); // a single File object
call.uploadFile({ files }); // multiple Files

The file is transferred to the iframe and uploaded to your Roxels org's attachment store. The agent is notified when the file is ready and can reference it in the conversation.
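
If your UI needs to show a spinner until the file is usable, you can wrap the upload in a Promise that settles on the file_uploaded event. This is a sketch, not part of the Roxels API: `uploadAndWait` is a hypothetical helper, and matching the event to the upload by filename is an assumption (this page only documents the payload as { id, filename }).

```javascript
// Resolve once a file_uploaded event arrives for the given file,
// or reject after timeoutMs. Matching on filename is an assumption.
function uploadAndWait(call, file, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const onUploaded = (info) => {
      if (info.filename !== file.name) return; // some other upload
      cleanup();
      resolve(info);
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error("upload_timeout"));
    }, timeoutMs);

    function cleanup() {
      clearTimeout(timer);
      call.off("file_uploaded", onUploaded);
    }

    // Subscribe before uploading so the event cannot be missed.
    call.on("file_uploaded", onUploaded);
    call.uploadFile(file);
  });
}

// Example usage:
// const info = await uploadAndWait(call, fileInput.files[0]);
// console.log("ready:", info.id, info.filename);
```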

Generic escape hatch

If you need to send a command type that isn't in the standard set — for example, a template-specific signal the agent listens for — use:

call.send({ type: "your_custom_signal", payload: { /* your data */ } });

The agent receives this on a dedicated data-channel topic and can pattern-match it in the template's instructions.

Events — every signal

Subscribe with call.on(event, callback). Unsubscribe with call.off(event, callback). Most callbacks receive a single argument; some receive none.

Lifecycle

| Event | When it fires | Payload |
| --- | --- | --- |
| session | The session id is known. Fires once after start(). | { sessionId } |
| status | The connection status changes. | { status } where status is one of prewarm, loading, ready, joining, connected, disconnecting, ended, error |
| connected | The call has connected for the first time. | none |
| ending | The call is about to end (graceful close in progress). | none |
| ended | The call has ended. | none |
| complete | The conversation completed and final results are available. | { summary, findings, duration_seconds, external_id, auto_close } |
| error | A call-level error occurred. | { error } (string code; see Error handling below) |
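
Raw status values are rarely what you want to show users. A minimal sketch of mapping them to display copy follows; the status names come from the table above, while `statusLabel` and the label strings are placeholders to adapt.

```javascript
// Map each documented status value to user-facing copy.
// The labels are placeholders; adapt them to your product's voice.
const STATUS_LABELS = new Map([
  ["prewarm", "Preparing"],
  ["loading", "Loading"],
  ["ready", "Ready"],
  ["joining", "Joining"],
  ["connected", "Connected"],
  ["disconnecting", "Ending"],
  ["ended", "Ended"],
  ["error", "Something went wrong"],
]);

function statusLabel(status) {
  // Fall back to a neutral label for values this map does not know about.
  return STATUS_LABELS.get(status) ?? "Idle";
}

// Example usage:
// call.on("status", ({ status }) => {
//   document.querySelector("#status").textContent = statusLabel(status);
// });
```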

Voice and chat

| Event | When it fires | Payload |
| --- | --- | --- |
| voice_state | The agent's voice activity changes. | { state } where state is one of idle, listening, thinking, speaking |
| chat | A chat message arrives (either direction). | { id, role, text, timestamp } (role is user or assistant) |
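
A transcript view usually needs an ordered list of messages built from chat events. The sketch below keys messages by id. Whether the same id can arrive more than once (for example, for a revised message) is not stated on this page, so treating a repeated id as an update is an assumption; `createTranscript` is a hypothetical helper.

```javascript
// Minimal transcript store keyed by message id.
// Assumption: a repeated id means "update this message", not "new message".
function createTranscript() {
  const byId = new Map(); // Map preserves first-insertion order

  return {
    upsert(msg) {
      // Merge over any earlier version; the original position is kept.
      byId.set(msg.id, { ...byId.get(msg.id), ...msg });
    },
    messages() {
      return Array.from(byId.values());
    },
  };
}

// Example with hand-written payloads shaped like the chat event:
const transcript = createTranscript();
transcript.upsert({ id: "m1", role: "user", text: "Hel", timestamp: 1 });
transcript.upsert({ id: "m2", role: "assistant", text: "Hi there", timestamp: 2 });
transcript.upsert({ id: "m1", role: "user", text: "Hello", timestamp: 1 });

// In a page: call.on("chat", (msg) => { transcript.upsert(msg); render(transcript.messages()); });
```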

Extraction and goals

| Event | When it fires | Payload |
| --- | --- | --- |
| understanding | The agent extracted structured data, or a goal updated. | { data, goal_id, cumulative, source, timestamp } or { text, ... } for unstructured |
| goal_transition | A goal commits, or the active goal changes. | { from_goal, to_goal, timestamp } |
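
To show captured data live, you can fold understanding payloads into a single view model. The sketch below handles the two payload shapes listed in the table. It keeps the latest data per goal_id and ignores the cumulative field, because its merge semantics are not described on this page; `createCaptureStore` is a hypothetical helper.

```javascript
// Fold understanding payloads into one view model.
// Assumption: structured payloads carry { data, goal_id }; a payload with
// only a `text` field is treated as a free-form note.
function createCaptureStore() {
  const byGoal = new Map();
  const notes = [];

  return {
    apply(payload) {
      if (payload.data !== undefined) {
        byGoal.set(payload.goal_id, payload.data); // latest payload wins
      } else if (typeof payload.text === "string") {
        notes.push(payload.text);
      }
    },
    snapshot() {
      return { goals: Object.fromEntries(byGoal), notes: [...notes] };
    },
  };
}

// Example usage:
// const captured = createCaptureStore();
// call.on("understanding", (payload) => {
//   captured.apply(payload);
//   renderCaptured(captured.snapshot()); // renderCaptured is your own function
// });
```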

Document / form view

If the template has a live document or form, these fire as it updates.

| Event | When it fires | Payload |
| --- | --- | --- |
| document | The document state updates. | The current document state |
| document_clear | A document was cleared. | { doc_id } |

Controls reflecting state back

| Event | When it fires | Payload |
| --- | --- | --- |
| mic_state | The mic enabled/disabled state changed (from any source: your code, the user, the agent). | { enabled } |
| screenshare_state | The screen-share active state changed. | { active } |
| paused | The call paused. | none |
| resumed | The call resumed. | none |
| paused_ended | The call ended while paused. | none |
| file_uploaded | A file you uploaded finished processing. | { id, filename } |
| command_ack | The iframe acknowledged a command. Useful for debugging. | { command } |

A complete catalog with sample payloads is in Events.

Patterns

React (Next.js or any React app)

import { useEffect, useRef, useState } from "react";
 
export function VoiceCall({ templateKey }: { templateKey: string }) {
  const callRef = useRef<any>(null);
  const [status, setStatus] = useState("idle");
  const [voiceState, setVoiceState] = useState("idle");
 
  useEffect(() => {
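    // Assumes embed.js is already loaded, e.g. via a <Script> tag (see the rules below).
    // The cast is needed because TypeScript has no built-in type for window.Roxels.
    const call = (window as any).Roxels.start({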
      templateKey,
      display: "none",
    });
    callRef.current = call;
 
    call.on("status", ({ status }) => setStatus(status));
    call.on("voice_state", ({ state }) => setVoiceState(state));
 
    return () => {
      // Clean up on unmount — never leak a live call.
      call.close();
    };
  }, [templateKey]);
 
  return (
    <div>
      {/* Orb is your own visual component; Roxels does not provide it. */}
      <Orb state={voiceState} />
      <button onClick={() => callRef.current?.toggleMic()}>Mute</button>
      <button onClick={() => callRef.current?.hangUp()}>End</button>
      <div>Status: {status}</div>
    </div>
  );
}

Key React rules:

  • Always call call.close() on unmount. A leaked call keeps the microphone hot and the LiveKit room alive.
  • Don't recreate the call on every render. Use useRef for the controller and useEffect (with the right dependencies) for the start.
  • For Next.js, load embed.js once via <Script src="https://app.roxels.ai/embed.js" strategy="afterInteractive" /> in the root layout, then access window.Roxels from your components.

Vue

<script setup>
import { onMounted, onUnmounted, ref } from "vue";
 
const props = defineProps({ templateKey: String });
const call = ref(null);
const status = ref("idle");
 
onMounted(() => {
  call.value = window.Roxels.start({
    templateKey: props.templateKey,
    display: "none",
  });
  call.value.on("status", ({ status: s }) => (status.value = s));
});
 
onUnmounted(() => call.value?.close());
</script>
 
<template>
  <div>Status: {{ status }}</div>
</template>

Plain JS, multi-call

You can run more than one headless call simultaneously. Each gets its own controller and its own iframe. Don't share state between controllers.

const call1 = Roxels.start({ templateKey: "rk-a", display: "none" });
const call2 = Roxels.start({ templateKey: "rk-b", display: "none" });
// independent; bind events separately.

Resuming a session

Pass externalId consistently for the same user. If the template has session persistence enabled, the next call from the same user resumes their previous conversation (transcript, captured data, document state).

Roxels.start({
  templateKey: "rk-your-embed-key",
  display: "none",
  externalId: "user_123", // stable identifier
});

The full identity model — when to trust external_id, when to verify server-side, the deletion path — is in Identity and resumption.

Server-side session creation

If you don't want to ship an embed key to the browser, create the session server-side using the REST API and pass the sessionId to start():

// sessionId came from your backend via fetch()
Roxels.start({ sessionId, display: "none" });
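
A fuller sketch of the browser side of this flow follows. The endpoint path "/api/roxels-session" and the { sessionId } response shape are hypothetical: they belong to your own backend, which in turn calls the Roxels REST API (not shown here). `startWithServerSession` is an illustrative helper, and `fetchImpl` is injectable only to make the function easy to test.

```javascript
// Fetch a sessionId from your own backend, then start a headless call with it.
// The URL and the { sessionId } response shape are assumptions about YOUR backend.
// `roxels` is the global Roxels object provided by embed.js.
async function startWithServerSession(roxels, fetchImpl = fetch, url = "/api/roxels-session") {
  const res = await fetchImpl(url, { method: "POST", credentials: "include" });
  if (!res.ok) {
    throw new Error(`session request failed: ${res.status}`);
  }
  const { sessionId } = await res.json();
  // No embed key is needed in the browser on this path.
  return roxels.start({ sessionId, display: "none" });
}

// Example usage:
// const call = await startWithServerSession(window.Roxels);
// call.on("status", ({ status }) => console.log(status));
```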

This is the path when you want to encode trusted context (the participant's verified identity, a customer record, anything the user shouldn't be able to tamper with) before the call begins.

Error handling

The error event fires with a string code:

| Code | What it means | What to do |
| --- | --- | --- |
| microphone_blocked | The browser denied microphone access. | Show your "we need the mic" UI; link to Microphone permissions if the host config is the cause. |
| session_create_failed | Couldn't create the session (bad embed key, domain not allowlisted, server error). | Check the key, the domain allowlist, and try again. |
| connection_failed | Couldn't connect to the LiveKit room. | Network issue; retry. |
| session_ended_unexpectedly | The session ended outside the normal complete path. | Surface a soft "we lost the connection" message; consider auto-retry. |

call.ready rejects with the same error code if the failure happens before connected.

const call = Roxels.start({ display: "none", templateKey: "rk-..." });
call.on("error", ({ error }) => {
  if (error === "microphone_blocked") {
    showMicBlockedHelp();
  }
});
try {
  await call.ready;
} catch (err) {
  console.error("Failed to connect:", err);
}
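
To cover all four codes in one place, you can translate each code into a UI decision. The sketch below uses the codes from the table above; the messages and the retry flags are illustrative choices, not Roxels recommendations, and `planForError` is a hypothetical helper.

```javascript
// Translate documented error codes into a UI decision.
// Messages and retry flags are placeholders; tune them for your product.
const ERROR_PLANS = new Map([
  ["microphone_blocked", { message: "We need access to your microphone.", retry: false }],
  ["session_create_failed", { message: "We couldn't start the conversation.", retry: false }],
  ["connection_failed", { message: "Connection problem. Please try again.", retry: true }],
  ["session_ended_unexpectedly", { message: "We lost the connection.", retry: true }],
]);

function planForError(code) {
  // Unknown codes get a generic message and no automatic retry.
  return ERROR_PLANS.get(code) ?? { message: "Something went wrong.", retry: false };
}

// Example usage:
// call.on("error", ({ error }) => {
//   const plan = planForError(error);
//   showBanner(plan.message);            // showBanner is your own function
//   if (plan.retry) offerRetryButton();  // so is offerRetryButton
// });
```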

Lifecycle in one diagram

   start()
      │
      ▼
   prewarm ──► loading ──► ready ──► joining ──► connected ◄──┐
                                                     │         │
                       voice_state, chat, understanding,        │ (call lives here;
                       goal_transition, document, ...           │  events flow freely)
                                                     │         │
                                       (user/your code)        │
                                                     │         │
                              hangUp() / agent ends ─┘         │
                                                     │         │
                                                     ▼         │
                                                  ending ──────┘
                                                     │
                                                     ▼
                                                   ended
                                                     │
                                                     ▼
                                                  complete

The complete event carries the final result:

{
  summary: string,          // human-readable summary of the conversation
  findings: object,         // structured data from all committed goals
  duration_seconds: number,
  external_id: string,
  auto_close: boolean,      // template setting — should your UI auto-dismiss?
}
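
A minimal sketch of a complete handler that honors auto_close follows. The `ui` object with showSummary() and dismiss() is hypothetical and stands in for your own rendering code; only the payload fields come from this page.

```javascript
// React to the final result. `ui` is a hypothetical object exposing
// showSummary() and dismiss(); substitute your own rendering code.
function handleComplete(results, ui) {
  ui.showSummary({
    summary: results.summary,
    findings: results.findings,
    minutes: Math.round(results.duration_seconds / 60),
  });
  // auto_close is a template setting; honoring it is up to your UI.
  if (results.auto_close) {
    ui.dismiss();
  }
}

// Example usage:
// call.on("complete", (results) => handleComplete(results, myUi));
```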

Teardown checklist

When you're done with a call:

  1. Call call.close() (or call.hangUp() for a graceful conversation end).
  2. Drop your references to the controller.
  3. Don't reuse a controller after close() — start a fresh one if you need another call.

close() is destructive: it ends the call, removes the iframe, stops the microphone, releases the LiveKit room. It's the right call on unmount, on tab close, on hard error.
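
For the tab-close case, one option is to close the call from the browser's pagehide event. This is a sketch under assumptions: `closeOnPageHide` is a hypothetical helper, and choosing pagehide (rather than another unload signal) is a design choice, not something this page prescribes. Note that pagehide can also fire when a page is placed in the back/forward cache, in which case the call is still closed.

```javascript
// Close the call when the page is being hidden or unloaded.
// `target` defaults to window in the browser; it is injectable for testing.
function closeOnPageHide(call, target = window) {
  const onHide = () => call.close();
  target.addEventListener("pagehide", onHide, { once: true });
  // Return a function that removes the listener if the call ends earlier.
  return () => target.removeEventListener("pagehide", onHide);
}

// Example usage:
// const call = Roxels.start({ display: "none", templateKey: "rk-..." });
// const unbind = closeOnPageHide(call);
// call.on("ended", unbind);
```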

Performance and resource use

  • Idle cost: loading embed.js adds a small bundle to your page. Until you call start(), no iframe is mounted, the microphone is not touched, and no network connection is opened.
  • Active cost: an active call uses microphone, network (WebRTC + data channel), and a hidden iframe (~tens of MB of memory).
  • One iframe per call: if you start three concurrent calls, you mount three iframes.

For most products, this is fine. If you're embedding in an environment with strict memory budgets, run one call at a time and close it when done.

Security

  • Domain allowlist on the embed key. Embed keys (rk_…) only work from the origins you allowlist. A leaked key from a public site can only be used from that site.
  • Server-side session creation when context is sensitive. If the agent needs to know who the user is in a way the user shouldn't control, create the session server-side and pass the sessionId to the embed. The user can't tamper with context you encoded server-side.
  • Microphone permission is user-gated. The browser always prompts the user. No silent microphone access is possible.

Webhooks vs frontend callbacks

understanding, goal_transition, and complete events surface data to your JS. Many integrations want the same data to also land in their backend — usually for the canonical record, for triggering downstream systems, or as a backup.

The pattern is:

  • JS events for in-page reactivity (showing extracted data live, advancing your UI as goals commit).
  • Webhooks for backend record-keeping and triggering downstream work.

Both can fire for the same goal. See Webhooks overview for setup.