Headless mode

Headless mode is the way to build a voice experience that feels native to your product. Roxels still owns the call — microphone capture, audio playback, the LiveKit connection, the language models, the agent. You own everything visible: the orb, the transcript, the controls, the layout, the brand.

This page is the single source of truth for headless mode. If you're integrating Roxels into a product, share this page with your engineers.

What you control vs what Roxels controls

You	Roxels
Every pixel of UI	Microphone capture
When the call starts and ends	Audio playback (the agent's voice)
The orb / activity indicator	The LiveKit room and reconnection
The transcript view	The conversation itself (LLM, prompts, goals)
Mute, pause, screenshare buttons	Screen-share track capture
Errors shown to the user	Tool calls, extraction, outputs

You never write WebRTC code. You never handle audio buffers. You don't manage tokens or reconnection. You drive the call through a clean controller API and subscribe to events.

You'll need

A Roxels account at app.roxels.ai.
A template, created in the dashboard.
An embed key (rk_…), domain-allowlisted for the origins where headless will run.
A page on your site where you can add a <script> tag.
Microphone permissions configured on your host — see Microphone permissions.

The 30-second example

<script src="https://app.roxels.ai/embed.js"></script>
<button id="start">Start conversation</button>
<button id="mute">Mute</button>
<button id="end">End</button>
<div id="status">Idle</div>
 
<script>
  let call = null;
 
  document.querySelector("#start").addEventListener("click", () => {
    call = Roxels.start({
      templateKey: "rk-your-embed-key",
      display: "none",
      personName: "Ada Lovelace",
      externalId: "user_123",
    });
 
    call.on("status", ({ status }) => {
      document.querySelector("#status").textContent = status;
    });
    call.on("chat", (msg) => console.log("chat:", msg));
    call.on("understanding", (data) => console.log("captured:", data));
    call.on("complete", (results) => console.log("done:", results));
    call.on("error", (err) => console.error("error:", err));
  });
 
  document.querySelector("#mute").addEventListener("click", () => call?.toggleMic());
  document.querySelector("#end").addEventListener("click", () => call?.hangUp());
</script>

That's a fully working headless voice call: start, mute, and end, plus live status, transcript, and extraction output. The rest of this page covers every method and every event in detail, along with patterns for production use.

The controller

Roxels.start({ display: "none", ... }) returns a controller synchronously. The call hasn't connected yet — the controller is a handle you bind to immediately.

const call = Roxels.start({ display: "none", templateKey: "rk-..." });
// call is the controller — bind event handlers now, even before connection.

The controller exposes:

State: call.state, call.sessionId, call.ready (a Promise that resolves when connected).
Subscriptions: call.on(event, cb), call.off(event, cb).
Commands: pause, resume, togglePause, muteMic, unmuteMic, setMicEnabled, toggleMic, startScreenShare, stopScreenShare, toggleScreenShare, sendMessage, uploadFile, hangUp, plus send for custom signals (see Generic escape hatch).
Lifecycle: call.close() to destroy and clean up the iframe.

Awaiting connection

const call = Roxels.start({ display: "none", templateKey: "rk-..." });
try {
  await call.ready;
  // call is now connected; commands are guaranteed to land.
} catch (err) {
  // failed to connect (e.g. microphone_blocked, or session_create_failed for a bad embed key)
}

If you call a command before the connection lands (e.g. call.muteMic() immediately after start()), the command is queued and dispatched once the call is connected. You don't need to gate every command on await call.ready — only do it when you want to know connection succeeded.
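
If you do gate on call.ready, you may also want an upper bound on how long you wait. The following is a minimal sketch that uses only call.ready and call.close() from the controller described above. The helper name connectWithTimeout and the connect_timeout error are hypothetical, chosen for illustration; they are not part of embed.js.

```javascript
// Sketch: wait for call.ready, but give up after timeoutMs.
// `connectWithTimeout` and "connect_timeout" are hypothetical names, not Roxels APIs.
function connectWithTimeout(call, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("connect_timeout")), timeoutMs);
  });
  return Promise.race([call.ready, timeout])
    .catch((err) => {
      // Tear the call down so a late connection cannot leave the mic live.
      call.close();
      throw err;
    })
    .finally(() => clearTimeout(timer));
}

// Usage in the browser (not executed here):
//   const call = Roxels.start({ display: "none", templateKey: "rk-..." });
//   await connectWithTimeout(call, 15000);
```

Closing the controller on failure is a design choice in this sketch: it guarantees that a timed-out call is fully released before you show a retry button.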

Commands — every method

All command methods return synchronously and are safe to call before the connection is live. Commands issued before then are queued and replayed once the call connects.

Conversation control

call.pause(); // pause the call (mic muted, agent silent)
call.resume(); // resume from paused
call.togglePause();
call.hangUp(); // end the conversation; emits `ended` then `complete`

Microphone

call.muteMic();
call.unmuteMic();
call.setMicEnabled(true); // pass false to disable the mic
call.toggleMic();

The microphone state is reflected by the mic_state event so multiple parts of your UI can stay in sync.
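
As an illustration of that sync pattern, here is a small sketch that drives any number of mute buttons from mic_state rather than from click handlers. It assumes only call.on, call.off, and the { enabled } payload documented on this page; bindMicButtons is a hypothetical helper name.

```javascript
// Sketch: render every mute button from the mic_state event.
// Buttons only need `textContent` and `setAttribute`, so real DOM elements work.
// `bindMicButtons` is a hypothetical helper, not part of embed.js.
function bindMicButtons(call, buttons) {
  const render = ({ enabled }) => {
    for (const button of buttons) {
      button.textContent = enabled ? "Mute" : "Unmute";
      // aria-pressed is "true" while the mic is muted.
      button.setAttribute("aria-pressed", String(!enabled));
    }
  };
  call.on("mic_state", render);
  // Return an unbind function for teardown.
  return () => call.off("mic_state", render);
}

// Usage in the browser (not executed here):
//   const unbind = bindMicButtons(call, document.querySelectorAll(".mute"));
```

Because the buttons never track state themselves, a mute triggered by the agent or by another part of your UI shows up everywhere at once.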

Screen share

call.startScreenShare();
call.stopScreenShare();
call.toggleScreenShare();

The screen-share state is reflected by the screenshare_state event. The agent receives frames in real time and can reference what it sees. See Conversation lifecycle for what the agent does with screen-share.

Chat

call.sendMessage("Skip that for now");

Sends a text message into the conversation. The agent treats it like the user said it aloud — useful for accessibility, for users in environments where they can't talk, or as a hidden control channel for your UI.

File upload

call.uploadFile(file); // a single File object
call.uploadFile({ files }); // multiple Files

The file is transferred to the iframe and uploaded to your Roxels org's attachment store. The agent is notified when the file is ready and can reference it in the conversation.
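
If your UI needs to know when an upload has finished, one option is to pair uploadFile with the file_uploaded event listed under Events. This is a sketch under one stated assumption: that the filename in the event payload matches the File's name. The helper name uploadAndWait and the upload_timeout error are hypothetical.

```javascript
// Sketch: upload a file and resolve when its file_uploaded event arrives.
// Assumption: payload.filename equals file.name. Verify this against your own events.
// `uploadAndWait` is a hypothetical helper, not part of embed.js.
function uploadAndWait(call, file, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const onUploaded = (info) => {
      if (info.filename !== file.name) return; // a different upload finished
      cleanup();
      resolve(info);
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error("upload_timeout"));
    }, timeoutMs);
    function cleanup() {
      clearTimeout(timer);
      call.off("file_uploaded", onUploaded);
    }
    call.on("file_uploaded", onUploaded);
    call.uploadFile(file);
  });
}

// Usage in the browser (not executed here):
//   const info = await uploadAndWait(call, fileInput.files[0]);
```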

Generic escape hatch

If you need to send a command type that isn't in the standard set — for example, a template-specific signal the agent listens for — use:

call.send({ type: "your_custom_signal", payload: { ... } });

The agent receives this on a dedicated data-channel topic and can pattern-match it in the template's instructions.

Events — every signal

Subscribe with call.on(event, callback). Unsubscribe with call.off(event, callback). Most callbacks receive a single argument; some receive none.

Lifecycle

Event	When it fires	Payload
`session`	The session id is known. Fires once after `start()`.	`{ sessionId }`
`status`	The connection status changes.	`{ status }` where status is one of `prewarm`, `loading`, `ready`, `joining`, `connected`, `disconnecting`, `ended`, `error`
`connected`	The call has connected for the first time.	none
`ending`	The call is about to end (graceful close in progress).	none
`ended`	The call has ended.	none
`complete`	The conversation completed and final results are available.	`{ summary, findings, duration_seconds, external_id, auto_close }`
`error`	A call-level error occurred.	`{ error }` (string code — see below)

Voice and chat

Event	When it fires	Payload
`voice_state`	The agent's voice activity changes.	`{ state }` where state is one of `idle`, `listening`, `thinking`, `speaking`
`chat`	A chat message arrives (either direction).	`{ id, role, text, timestamp }` (role is `user` or `assistant`)
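
A transcript view can be built by folding chat events into an ordered list. The sketch below assumes the { id, role, text, timestamp } payload from the table above, plus one extra assumption that is not stated on this page: a repeated id means the message was revised and should replace the earlier copy. createTranscript is a hypothetical name.

```javascript
// Sketch: fold `chat` events into an ordered transcript.
// Assumption: a repeated `id` is a revision of the same message, so it is replaced in place.
function createTranscript() {
  const byId = new Map();
  return {
    add(msg) {
      // Map keeps the original insertion position when a key is overwritten.
      byId.set(msg.id, msg);
    },
    lines() {
      return [...byId.values()].map((m) => `${m.role}: ${m.text}`);
    },
  };
}

// Usage in the browser (not executed here):
//   const transcript = createTranscript();
//   call.on("chat", (msg) => { transcript.add(msg); render(transcript.lines()); });
```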

Extraction and goals

Event	When it fires	Payload
`understanding`	The agent extracted structured data, or a goal updated.	`{ data, goal_id, cumulative, source, timestamp }` or `{ text, ... }` for unstructured
`goal_transition`	A goal commits, or the active goal changes.	`{ from_goal, to_goal, timestamp }`
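
To show captured data live, you can accumulate understanding events per goal. This sketch assumes that data is a plain, JSON-serializable object of fields and merges later values over earlier ones; it ignores the unstructured { text, ... } variant and does not interpret the cumulative or source fields, whose exact semantics are not described here. createFindingsStore is a hypothetical name.

```javascript
// Sketch: merge structured `understanding` payloads by goal_id.
// Assumptions: `data` is a plain JSON-serializable object; later fields overwrite earlier ones.
function createFindingsStore() {
  const byGoal = {};
  return {
    apply(event) {
      // Skip the unstructured `{ text, ... }` variant.
      if (!event.data || typeof event.data !== "object") return;
      const key = event.goal_id ?? "unknown";
      byGoal[key] = { ...byGoal[key], ...event.data };
    },
    snapshot() {
      // Deep copy so callers cannot mutate the store.
      return JSON.parse(JSON.stringify(byGoal));
    },
  };
}

// Usage in the browser (not executed here):
//   const findings = createFindingsStore();
//   call.on("understanding", (e) => { findings.apply(e); render(findings.snapshot()); });
```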

Document / form view

If the template has a live document or form, these fire as it updates.

Event	When it fires	Payload
`document`	The document state updates.	The current document state
`document_clear`	A document was cleared.	`{ doc_id }`

Controls reflecting state back

Event	When it fires	Payload
`mic_state`	The mic enabled/disabled state changed (from any source — your code, the user, the agent).	`{ enabled }`
`screenshare_state`	The screen-share active state changed.	`{ active }`
`paused`	The call paused.	none
`resumed`	The call resumed.	none
`paused_ended`	The call ended while paused.	none
`file_uploaded`	A file you uploaded finished processing.	`{ id, filename }`
`command_ack`	The iframe acknowledged a command. Useful for debugging.	`{ command }`

A complete catalog with sample payloads is in Events.

Patterns

React (Next.js or any React app)

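import { useEffect, useRef, useState } from "react";
import { Orb } from "./Orb"; // your own activity indicator; the path is a placeholder
 
// embed.js attaches Roxels to window. Declare it so TypeScript accepts the access.
declare global {
  interface Window {
    Roxels: any;
  }
}
 
export function VoiceCall({ templateKey }: { templateKey: string }) {
  const callRef = useRef<any>(null);
  const [status, setStatus] = useState("idle");
  const [voiceState, setVoiceState] = useState("idle");
 
  useEffect(() => {
    // Assumes embed.js is already loaded, for example via a <Script> tag in the root layout.
    const call = window.Roxels.start({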
      templateKey,
      display: "none",
    });
    callRef.current = call;
 
    call.on("status", ({ status }) => setStatus(status));
    call.on("voice_state", ({ state }) => setVoiceState(state));
 
    return () => {
      // Clean up on unmount — never leak a live call.
      call.close();
    };
  }, [templateKey]);
 
  return (
    <div>
      <Orb state={voiceState} />
      <button onClick={() => callRef.current?.toggleMic()}>Mute</button>
      <button onClick={() => callRef.current?.hangUp()}>End</button>
      <div>Status: {status}</div>
    </div>
  );
}

Key React rules:

Always call.close() on unmount. A leaked call keeps the microphone hot and the LiveKit room alive.
Don't recreate the call on every render. Use useRef for the controller and useEffect (with the right dependencies) for the start.
For Next.js, load embed.js once via <Script src="https://app.roxels.ai/embed.js" strategy="afterInteractive" /> in the root layout, then access window.Roxels from your components.

Vue

<script setup>
import { onMounted, onUnmounted, ref } from "vue";
 
const props = defineProps({ templateKey: String });
const call = ref(null);
const status = ref("idle");
 
onMounted(() => {
  call.value = window.Roxels.start({
    templateKey: props.templateKey,
    display: "none",
  });
  call.value.on("status", ({ status: s }) => (status.value = s));
});
 
onUnmounted(() => call.value?.close());
</script>
 
<template>
  <div>Status: {{ status }}</div>
</template>

Plain JS, multi-call

You can run more than one headless call simultaneously. Each gets its own controller and its own iframe. Don't share state between controllers.

const call1 = Roxels.start({ templateKey: "rk-a", display: "none" });
const call2 = Roxels.start({ templateKey: "rk-b", display: "none" });
// independent; bind events separately.

Resuming a session

Pass externalId consistently for the same user. If the template has session persistence enabled, the next call from the same user resumes their previous conversation (transcript, captured data, document state).

Roxels.start({
  templateKey: "rk-your-embed-key",
  display: "none",
  externalId: "user_123", // stable identifier
});

The full identity model — when to trust external_id, when to verify server-side, the deletion path — is in Identity and resumption.

Server-side session creation

If you don't want to ship an embed key to the browser, create the session server-side using the REST API and pass the sessionId to start():

// sessionId came from your backend via fetch()
Roxels.start({ sessionId, display: "none" });

This is the path when you want to encode trusted context (the participant's verified identity, a customer record, anything the user shouldn't be able to tamper with) before the call begins.
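
The browser side of that flow might look like the sketch below. The route /api/roxels/session and its { sessionId } JSON response are assumptions about your own backend, not Roxels endpoints; the only Roxels call used is start({ sessionId, display: "none" }) as shown above. The roxels and fetchImpl parameters exist so the helper can be exercised without a browser.

```javascript
// Sketch: ask your own backend for a session id, then start headless with it.
// Assumptions: "/api/roxels/session" is a hypothetical route on YOUR server that creates
// the session via the Roxels REST API and responds with JSON like { "sessionId": "..." }.
async function startWithServerSession(
  roxels,
  fetchImpl = globalThis.fetch,
  url = "/api/roxels/session"
) {
  const res = await fetchImpl(url, { method: "POST", credentials: "include" });
  if (!res.ok) throw new Error(`session request failed: ${res.status}`);
  const { sessionId } = await res.json();
  if (!sessionId) throw new Error("backend did not return a sessionId");
  return roxels.start({ sessionId, display: "none" });
}

// Usage in the browser (not executed here):
//   const call = await startWithServerSession(window.Roxels);
```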

Error handling

The error event fires with a string code:

Code	What it means	What to do
`microphone_blocked`	The browser denied microphone access.	Show your "we need the mic" UI; link to Microphone permissions if the host config is the cause.
`session_create_failed`	Couldn't create the session (bad embed key, domain not allowlisted, server error).	Check the key, the domain allowlist, and try again.
`connection_failed`	Couldn't connect to the LiveKit room.	Network issue; retry.
`session_ended_unexpectedly`	The session ended outside the normal `complete` path.	Surface a soft "we lost the connection" message; consider auto-retry.

call.ready rejects with the same error code if the failure happens before connected.

const call = Roxels.start({ display: "none", templateKey: "rk-..." });
call.on("error", ({ error }) => {
  if (error === "microphone_blocked") {
    showMicBlockedHelp();
  }
});
try {
  await call.ready;
} catch (err) {
  console.error("Failed to connect:", err);
}
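
The error table can also be expressed as a lookup, so your UI has one place that decides what to show and whether to offer a retry. The messages and retry flags below are illustrative choices based on the "What to do" column, not values supplied by Roxels; describeError is a hypothetical name.

```javascript
// Sketch: map documented error codes to UI copy and a retry hint.
// Messages and retry flags are illustrative, not Roxels-provided values.
const ERROR_ACTIONS = new Map([
  ["microphone_blocked", { message: "We need microphone access to start the call.", retry: false }],
  ["session_create_failed", { message: "We couldn't start the session.", retry: false }],
  ["connection_failed", { message: "Network problem while connecting.", retry: true }],
  ["session_ended_unexpectedly", { message: "We lost the connection.", retry: true }],
]);

function describeError(code) {
  // Unknown codes fall back to a generic message with no automatic retry.
  return ERROR_ACTIONS.get(code) ?? { message: "Something went wrong.", retry: false };
}

// Usage in the browser (not executed here):
//   call.on("error", ({ error }) => {
//     const { message, retry } = describeError(error);
//     showBanner(message, { showRetryButton: retry });
//   });
```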

Lifecycle in one diagram

   start()
      │
      ▼
   prewarm ──► loading ──► ready ──► joining ──► connected
                                                     │
                                                     │  The call lives here; events flow freely:
                                                     │  voice_state, chat, understanding,
                                                     │  goal_transition, document, ...
                                                     │
                                                     │  hangUp() from your code or the user,
                                                     │  or the agent ends the call
                                                     ▼
                                                  ending
                                                     │
                                                     ▼
                                                   ended
                                                     │
                                                     ▼
                                                  complete

The complete event carries the final result:

{
  summary: string,          // human-readable summary of the conversation
  findings: object,         // structured data from all committed goals
  duration_seconds: number,
  external_id: string,
  auto_close: boolean,      // template setting — should your UI auto-dismiss?
}
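
One way to consume this payload is to reduce it to a small view model before rendering. The sketch below relies only on the fields listed above; the output shape and the name toResultView are hypothetical.

```javascript
// Sketch: turn the `complete` payload into values a results screen can render.
// The returned shape is a hypothetical view model, not a Roxels type.
function toResultView(results) {
  const total = Math.max(0, Math.round(results.duration_seconds || 0));
  const minutes = Math.floor(total / 60);
  const seconds = String(total % 60).padStart(2, "0");
  return {
    summary: results.summary,
    duration: `${minutes}:${seconds}`,
    fieldCount: Object.keys(results.findings || {}).length,
    shouldDismiss: Boolean(results.auto_close),
  };
}

// Usage in the browser (not executed here):
//   call.on("complete", (results) => {
//     const view = toResultView(results);
//     if (view.shouldDismiss) hideCallUi(); else showResults(view);
//   });
```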

Teardown checklist

When you're done with a call:

Call call.close() (or call.hangUp() for a graceful conversation end).
Drop your references to the controller.
Don't reuse a controller after close() — start a fresh one if you need another call.

close() is destructive: it ends the call, removes the iframe, stops the microphone, and releases the LiveKit room. Use it on unmount, on tab close, and on hard errors.
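
A sketch of the tab-close case: register a one-shot teardown that calls close() at most once, whether it is triggered by the page going away or by your own code. The pagehide event is a standard browser event; createTeardown is a hypothetical helper, and target would be window in a real page.

```javascript
// Sketch: idempotent teardown that also runs when the page is hidden or unloaded.
// `target` is anything with addEventListener/removeEventListener (window in a browser).
function createTeardown(call, target) {
  let done = false;
  const teardown = () => {
    if (done) return; // close() must not run twice on the same controller
    done = true;
    target.removeEventListener("pagehide", teardown);
    call.close();
  };
  target.addEventListener("pagehide", teardown);
  return teardown;
}

// Usage in the browser (not executed here):
//   const teardown = createTeardown(call, window);
//   endButton.addEventListener("click", teardown);
```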

Performance and resource use

Idle cost: loading embed.js adds a small bundle to your page. Until you call start(), no iframe is mounted, the microphone is not touched, and no network connection is opened.
Active cost: an active call uses microphone, network (WebRTC + data channel), and a hidden iframe (~tens of MB of memory).
One iframe per call: if you start three concurrent calls, you mount three iframes.

For most products, this is fine. If you're embedding in an environment with strict memory budgets, run one call at a time and close it when done.
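
The one-call-at-a-time rule can be enforced with a thin wrapper around whatever starts your calls. This is a sketch: createSingleCallManager is a hypothetical helper, and startCall stands in for a function such as (opts) => Roxels.start({ ...opts, display: "none" }).

```javascript
// Sketch: allow at most one live call; starting a new one closes the previous one first.
// `startCall` is injected, e.g. (opts) => Roxels.start({ ...opts, display: "none" }).
function createSingleCallManager(startCall) {
  let current = null;
  return {
    start(options) {
      if (current) current.close(); // release the previous iframe and microphone
      current = startCall(options);
      return current;
    },
    stop() {
      if (current) current.close();
      current = null;
    },
    get active() {
      return current;
    },
  };
}
```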

Security

Domain allowlist on the embed key. Embed keys (rk_…) only work from the origins you allowlist. A leaked key from a public site can only be used from that site.
Server-side session creation when context is sensitive. If the agent needs to know who the user is in a way the user shouldn't control, create the session server-side and pass the sessionId to the embed. The user can't tamper with context you encoded server-side.
Microphone permission is user-gated. The browser always prompts the user. No silent microphone access is possible.

Webhooks vs frontend callbacks

understanding, goal_transition, and complete events surface data to your JS. Many integrations want the same data to also land in their backend — usually for the canonical record, for triggering downstream systems, or as a backup.

The pattern is:

JS events for in-page reactivity (showing extracted data live, advancing your UI as goals commit).
Webhooks for backend record-keeping and triggering downstream work.

Both can fire for the same goal. See Webhooks overview for setup.