Headless mode
Headless mode is the way to build a voice experience that feels native to your product. Roxels still owns the call — microphone capture, audio playback, the LiveKit connection, the language models, the agent. You own everything visible: the orb, the transcript, the controls, the layout, the brand.
This page is the single source of truth for headless. If you're integrating Roxels into a product, share this page with your engineers.
What you control vs what Roxels controls
| You | Roxels |
|---|---|
| Every pixel of UI | Microphone capture |
| When the call starts and ends | Audio playback (the agent's voice) |
| The orb / activity indicator | The LiveKit room and reconnection |
| The transcript view | The conversation itself (LLM, prompts, goals) |
| Mute, pause, screenshare buttons | Screen-share track capture |
| Errors shown to the user | Tool calls, extraction, outputs |
You never write WebRTC code. You never handle audio buffers. You don't manage tokens or reconnection. You drive the call through a clean controller API and subscribe to events.
You'll need
- A Roxels account at app.roxels.ai.
- A template, created in the dashboard.
- An embed key (rk_…), domain-allowlisted for the origins where headless will run.
- A page on your site where you can add a `<script>` tag.
- Microphone permissions configured on your host (see Microphone permissions).
The 30-second example
```html
<script src="https://app.roxels.ai/embed.js"></script>

<button id="start">Start conversation</button>
<button id="mute">Mute</button>
<button id="end">End</button>
<div id="status">Idle</div>

<script>
  let call = null;

  document.querySelector("#start").addEventListener("click", () => {
    // Close any previous call so repeated clicks don't leak one.
    call?.close();
    call = Roxels.start({
      templateKey: "rk_your_embed_key",
      display: "none",
      personName: "Ada Lovelace",
      externalId: "user_123",
    });

    call.on("status", ({ status }) => {
      document.querySelector("#status").textContent = status;
    });
    call.on("chat", (msg) => console.log("chat:", msg));
    call.on("understanding", (data) => console.log("captured:", data));
    call.on("complete", (results) => console.log("done:", results));
    call.on("error", (err) => console.error("error:", err));
  });

  document.querySelector("#mute").addEventListener("click", () => call?.toggleMic());
  document.querySelector("#end").addEventListener("click", () => call?.hangUp());
</script>
```

That's a fully working headless voice call: start, mute, and end, plus live status, transcript, and extraction output. The rest of this page covers every method and every event in detail, along with patterns for production use.
The controller
Roxels.start({ display: "none", ... }) returns a controller synchronously. The call hasn't connected yet — the controller is a handle you bind to immediately.
```javascript
const call = Roxels.start({ display: "none", templateKey: "rk_..." });
// call is the controller: bind event handlers now, even before connection.
```

The controller exposes:
- State: call.state, call.sessionId, call.ready (a Promise that resolves when connected).
- Subscriptions: call.on(event, cb), call.off(event, cb).
- Commands: pause, resume, togglePause, muteMic, unmuteMic, setMicEnabled, toggleMic, startScreenShare, stopScreenShare, toggleScreenShare, sendMessage, uploadFile, hangUp, and the generic send escape hatch.
- Lifecycle: call.close() to destroy the controller and clean up its iframe.
Awaiting connection
```javascript
// Run inside an async function or an ES module (top-level await).
const call = Roxels.start({ display: "none", templateKey: "rk_..." });
try {
  await call.ready;
  // The call is now connected; commands are guaranteed to land.
} catch (err) {
  // Failed to connect (e.g. microphone_blocked or session_create_failed).
}
```

If you call a command before the connection is established (e.g. call.muteMic() immediately after start()), the command is queued and dispatched once the call connects. You don't need to gate every command on await call.ready; await it only when you need to know whether the connection succeeded.
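If you also want to stop waiting after a while, one option is to race call.ready against your own timer. This is a minimal sketch: readyWithTimeout and the "connect_timeout" reason are hypothetical app-level names, not Roxels error codes, and the timer does not cancel the underlying connection attempt.

```javascript
// Resolve when the controller connects, or reject after `ms` milliseconds.
// "connect_timeout" is a hypothetical app-level reason, not a Roxels error code.
function readyWithTimeout(call, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("connect_timeout")), ms);
  });
  // Whichever promise settles first wins; the timer is always cleared afterwards.
  return Promise.race([call.ready, timeout]).finally(() => clearTimeout(timer));
}
```

On a timeout you would typically call call.close() so the hidden iframe and the microphone are released.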
Commands — every method
All command methods return synchronously and are safe to call before the connection is live. They're queued and replayed once connected.
Conversation control
```javascript
call.pause();       // pause the call (mic muted, agent silent)
call.resume();      // resume from paused
call.togglePause();
call.hangUp();      // end the conversation; emits `ended` then `complete`
```

Microphone
```javascript
call.muteMic();
call.unmuteMic();
call.setMicEnabled(false); // pass true to re-enable
call.toggleMic();
```

The microphone state is reflected by the mic_state event, so multiple parts of your UI can stay in sync.
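Because mic_state reports changes from any source, a single subscription can drive every mic indicator on the page. A minimal sketch: syncMicIndicators is a hypothetical helper, and each renderer is your own UI code receiving the enabled flag.

```javascript
// Fan one mic_state subscription out to several renderers.
// Returns a function that removes the subscription via call.off().
function syncMicIndicators(call, renderers) {
  const onMicState = ({ enabled }) => {
    for (const render of renderers) render(enabled);
  };
  call.on("mic_state", onMicState);
  return () => call.off("mic_state", onMicState);
}
```

For example, one renderer could relabel a mute button while another toggles a "muted" icon; both stay correct even when the agent or the user changes the mic.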
Screen-share
```javascript
call.startScreenShare();
call.stopScreenShare();
call.toggleScreenShare();
```

The screen-share state is reflected by the screenshare_state event. The agent receives frames in real time and can reference what it sees. See Conversation lifecycle for what the agent does with screen-share.
Chat
```javascript
call.sendMessage("Skip that for now");
```

Sends a text message into the conversation. The agent treats it as if the user had said it aloud, which is useful for accessibility, for users in environments where they can't talk, or as a hidden control channel for your UI.
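For a typed-chat box, a small guard keeps blank input out of the conversation. A minimal sketch: sendTypedMessage is a hypothetical helper that only calls call.sendMessage().

```javascript
// Send typed text into the call, skipping empty or whitespace-only input.
// Returns true if a message was sent.
function sendTypedMessage(call, rawText) {
  const text = rawText.trim();
  if (text === "") return false;
  call.sendMessage(text);
  return true;
}
```

You might call this from a form's submit handler and clear the input when it returns true.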
File upload
```javascript
call.uploadFile(file);      // a single File object
call.uploadFile({ files }); // multiple File objects
```

The file is transferred to the iframe and uploaded to your Roxels org's attachment store. The agent is notified when the file is ready and can reference it in the conversation.
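Since uploadFile() returns synchronously, a "processing" indicator means tracking pending uploads yourself and clearing them when file_uploaded fires. A minimal sketch: createUploadTracker is hypothetical, and it assumes the filename in the file_uploaded payload matches the uploaded File's name (check the Events page before relying on that).

```javascript
// Track uploads between call.uploadFile() and the matching file_uploaded event.
// ASSUMPTION: file_uploaded's `filename` equals the uploaded File's `name`.
function createUploadTracker(call, onChange) {
  const pending = new Set();
  call.on("file_uploaded", ({ filename }) => {
    pending.delete(filename);
    onChange([...pending]);
  });
  return {
    upload(file) {
      pending.add(file.name);
      onChange([...pending]);
      call.uploadFile(file);
    },
  };
}
```

Because pending is keyed by name, two in-flight files with the same name collapse into one entry; key by something else if that matters for your UI.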
Generic escape hatch
If you need to send a command type that isn't in the standard set — for example, a template-specific signal the agent listens for — use:
```javascript
call.send({ type: "your_custom_signal", payload: { /* your data */ } });
```

The agent receives this on a dedicated data-channel topic and can pattern-match it in the template's instructions.
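If your UI sends several custom signals, a thin wrapper keeps their shape consistent. A minimal sketch: sendSignal is hypothetical, and a signal name such as "skip_section" only means something if your template's instructions listen for it.

```javascript
// Build and send a custom signal through the generic call.send() escape hatch.
// Signal names are yours to define; the agent reacts only if the template expects them.
function sendSignal(call, type, payload = {}) {
  if (typeof type !== "string" || type.length === 0) {
    throw new TypeError("signal type must be a non-empty string");
  }
  const message = { type, payload };
  call.send(message);
  return message;
}
```

For example, sendSignal(call, "skip_section", { section: "billing" }) uses a hypothetical signal and payload.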
Events — every signal
Subscribe with call.on(event, callback). Unsubscribe with call.off(event, callback). Most callbacks receive a single argument; some receive none.
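call.off() takes the same callback you passed to call.on(), so it helps to keep handlers together. A minimal sketch: bindHandlers is a hypothetical helper that binds a map of event names to callbacks and returns one function that unbinds exactly those callbacks.

```javascript
// Bind several handlers at once; the returned function unbinds exactly those handlers.
function bindHandlers(call, handlers) {
  const entries = Object.entries(handlers);
  for (const [event, cb] of entries) call.on(event, cb);
  return () => {
    for (const [event, cb] of entries) call.off(event, cb);
  };
}
```

For example, const unbind = bindHandlers(call, { status: onStatus, chat: onChat }); and later unbind() when the view that owns those handlers goes away.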
Lifecycle
| Event | When it fires | Payload |
|---|---|---|
| session | The session id is known. Fires once after start(). | { sessionId } |
| status | The connection status changes. | { status }, where status is one of prewarm, loading, ready, joining, connected, disconnecting, ended, error |
| connected | The call has connected for the first time. | none |
| ending | The call is about to end (graceful close in progress). | none |
| ended | The call has ended. | none |
| complete | The conversation completed and final results are available. | { summary, findings, duration_seconds, external_id, auto_close } |
| error | A call-level error occurred. | { error } (a string code; see Error handling) |
Voice and chat
| Event | When it fires | Payload |
|---|---|---|
| voice_state | The agent's voice activity changes. | { state }, where state is one of idle, listening, thinking, speaking |
| chat | A chat message arrives (in either direction). | { id, role, text, timestamp } (role is user or assistant) |
Extraction and goals
| Event | When it fires | Payload |
|---|---|---|
| understanding | The agent extracted structured data, or a goal updated. | { data, goal_id, cumulative, source, timestamp }, or { text, ... } for unstructured updates |
| goal_transition | A goal commits, or the active goal changes. | { from_goal, to_goal, timestamp } |
Document / form view
If the template has a live document or form, these fire as it updates.
| Event | When it fires | Payload |
|---|---|---|
| document | The document state updates. | The current document state |
| document_clear | A document was cleared. | { doc_id } |
Controls reflecting state back
| Event | When it fires | Payload |
|---|---|---|
| mic_state | The mic enabled/disabled state changed (from any source: your code, the user, or the agent). | { enabled } |
| screenshare_state | The screen-share active state changed. | { active } |
| paused | The call paused. | none |
| resumed | The call resumed. | none |
| paused_ended | The call ended while paused. | none |
| file_uploaded | A file you uploaded finished processing. | { id, filename } |
| command_ack | The iframe acknowledged a command. Useful for debugging. | { command } |
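Together, mic_state, screenshare_state, paused, and resumed describe your control bar. A minimal sketch of a reducer that folds those events into one object, so every button renders from a single source of truth. reduceControls and initialControls are hypothetical, and the initial values are assumptions until the first state events arrive.

```javascript
// Initial values are assumptions until the first state events arrive.
const initialControls = { micEnabled: true, screenSharing: false, paused: false };

// Fold one control event (name plus optional payload) into the control-bar state.
function reduceControls(state, event, payload) {
  switch (event) {
    case "mic_state":
      return { ...state, micEnabled: payload.enabled };
    case "screenshare_state":
      return { ...state, screenSharing: payload.active };
    case "paused":
      return { ...state, paused: true };
    case "resumed":
      return { ...state, paused: false };
    default:
      return state;
  }
}
```

To wire it up, subscribe once per event name, for example call.on(name, (p) => render((controls = reduceControls(controls, name, p)))).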
A complete catalog with sample payloads is in Events.
Patterns
React (Next.js or any React app)
```tsx
import { useEffect, useRef, useState } from "react";

export function VoiceCall({ templateKey }: { templateKey: string }) {
  const callRef = useRef<any>(null);
  const [status, setStatus] = useState("idle");
  const [voiceState, setVoiceState] = useState("idle");

  useEffect(() => {
    // Assumes embed.js is already loaded (e.g. via <Script> in the root layout).
    const call = window.Roxels.start({
      templateKey,
      display: "none",
    });
    callRef.current = call;
    call.on("status", ({ status }) => setStatus(status));
    call.on("voice_state", ({ state }) => setVoiceState(state));
    return () => {
      // Clean up on unmount: never leak a live call.
      call.close();
      callRef.current = null;
    };
  }, [templateKey]);

  return (
    <div>
      {/* Orb is your own component, driven by voice_state */}
      <Orb state={voiceState} />
      <button onClick={() => callRef.current?.toggleMic()}>Mute</button>
      <button onClick={() => callRef.current?.hangUp()}>End</button>
      <div>Status: {status}</div>
    </div>
  );
}
```

Key React rules:
- Always call call.close() on unmount. A leaked call keeps the microphone hot and the LiveKit room alive.
- Don't recreate the call on every render. Use useRef for the controller and useEffect (with the right dependencies) to start it.
- For Next.js, load embed.js once via `<Script src="https://app.roxels.ai/embed.js" strategy="afterInteractive" />` in the root layout, then access window.Roxels from your components. Because afterInteractive scripts load after hydration begins, a component that mounts on first load may run its effect before window.Roxels exists; check for it before calling start().
Vue
```vue
<script setup>
import { onMounted, onUnmounted, ref } from "vue";

const props = defineProps({ templateKey: String });
const status = ref("idle");
// A plain variable: the controller isn't rendered, so it doesn't need to be reactive.
let call = null;

onMounted(() => {
  call = window.Roxels.start({
    templateKey: props.templateKey,
    display: "none",
  });
  call.on("status", ({ status: s }) => (status.value = s));
});

onUnmounted(() => call?.close());
</script>

<template>
  <div>Status: {{ status }}</div>
</template>
```

Plain JS, multi-call
You can run more than one headless call simultaneously. Each gets its own controller and its own iframe. Don't share state between controllers.
```javascript
const call1 = Roxels.start({ templateKey: "rk_a", display: "none" });
const call2 = Roxels.start({ templateKey: "rk_b", display: "none" });
// Independent controllers: bind events on each separately.
```

Resuming a session
Pass externalId consistently for the same user. If the template has session persistence enabled, the next call from the same user resumes their previous conversation (transcript, captured data, document state).
```javascript
Roxels.start({
  templateKey: "rk_your_embed_key",
  display: "none",
  externalId: "user_123", // stable identifier
});
```

The full identity model (when to trust external_id, when to verify it server-side, and the deletion path) is in Identity and resumption.
Server-side session creation
If you don't want to ship an embed key to the browser, create the session server-side using the REST API and pass the sessionId to start():
```javascript
// sessionId came from your backend via fetch()
Roxels.start({ sessionId, display: "none" });
```

This is the path to use when you want to encode trusted context (the participant's verified identity, a customer record, anything the user shouldn't be able to tamper with) before the call begins.
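In the browser, this flow amounts to fetching a session id from your own backend and passing it to start(). A minimal sketch: the /api/roxels-session endpoint and its { sessionId } response are hypothetical (your backend defines them), and fetch and start are passed in so the function can be exercised without a browser.

```javascript
// Ask your backend for a session id, then start a headless call with it.
// ASSUMPTION: your backend exposes POST /api/roxels-session returning { sessionId }.
async function startServerSession(fetchFn, startFn) {
  const res = await fetchFn("/api/roxels-session", { method: "POST" });
  if (!res.ok) throw new Error(`session request failed: ${res.status}`);
  const { sessionId } = await res.json();
  return startFn({ sessionId, display: "none" });
}
```

In a page you might call startServerSession(fetch, (opts) => Roxels.start(opts)); wrapping start in an arrow function keeps it bound to Roxels.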
Error handling
The error event fires with a string code:
| Code | What it means | What to do |
|---|---|---|
| microphone_blocked | The browser denied microphone access. | Show your "we need the mic" UI; link to Microphone permissions if the host config is the cause. |
| session_create_failed | Couldn't create the session (bad embed key, domain not allowlisted, server error). | Check the key and the domain allowlist, then try again. |
| connection_failed | Couldn't connect to the LiveKit room. | Usually a network issue; retry. |
| session_ended_unexpectedly | The session ended outside the normal complete path. | Surface a soft "we lost the connection" message; consider auto-retry. |
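The error table suggests which failures are worth retrying automatically. A minimal sketch of a retry policy built on it: retryPlan is hypothetical, attempts are counted from 0, and the attempt limit and backoff delays are arbitrary illustrative values. Retrying here means calling close() on the failed controller and starting a new one.

```javascript
// Codes from the error table that may succeed on a later attempt.
const RETRYABLE = new Set(["connection_failed", "session_ended_unexpectedly"]);

// Decide whether to retry and how long to wait (exponential backoff, capped at 8 s).
function retryPlan(code, attempt, maxAttempts = 3) {
  if (!RETRYABLE.has(code) || attempt >= maxAttempts) return { retry: false };
  const delayMs = Math.min(1000 * 2 ** attempt, 8000);
  return { retry: true, delayMs };
}
```

microphone_blocked and session_create_failed are left out on purpose: a denied permission or a misconfigured key won't fix itself between attempts.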
call.ready rejects with the same error code if the failure happens before connected.
```javascript
// Run inside an async function or an ES module (top-level await).
const call = Roxels.start({ display: "none", templateKey: "rk_..." });

call.on("error", ({ error }) => {
  if (error === "microphone_blocked") {
    showMicBlockedHelp(); // your own UI
  }
});

try {
  await call.ready;
} catch (err) {
  console.error("Failed to connect:", err);
}
```

Lifecycle in one diagram
```text
start()
   │
   ▼
prewarm ──► loading ──► ready ──► joining ──► connected
                                                  │
                                                  │  call lives here: voice_state, chat,
                                                  │  understanding, goal_transition,
                                                  │  document, ... events flow freely
                                                  │
                                                  │  hangUp() (user action or your code)
                                                  │  or the agent ends the call
                                                  ▼
                                               ending
                                                  │
                                                  ▼
                                                ended
                                                  │
                                                  ▼
                                              complete
```

The complete event carries the final result:
```text
{
  summary: string,          // human-readable summary of the conversation
  findings: object,         // structured data from all committed goals
  duration_seconds: number,
  external_id: string,
  auto_close: boolean,      // template setting: should your UI auto-dismiss?
}
```

Teardown checklist
When you're done with a call:
- Call call.close() (or call.hangUp() for a graceful conversation end).
- Drop your references to the controller.
- Don't reuse a controller after close(); start a fresh one if you need another call.
close() is destructive: it ends the call, removes the iframe, stops the microphone, releases the LiveKit room. It's the right call on unmount, on tab close, on hard error.
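For the tab-close case, you can bind close() to the pagehide event. A minimal sketch: bindTeardown is hypothetical and accepts any EventTarget (window in a browser). The returned closeOnce guard lets your unmount and hard-error paths share the same teardown without calling close() twice.

```javascript
// Close the call when `target` fires pagehide; closing is made idempotent.
function bindTeardown(call, target) {
  let closed = false;
  const closeOnce = () => {
    if (closed) return;
    closed = true;
    target.removeEventListener("pagehide", closeOnce);
    call.close();
  };
  target.addEventListener("pagehide", closeOnce);
  return closeOnce; // also call this on unmount or after a hard error
}
```

In a page: const teardown = bindTeardown(call, window);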
Performance and resource use
- Idle cost: loading embed.js adds a small bundle to your page. Until you call start(), no iframe is mounted, no microphone is touched, and no network connection is opened.
- Active cost: an active call uses the microphone, the network (WebRTC plus a data channel), and a hidden iframe (on the order of tens of MB of memory).
- One iframe per call: if you start three concurrent calls, you mount three iframes.
For most products, this is fine. If you're embedding in an environment with strict memory budgets, run one call at a time and close it when done.
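If you keep to one call at a time, a small manager can enforce it by closing the previous controller before starting the next. A minimal sketch: createCallManager is hypothetical and takes the start function as a parameter, so it works with Roxels.start or with a test double.

```javascript
// Enforce at most one live call: starting a new one closes the previous one.
function createCallManager(startFn) {
  let current = null;
  return {
    start(options) {
      if (current) current.close();
      current = startFn(options);
      return current;
    },
    stop() {
      if (current) current.close();
      current = null;
    },
    get current() {
      return current;
    },
  };
}
```

For example: const calls = createCallManager((opts) => Roxels.start(opts)); then calls.start({ templateKey: "rk_...", display: "none" }).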
Security
- Domain allowlist on the embed key. Embed keys (rk_…) only work from the origins you allowlist, so a key leaked from a public site can only be used from that site.
- Server-side session creation when context is sensitive. If the agent needs to know who the user is in a way the user shouldn't control, create the session server-side and pass the sessionId to the embed. The user can't tamper with context you encoded server-side.
- Microphone permission is user-gated. Microphone access requires the user's permission through the browser's own prompt; a page cannot grant itself silent access.
Webhooks vs frontend callbacks
understanding, goal_transition, and complete events surface data to your JS. Many integrations want the same data to also land in their backend — usually for the canonical record, for triggering downstream systems, or as a backup.
The pattern is:
- JS events for in-page reactivity (showing extracted data live, advancing your UI as goals commit).
- Webhooks for backend record-keeping and triggering downstream work.
Both can fire for the same goal. See Webhooks overview for setup.
Read next
- JS API reference — Quick-lookup table for every method.
- Events — Every event with sample payloads.
- Microphone permissions — Required host config.
- Troubleshooting — Common issues.
- Identity and resumption — How participants persist across sessions.
- Webhooks overview — Backend delivery for the same data your JS callbacks see.
- Conversation lifecycle — How everything fits together end to end.