What is Roxels

Roxels is a platform for embedding voice conversations into your product. You describe what the conversation does once, in a template. Roxels runs that conversation any number of times, in any number of places. As the conversation progresses, structured data flows back to you.

This page gives you the conceptual map. Other pages go deep on each piece.

The three nouns

Almost everything in Roxels is one of three things.

Templates

A template is the design of a conversation. You author it once in the dashboard. It describes:

The greeting and tone.
The goals — what the agent is trying to learn, propose, or get the user to confirm.
The language, voice, and conversational posture.
The skills the agent has (sending email, looking things up, running screen-share walkthroughs, etc.).
The outputs — webhooks fired when goals commit, frontend callbacks, or chained API calls.

Templates are versioned and editable from the dashboard or the REST API.
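
To make the list above concrete, here is a minimal sketch of a template as a plain data structure. It is an illustration only: the class and field names (`Template`, `Goal`, `Output`, `posture`, `version`, and so on) are assumptions chosen to mirror the bullets, not the actual Roxels schema, which lives in the dashboard and the REST API reference.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """One thing the agent tries to learn, propose, or confirm (hypothetical shape)."""
    name: str
    description: str


@dataclass
class Output:
    """Where committed goal data is delivered (hypothetical shape)."""
    kind: str    # "webhook", "frontend_callback", or "chained_api_call"
    target: str  # e.g. an endpoint URL or an event name


@dataclass
class Template:
    """Illustrative stand-in for a template; not the real Roxels schema."""
    greeting: str
    tone: str
    language: str
    voice: str
    posture: str
    goals: list[Goal] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)
    outputs: list[Output] = field(default_factory=list)
    version: int = 1  # templates are versioned; the numbering scheme is assumed


support_intake = Template(
    greeting="Hi, thanks for calling. What can I help with?",
    tone="friendly",
    language="en",
    voice="default",
    posture="patient",
    goals=[Goal("issue_summary", "Learn what problem the user is facing")],
    skills=["send_email"],
    outputs=[Output("webhook", "https://example.com/roxels/webhook")],
)
```

The point of the sketch is the shape: one design object that carries greeting, goals, skills, and outputs together, so every conversation run from it behaves the same way.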

Conversations (also called sessions or interviews)

A conversation is one run of a template. It has a participant, a transcript, a set of committed goal data, and a final result. Each conversation has its own URL, its own LiveKit room, and its own record of delivery attempts for the template's outputs.

The terms "session" and "interview" appear in older identifiers (URL paths, API field names) for backwards compatibility. The user-facing term is conversation.
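
Because older identifiers may still say "session" or "interview", integration code can read the identifier defensively. The sketch below assumes three hypothetical key names (`conversation_id`, `session_id`, `interview_id`); check the API reference for the field names your payloads actually carry.

```python
from typing import Any, Optional

# Hypothetical field names: the text above says only that "session" and
# "interview" survive in older identifiers, not what the exact keys are.
ID_KEYS = ("conversation_id", "session_id", "interview_id")


def conversation_id(payload: dict[str, Any]) -> Optional[str]:
    """Return the conversation identifier under whichever name it arrives."""
    for key in ID_KEYS:
        value = payload.get(key)
        if value:
            return str(value)
    return None  # no known identifier key was present
```

Preferring the newest name first means the helper keeps working if a payload carries both a current and a legacy key.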

Outputs

An output is what Roxels delivers when a goal commits. The same goal can fan out to multiple outputs. The three delivery paths are:

Webhooks — Roxels POSTs structured data to your endpoint. See Webhooks.
Frontend callbacks — Your embed page receives a JS event with the data. See Events.
Chained API calls — Templated requests to upstream APIs (e.g. create a ticket in your CRM when an issue is captured).
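
The fan-out described above can be pictured as one committed goal passed to a list of delivery functions. The following is a toy model under stated assumptions: the function names, the event name `goal_committed`, the example URLs, and the `$field` placeholder syntax for chained calls are all invented for illustration, and nothing here touches the network or reflects how Roxels implements delivery.

```python
from string import Template as StringTemplate
from typing import Any, Callable

# A delivery function takes committed goal data and returns a description
# of what it would send. This sketch only describes; it never sends.
Delivery = Callable[[dict[str, Any]], dict[str, Any]]


def webhook(url: str) -> Delivery:
    def deliver(data: dict[str, Any]) -> dict[str, Any]:
        return {"kind": "webhook", "method": "POST", "url": url, "body": data}
    return deliver


def frontend_callback(event_name: str) -> Delivery:
    def deliver(data: dict[str, Any]) -> dict[str, Any]:
        return {"kind": "frontend_callback", "event": event_name, "detail": data}
    return deliver


def chained_api_call(url: str, field_templates: dict[str, str]) -> Delivery:
    def deliver(data: dict[str, Any]) -> dict[str, Any]:
        # Fill each templated field from the committed data ($name placeholders).
        body = {
            key: StringTemplate(template).substitute(data)
            for key, template in field_templates.items()
        }
        return {"kind": "chained_api_call", "method": "POST", "url": url, "body": body}
    return deliver


def fan_out(data: dict[str, Any], outputs: list[Delivery]) -> list[dict[str, Any]]:
    """Hand the same committed data to every configured output."""
    return [deliver(data) for deliver in outputs]


committed = {"summary": "Login fails on mobile", "severity": "high"}
deliveries = fan_out(committed, [
    webhook("https://example.com/hooks/roxels"),
    frontend_callback("goal_committed"),
    chained_api_call("https://crm.example.com/tickets", {"title": "[$severity] $summary"}),
])
```

Each output sees the same committed data but shapes it differently, which is why one goal can feed your backend, your page, and an upstream system at once.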

The three integration shapes

Pick the shape that matches your product. You can mix them on the same template.

Embed

A <script> tag mounts a modal or compact panel. Roxels owns the UI. You drop it in and receive structured results.

Read: Embed overview, Display modes.

Headless

Roxels still owns the call itself: microphone capture, audio playback, and the LiveKit connection. You own every pixel of the UI and drive every control from your code: mute, pause, screen-share, hang up, send messages, and subscribe to events.

This is the path for a custom voice experience deeply integrated into your product.

Read: Headless mode.

Server-side via REST API

Create templates, start sessions, retrieve results — all from your backend. The browser is optional. Pair with webhooks for asynchronous delivery.
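
As a rough picture of the server-side shape, the sketch below builds (but does not send) a request that starts a session from a backend. Everything specific in it is a placeholder: the base URL, the `/sessions` path, the bearer-token header, and the body fields are assumptions for illustration, so take the real endpoints and authentication scheme from the REST API reference.

```python
import json
import urllib.request

# Assumptions for illustration only: base URL, path, auth header, and body
# fields are placeholders, not the documented Roxels API.
BASE_URL = "https://api.example.com/v1"


def build_start_request(template_id: str, participant: str, api_key: str) -> urllib.request.Request:
    """Build a POST request that would start one session from a template."""
    body = json.dumps({"template_id": template_id, "participant": participant}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/sessions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_start_request("tmpl_123", "user@example.com", "YOUR_API_KEY")
# To actually send it: urllib.request.urlopen(request)
```

Since results arrive asynchronously, a backend built this way would typically start the session here and receive the committed data later through a webhook rather than by blocking on the response.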

Read: REST API.

The agent

Every conversation is run by an agent, a voice-first assistant defined by your template. The agent:

Greets the participant.
Pursues the template's goals through natural conversation.
Calls tools (skills) when useful.
Optionally sees the participant's screen if you allow screen-share.
Commits structured data when a goal is satisfied.
Closes the conversation.
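
The sequence above can be sketched as a small state object: greet, commit data goal by goal, then close. This is a toy model for building intuition, not Roxels internals. The class name, method names, and log format are invented, and tool calls and screen-share are left out.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConversationState:
    """Toy model of one conversation's progress (hypothetical, not Roxels code)."""
    pending_goals: list[str]
    committed: dict[str, dict[str, Any]] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)
    closed: bool = False

    def greet(self) -> None:
        self.log.append("greet")

    def commit(self, goal: str, data: dict[str, Any]) -> None:
        """Record structured data for a goal once it is satisfied."""
        if self.closed:
            raise RuntimeError("conversation is already closed")
        if goal not in self.pending_goals:
            raise KeyError(goal)
        self.pending_goals.remove(goal)
        self.committed[goal] = data
        self.log.append(f"commit:{goal}")

    def close(self) -> None:
        self.closed = True
        self.log.append("close")


state = ConversationState(pending_goals=["issue_summary"])
state.greet()
state.commit("issue_summary", {"summary": "Login fails on mobile"})
state.close()
```

In this model a commit is the moment structured data becomes available, which corresponds to the point at which the template's outputs would fire.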

The agent isn't a single LLM — it's a coordinated set of models: a speaker (the one talking), an observer (an advisor watching for missed extractions), and short callouts for specific tasks like extraction quality and tool argument validation. You author the conversation; Roxels coordinates the models.

Where you go next

Want to embed it now? Quickstart.
Want to understand a conversation end to end? Conversation lifecycle.
Want to build a custom UI on top of Roxels? Headless mode.
Want to author templates with an AI assistant? MCP overview.