Retries and idempotency

When a webhook fails, Roxels retries — but only when retrying makes sense. This page explains what's retryable, what's fatal, and the one rule your endpoint must follow.

The one rule: deduplicate by Idempotency-Key

Every delivery attempt for a given commit carries the same Idempotency-Key header. Your endpoint must check this key and ignore duplicates.

async function handleWebhook(req, res) {
  // Node lowercases incoming header names.
  const key = req.headers["idempotency-key"];
  if (await db.processed(key)) {
    return res.status(200).end(); // already handled: return 2xx so Roxels stops retrying
  }
  // ... do the work
  await db.markProcessed(key); // record the key only after the work succeeds
  return res.status(200).end();
}

Without this dedup, a network blip causes Roxels to retry, and you process the same commit twice. With it, retries are safe.
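
The check-then-mark handler above leaves a small window: if a retry arrives while the first attempt is still working, both pass the check. One way to narrow that window is to record the key before starting the work. The following is a minimal sketch, not Roxels code. It uses an in-memory Map as a stand-in for a shared store, and `deliveries` and `handleOnce` are hypothetical names. Answering 400 for a missing key and 503 for a delivery still in progress are design choices made for this sketch.

```javascript
// In-memory stand-in for a shared store (assumption). A Map is lost on
// restart and is not shared between processes, so treat it as illustrative.
const deliveries = new Map(); // key -> "pending" | "done"

// Run `work` at most once per key. Resolves to the HTTP status to send back.
async function handleOnce(key, work) {
  if (!key) return 400; // no key: treat as a bad request (fatal, no retry)

  const state = deliveries.get(key);
  if (state === "done") return 200;    // duplicate: acknowledge, do nothing
  if (state === "pending") return 503; // still working: ask for a later retry

  // No await between the lookup and this write, so within one Node process
  // no other handler can observe the key as unclaimed in between.
  deliveries.set(key, "pending");
  try {
    await work();
    deliveries.set(key, "done");
    return 200;
  } catch (err) {
    deliveries.delete(key); // release the claim so a retry can redo the work
    return 500;             // 5xx is retryable
  }
}
```

In an Express-style handler this could be called as `res.status(await handleOnce(req.headers["idempotency-key"], () => process(req.body))).end()`, where `process` is whatever does the actual work.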

Retryable vs fatal

Roxels decides based on the response.

Response                              Treated as
2xx                                   Success. Done.
4xx (except 408, 429)                 Fatal. Won't retry. Your endpoint said "this request was bad", so retrying won't help.
408 Request Timeout                   Retryable.
429 Too Many Requests                 Retryable. Respects the Retry-After header if present.
5xx                                   Retryable. Server-side error; might be transient.
Network error, DNS failure, timeout   Retryable.

If you never want a request retried (for example, validation failed and you've already logged the bad data), return a fatal 4xx such as 400, 403, or 404.
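
The table above can be read as a small decision function. The sketch below mirrors it from the sender's point of view, which can be handy in tests that check which of your endpoint's responses would trigger a retry. `classifyDelivery` is a hypothetical name used for illustration, not a Roxels API, and the "unspecified" result covers status ranges the table does not mention.

```javascript
// Classify a delivery outcome according to the table above.
// `status` is the HTTP status code, or null when no response arrived
// (network error, DNS failure, timeout).
function classifyDelivery(status) {
  if (status === null) return "retryable";
  if (status >= 200 && status < 300) return "success";
  if (status === 408 || status === 429) return "retryable";
  if (status >= 400 && status < 500) return "fatal";
  if (status >= 500 && status < 600) return "retryable";
  return "unspecified"; // 1xx and 3xx are not covered by the table
}
```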

Retry policy

Retryable failures back off with jitter. The exact schedule isn't documented for a reason: it changes as we tune it based on what works for real customers.

What you can rely on:

  • Retries happen over a window measured in minutes, not hours.
  • After the window, the attempt is marked failed and won't be retried automatically.
  • You can manually retry a failed delivery from the conversation's detail page in the dashboard.

What happens on permanent failure

If every retry fails, the conversation's outputs.webhooks entry shows status: "failed" with all attempts and their errors. The conversation itself continues — the agent keeps talking — and other outputs still fire.

You can:

  • Manually retry from the dashboard once your endpoint is back up.
  • Query the API for failed deliveries and act on them.
  • Subscribe to webhook-failed alerts (configure in the dashboard) so you find out before the customer does.
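
Once you have a conversation object in hand, picking out the failed deliveries is a simple filter. The sketch below is illustrative only. It assumes `outputs.webhooks` is an array of entries, each with a `status` string and an `attempts` array whose items carry an `error` field. Only `outputs.webhooks` and `status: "failed"` come from this page; the array shapes, the `error` field, and the function names are assumptions. How you fetch the conversation is left out, since the API call is not described here.

```javascript
// Return the webhook entries of a conversation whose delivery failed.
// Assumes `conversation.outputs.webhooks` is an array (assumption).
function failedWebhooks(conversation) {
  const entries = conversation?.outputs?.webhooks ?? [];
  return entries.filter((entry) => entry.status === "failed");
}

// Summarize one failed entry: attempt count plus the last recorded error.
// `attempts[].error` is a hypothetical field name.
function summarizeFailure(entry) {
  const attempts = entry.attempts ?? [];
  const last = attempts[attempts.length - 1];
  return { attempts: attempts.length, lastError: last?.error ?? null };
}
```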

Failure learnings

Roxels watches webhook failures and accumulates learnings — short observations about what went wrong, attached to the template. Examples:

  • "POST body included a null for the email field; your API rejects null."
  • "The external_id interpolated as undefined because session context didn't include it."

These learnings are surfaced in the dashboard the next time you edit the template. They are also shown to AI assistants (MCP) when they author templates, so future configs avoid the same mistake.

Learnings are advisory — they don't change the template behavior automatically. You decide whether to adjust the template based on them.

Order of delivery

Roxels does not guarantee ordering across goals. If two goals commit at nearly the same time and both fire webhooks, you may receive them in either order. Design your endpoint to be order-independent.

Within a single goal, retries are delivered in attempt order. Either way, deduplicating by Idempotency-Key keeps repeated deliveries harmless.
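
One way to make an endpoint order-independent is to have each goal write only to its own slot, so that commits from different goals commute. The following is a minimal sketch of that idea. `applyCommit`, `goalId`, and `data` are hypothetical names chosen for illustration; the actual payload fields are not described on this page.

```javascript
// Merge one webhook payload into local state, keyed by goal.
// `goalId` and `data` are hypothetical payload fields.
function applyCommit(state, payload) {
  // Each goal touches only its own key, so two different goals can
  // arrive in either order and produce the same final state.
  return { ...state, [payload.goalId]: payload.data };
}

// Example: the same two commits applied in both orders.
const commitA = { goalId: "collect_email", data: { email: "a@example.com" } };
const commitB = { goalId: "book_demo", data: { slot: "tuesday" } };
const stateAB = applyCommit(applyCommit({}, commitA), commitB);
const stateBA = applyCommit(applyCommit({}, commitB), commitA);
```

Both `stateAB` and `stateBA` hold the same two entries. If your handler needs one goal's result before it can process another's, this pattern alone is not enough, and you would need to buffer until the prerequisite arrives.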

Timeouts

Each attempt has a per-request timeout (measured in seconds). If your endpoint doesn't respond within the timeout, the attempt is recorded as a timeout failure (retryable).

This means: your endpoint should respond fast, and do the heavy work asynchronously. A common pattern:

async function handleWebhook(req, res) {
  await db.insertJob(req.body); // fast write
  res.status(200).end(); // respond immediately
  // worker picks up the job and does the actual work
}

If you do the work synchronously and it takes 30+ seconds, Roxels times out and retries, and your endpoint receives a duplicate. Dedup catches it only if the key was recorded before the retry arrives, and either way it's wasted work on both sides.

Replay protection

The X-Roxels-Signature header includes a timestamp. Reject attempts where the timestamp is more than ~5 minutes old. This protects you from replay attacks even if a webhook payload is somehow captured in transit.
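
The freshness check itself can be sketched as a small function. This page does not specify how the timestamp is encoded inside X-Roxels-Signature, so the sketch assumes it has already been extracted as Unix seconds, and that the signature was verified first (an unverified timestamp could simply be rewritten by an attacker). `isFresh` and `TOLERANCE_SECONDS` are hypothetical names. Rejecting timestamps far in the future is an extra guard added here, not something this page requires.

```javascript
// Five-minute tolerance, matching the "~5 minutes" guidance above.
const TOLERANCE_SECONDS = 5 * 60;

// Return true when the signed timestamp is close enough to the current time.
// `timestampSeconds` is assumed to be Unix seconds taken from a signature
// header that has already been verified.
function isFresh(
  timestampSeconds,
  nowSeconds = Math.floor(Date.now() / 1000),
  tolerance = TOLERANCE_SECONDS
) {
  if (!Number.isFinite(timestampSeconds)) return false;
  const age = nowSeconds - timestampSeconds;
  // Reject stale timestamps, and ones too far in the future (clock-skew guard).
  return age <= tolerance && age >= -tolerance;
}
```

A handler would typically respond with a 4xx when `isFresh` returns false, so the stale request is treated as fatal and not retried.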

Testing failure handling

From the dashboard, you can:

  • Send a test webhook to your URL (with a fixture payload).
  • View the last N delivery attempts for any output, with response status, body, and timing.
  • Manually retry a failed delivery.

Use these to verify your endpoint handles the unhappy paths before they happen in production.