Cekura

Automated QA for Voice AI and Chat AI agents

1.9K followers

Automated QA for Voice AI and Chat AI agents

1.9K followers

Visit website

AI Metrics and Evaluation

Cekura enables Conversational AI teams to automate QA across the entire agent lifecycle—from pre-production simulation and evaluation to monitoring of production calls. We also support seamless integration into CI/CD pipelines, ensuring consistent quality and reliability at every stage of development and deployment.

This is the 3rd launch from Cekura. View more

Cekura

Launched this week

Observe and analyze your voice and chat AI agents

Out-of-the-box 30+ predefined metrics for analysis on CX, accuracy, conversation and voice quality. Compile perfect LLM judges by annotating just ~20 conversations and auto-improve in Cekura labs. Real-time, segmented dashboards to identify trends in Conversational AI. Smart statistical alerts so that you get notified only when metrics shift from historical baselines. Automated system pings to catch silent production failures.

Free Options

Launch tags:SaaS•Developer Tools•Audio

Launch Team / Built With

AppSignal — Get full visibility into app health, errors, and performance

Get full visibility into app health, errors, and performance

Promoted

Cekura

Maker

📌

Hi Product Hunt! 👋

We are excited to launch Cekura Monitoring for Voice and Chat AI companies. Most monitoring tools tell you if your AI is up. Cekura tells you if it is behaving.

When we had first launched Cekura QA, we thought we had solved the problem for both testing and monitoring . But as our users scaled, we noticed a painful pattern: While pre-production QA was automated, teams were still spending dozens of hours manually listening to thousands of calls.

The two big blockers we saw were:

The Scaling Wall: Defining and optimizing custom metrics was taking too long, forcing teams back into manual spot-checks.
Production Blindspot: Standard LLM metrics misses the Customer Experience in Voice AI - things like agent tone and customer sentiment that actually defines customer success.

We have rebuilt the monitoring layer from the ground up to solve this. Cekura Monitoring turns that "wall of noisy logs" into actionable signals.

🚀 What’s New in Cekura Monitoring:

30+ Predefined Metric Suite: We track what actually breaks Voice and Chat agents across four critical categories:
- Speech Quality: Voice clarity, pronunciation, and gibberish detection.
- Conversational Flow: Silences, interruptions (barge-ins), and termination triggers.
- Accuracy & Logic: Hallucinations, transcription accuracy, and relevancy.
- Customer Experience: CSAT, Sentiment analysis, and drop-off points.
Metric Optimizer: Stop "vibes-based" prompt engineering. Define a metric (e.g., Successful User Authentication), tag 20 calls in our Labs interface, and our optimizer "compiles" a prompt that aligns with your specific feedback.
Statistical Intelligence: No more fixed, noisy thresholds. Our Alerting Engine learns your agent's baseline and only pings Slack when metrics shift 2σ from historical norms.
Automated Cron Jobs: Set up recurring health checks to simulate production conversations. Catch silent failures and regressions before your customers do.
Visual Dashboards: Real-time distribution charts for each metric. Views customized for each stakeholder

Who is this for?

Teams scaling Voice & Chat AI who are tired of listening to calls manually and need a way to prove their agents are actually working.

Sign up and try for free at cekura.ai or drop your questions below! We would love to hear how you’re currently handling Voice and Chat AI in production👇

Report

7d ago

@kabra_sidhant Many congratulations on the launch, Sidhant! I've been tracking it since the Vocera days, it's evolved impressively and keeps getting better. Thrilled to see the buzz in voice AI communities especially on Reddit. Onwards and upwards! :)

Report

22h ago

Cekura

Maker

@rohanrecommends Thanks Rohan - also for all your guidance on best practices for Product Hunt Launches!

Report

21h ago

DaoLens

How are you different from tracing platforms like Braintrust and Galileo ? Except Voice metrics.

Report

1d ago

Cekura

Maker

@nimishg We are E2E conversational AI QA - Some of the big differences:

We run E2E multi turn simulations instead of trace level logging
These platforms does not offer Metric optimizer - without metric optimizer, it takes huge time to fine tune LLM-as-a-judge metrics
We also offer replay of production conversations to ensure the fix is incorporated.

In short we are very deep and verticalized in Conversational AI evals - they are more horizontal general agentic AI evals platforms

Report

1d ago

Cekura

Maker

@nimishg Braintrust/galileo are very horizontal for all llm agents. We are specialised for conversations, our UI, Metrics, dashboards are highly specialised for conversations.

Report

1d ago

What aspects of voice does it capture? I wanted to test on tonality and personality of my voice agent, is it achievable?

Report

1d ago

Cekura

Maker

@pratyush1505 We have voice clairty, gibberishness as a metric to capture the voice aspect of the agent

Report

1d ago

Cekura

Maker

@pratyush1505 For testing the personality of the agent, you can also checkout the Customer Satisfaction (CSAT) and Sentiment metrics

Report

1d ago

Cekura

Maker

@pratyush1505 you can also use voice clarity metric which will check how clear the voice is

Report

24h ago

The "is it behaving" vs "is it up" distinction is spot on. We've had AI chat agents pass every health check while giving completely wrong answers to customers. Uptime metrics are useless if the AI is confidently hallucinating.

How granular does the sentiment tracking get? Like can it detect when an agent starts being passive aggressive or gives a technically correct but unhelpful response? That's the stuff that kills user trust slowly.

Report

1d ago

Cekura

Maker

@mihir_kanzariya We are currently building turn level sentiment tracking - should be live in a week's time. Currently it gives overall sentiment score but granular feedback on where sentiment turned negative.

We have a metric called relevancy which test whether the agent response is relevant to the user question or not

Report

1d ago

Cekura

Maker

@mihir_kanzariya Sentiment analysis can be made as specific as you want. Our pre-defined metric has 3 states: neutral, positive, negative. But it is very seamless to tune this metric and have many other states. You should be able to create a highly accurate custom metric within 5 mins

Report

1d ago

This is something we've been looking for. We deploy voice and chat AI agents for businesses (support, qualification, scheduling) and QA has always been the manual bottleneck — listening to call recordings, checking if the agent followed the script, catching edge cases.

The 30+ predefined metrics and CI/CD integration is exactly what's needed to ship agent updates with confidence. Do you support Vapi-based voice agents out of the box, or does it require custom integration?

Report

1d ago

Cekura

Maker

@ksagachev Yes, Vapi is supported out of the box, no custom integration needed. Takes <5 min to setup.

Report

1d ago

Cekura

Maker

@ksagachev We have a very deep integration with vapi. It should feel seemless

Report

24h ago

Cekura

Maker

@ksagachev We have a native integration with Vapi for sending production conversations, tool calls and to run outbound simulations automatically

Report

21h ago

⁠Is the metrics customizable ? For example I need to define success criteria by peak latency and not mean latency

Report

1d ago

Cekura

Maker

@rishav_mishra3 Yes, Cekura is modular in a way that lets you go from full automation to full control, depending on your needs.

One of our key features is Python based metrics with access to all processed data, so you can measure exactly what you care about, like peak latency instead of mean latency. We also support defining your own success criteria using a flexible rubric style configuration.

Report

1d ago

Cekura

Maker

@rishav_mishra3 yes they are customisable. We expose the code of our latency metric which you can customise to get peak latency instead.

Report

1d ago

The silent production failure detection is what catches my eye. When you're running AI agents in prod, the scariest failures are the ones where nothing errors out - it just gives bad output for days without anyone noticing. Curious how Cekura handles the baseline drift problem - do you need a human to label 'good' vs 'bad' outputs, or does it pick that up automatically?

Report

1d ago

Cekura

Maker

@mykola_kondratiuk Human labelling is recommended for any metric you define - you label only 20 calls in our optimizer to ensure the LLM-as-a-judge covers all the edge cases

Report

1d ago

20 calls to bootstrap the judge is surprisingly low - that's actually pretty approachable for most teams. The LLM-as-judge approach makes sense for scale once you've got those calibration samples.

Report

1d ago

Cekura

Maker

@mykola_kondratiuk Human labelling help fine tune the metric and make it highly accurate for the good/bad identification. And at scale this metric then goes on and evaluate 1000s of calls with very high accuracy

Report

24h ago

Right - the labelling bootstraps the judge, then the judge scales. Makes sense as a two-phase approach.

Report

23h ago

Cekura

Maker

@mykola_kondratiuk Exactly!

Report

23h ago

glad it landed well. good luck with the launch!

Report

22h ago

1 2 3

•••

Previous Cekura Launches

CekuraLaunch reliable voice & chat AI agents 10x faster

Launched on June 24th, 2025

VoceraLaunch voice agents faster with simulation & monitoring

Launched on November 13th, 2024

Forum Threads

p/vocera

•

1d ago

LLM-as-a-judge based monitoring is not enough for Voice AI

Most teams scaling Voice AI think they can monitor quality with a simple LLM prompt. They are wrong.

An LLM can t hear a "crunchy" voice line, it can t accurately measure a 500ms "barge-in," and it struggles with the nuances of true conversational flow.

When we built Cekura Monitoring, we realized we had to go beyond the LLM. We combined Heuristic and Statistical models with our Metric Optimizer to solve the "Scaling Wall."

View all

Hi Product Hunt! 👋

We are excited to launch Cekura Monitoring for Voice and Chat AI companies. Most monitoring tools tell you if your AI is up. Cekura tells you if it is behaving.

The two big blockers we saw were:

The Scaling Wall: Defining and optimizing custom metrics was taking too long, forcing teams back into manual spot-checks.

Production Blindspot: Standard LLM metrics misses the Customer Experience in Voice AI - things like agent tone and customer sentiment that actually defines customer success.

We have rebuilt the monitoring layer from the ground up to solve this. Cekura Monitoring turns that "wall of noisy logs" into actionable signals.

🚀 What’s New in Cekura Monitoring:

30+ Predefined Metric Suite: We track what actually breaks Voice and Chat agents across four critical categories:

Speech Quality: Voice clarity, pronunciation, and gibberish detection.
Conversational Flow: Silences, interruptions (barge-ins), and termination triggers.
Accuracy & Logic: Hallucinations, transcription accuracy, and relevancy.
Customer Experience: CSAT, Sentiment analysis, and drop-off points.

Metric Optimizer: Stop "vibes-based" prompt engineering. Define a metric (e.g., Successful User Authentication), tag 20 calls in our Labs interface, and our optimizer "compiles" a prompt that aligns with your specific feedback.

Statistical Intelligence: No more fixed, noisy thresholds. Our Alerting Engine learns your agent's baseline and only pings Slack when metrics shift 2σ from historical norms.

Automated Cron Jobs: Set up recurring health checks to simulate production conversations. Catch silent failures and regressions before your customers do.

Visual Dashboards: Real-time distribution charts for each metric. Views customized for each stakeholder

Who is this for?

Teams scaling Voice & Chat AI who are tired of listening to calls manually and need a way to prove their agents are actually working.

Sign up and try for free at cekura.ai or drop your questions below! We would love to hear how you’re currently handling Voice and Chat AI in production👇