craftflow.com | February 2026
Industry Research

AI Voice Agent Benchmark for Home Services

An evaluation of the four leading AI voice agent providers across 10,000 real production calls from HVAC, plumbing, and electrical companies.

February 2026 | 10,000 calls analyzed | 4 providers evaluated
Providers: Craft, Avoca, Broccoli, Netic

Most home services companies are either already using AI voice agents or actively evaluating them. The pitch is pretty simple: answer every call, qualify leads 24/7, book appointments without putting anyone on hold.

But the question we kept running into was whether these systems actually deliver, and whether there are real differences between providers. So we decided to find out. We pulled real call recordings from the four leading providers, each deployed at a live HVAC, plumbing, and/or electrical company with real customer traffic. These were not test calls or staged demos.

We transcribed everything with the same speech-to-text model and ran each transcript through the same scoring criteria. This report covers what we found across three areas: lead qualification, booking rate, and caller engagement.

Methodology

Data Collection

We randomly sampled recordings of real calls from real home service companies. The calls came from different markets across the United States and Canada to account for any localized differences. See "How We Acquired the Data" below for details.

Transcription

Every recording went through AssemblyAI's Universal 3 Pro model with the same settings. We didn't do any manual cleanup or corrections to the transcripts.

Evaluation

We used the same LLM-as-a-judge scoring prompts on every transcript regardless of which provider handled the call. Same criteria, same thresholds, across the board.
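The uniform scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the harness used for the report: the `Judge` interface, `score_calls`, `stub_judge`, the abbreviated prompt strings, and the sample calls are all hypothetical, and the report does not say which LLM was used or how its verdicts were parsed. The point it shows is that the judge only ever sees the prompt and the transcript, never the provider.

```python
from typing import Callable, Dict, List

# Hypothetical judge interface: (prompt, transcript) -> True/False verdict.
Judge = Callable[[str, str], bool]


def score_calls(
    calls: List[Dict[str, str]],
    prompts: Dict[str, str],
    judge: Judge,
) -> List[Dict[str, object]]:
    """Run every prompt on every transcript with identical settings."""
    scored = []
    for call in calls:
        row: Dict[str, object] = {"provider": call["provider"]}
        for metric, prompt in prompts.items():
            # Only the prompt and the transcript reach the judge.
            row[metric] = judge(prompt, call["transcript"])
        scored.append(row)
    return scored


def stub_judge(prompt: str, transcript: str) -> bool:
    """Placeholder for a real LLM call so the sketch runs offline.

    It ignores the prompt and just looks for a keyword.
    """
    return "appointment" in transcript.lower()


# Abbreviated placeholders; the full prompts are in the Appendix.
PROMPTS = {
    "qualified": "INBOUND CALL QUALIFIER ...",
    "booked": "BOOKING DETERMINATION ...",
}

# Made-up sample calls, for illustration only.
CALLS = [
    {"provider": "A", "transcript": "AI: Your appointment is set for Tuesday at 9."},
    {"provider": "B", "transcript": "Customer: Sorry, wrong number."},
]

RESULTS = score_calls(CALLS, PROMPTS, stub_judge)
for row in RESULTS:
    print(row)
```

Swapping `stub_judge` for a function that calls an actual model would leave the rest of the loop unchanged, which is what keeps the criteria identical across providers.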

Providers Evaluated

We chose these four because they're the providers with the most traction in home services right now. All four are actively deployed at enterprise-level HVAC, plumbing, and electrical companies, and all four have real call volume to evaluate against.

Craft

AI agents and human copilots that engage at every step of the customer journey, including in the home, to maximize revenue from every opportunity.

Traction: Recently emerged from stealth. Deployed at some of the largest home services enterprises.
Focus: Home services

Avoca

AI engine that accelerates revenue across every channel and integrates seamlessly.

Traction: First mover; larger, older customer base.
Focus: Home services

Broccoli

AI voice agents for trades businesses. Answers calls, books jobs, and handles customer follow-ups.

Traction: Growing presence among trades businesses.
Focus: Home services

Netic

Multi-channel customer service AI agent deployed across several service verticals.

Traction: Deployed at ~30 businesses across multiple verticals.
Focus: Horizontal (home services, consumer health, veterinary clinics)

Lead Qualification Rate

What percentage of calls are actually real service opportunities? We're talking about a homeowner in the service area who has a real problem the company can fix. Appointment check-ins, vendor calls, wrong numbers, and similar were excluded. (See the Lead Qualification Prompt in the Appendix for the full criteria.)

This is where the biggest gap showed up. Craft qualified 58.3% of calls, while the other three providers all clustered between roughly 42% and 45%. That's a 16.4-point spread between the top (Craft, 58.3%) and the bottom (Broccoli, 41.9%).

A big chunk of the gap comes down to caller engagement. Providers with lower engagement rates are losing potential leads before the conversation even starts. If customers hang up as soon as they hear the greeting and realize it's an AI, both the engagement rate and the qualification rate suffer.

Booking Rate

How often does a call actually end with a booked appointment? We looked at this two ways: the raw rate across all calls, and the rate among only the qualified calls. (See the Booking Determination Prompt in the Appendix for the booking criteria.)

Raw Booking Rate (% of all calls)

Craft booked appointments on 29.6% of all inbound calls, about 7 points ahead of the next closest provider (Broccoli at 22.4%). Avoca also landed in the low 20s at 20.7%, and Netic trailed at 15.8%.

The spread here is pretty significant. At 1,000 calls a month, the difference between 15.8% and 29.6% is roughly 138 additional booked jobs. Even the gap between Craft and Broccoli (the second-highest) works out to about 72 extra appointments.
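The arithmetic behind those figures is easy to reproduce. The sketch below uses the raw booking rates reported here; the function name `extra_bookings` is ours, and the 1,000 calls a month is the same illustrative volume used above, not a measured figure.

```python
# Raw booking rates (% of all calls) as reported in this benchmark.
RAW_BOOKING_RATE = {"Craft": 29.6, "Avoca": 20.7, "Broccoli": 22.4, "Netic": 15.8}


def extra_bookings(calls_per_month: int, rate_a_pct: float, rate_b_pct: float) -> float:
    """Additional booked jobs per month at rate A compared with rate B (rates in percent)."""
    return calls_per_month * (rate_a_pct - rate_b_pct) / 100


CALLS_PER_MONTH = 1_000  # illustrative volume

vs_netic = extra_bookings(CALLS_PER_MONTH, RAW_BOOKING_RATE["Craft"], RAW_BOOKING_RATE["Netic"])
vs_broccoli = extra_bookings(CALLS_PER_MONTH, RAW_BOOKING_RATE["Craft"], RAW_BOOKING_RATE["Broccoli"])

print(f"Craft vs Netic: {vs_netic:.0f} extra bookings per month")        # 138
print(f"Craft vs Broccoli: {vs_broccoli:.0f} extra bookings per month")  # 72
```

The result scales linearly with call volume, so a company taking 500 calls a month would see half of each figure.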

Qualified Booking Rate (% of qualified calls)

Looking at just the qualified calls, things shift around a bit. Broccoli actually had the highest qualified booking rate at 53.4%, with Craft close behind at 50.8% and Avoca at 46.0%. Netic came in lowest at 37.4%.

Broccoli's higher qualified conversion rate is interesting given that they had the lowest qualification rate overall. It likely comes down to how many calls they qualify in the first place. Fewer qualified calls means the ones that do get through tend to be stronger leads, which pushes the conversion rate up. But at the end of the day, raw booking rate is what shows up on your schedule.
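One way to see how the two booking rates relate: if every booked call is also a qualified call (an assumption on our part, though the reported figures are consistent with it), then the qualified booking rate is just the raw booking rate divided by the qualification rate. The sketch below checks that against the numbers in this report; the function name is hypothetical.

```python
# Rates as reported in this benchmark (percent).
QUALIFICATION = {"Craft": 58.3, "Avoca": 45.1, "Broccoli": 41.9, "Netic": 42.2}
RAW_BOOKING = {"Craft": 29.6, "Avoca": 20.7, "Broccoli": 22.4, "Netic": 15.8}
REPORTED_QUALIFIED_BOOKING = {"Craft": 50.8, "Avoca": 46.0, "Broccoli": 53.4, "Netic": 37.4}


def implied_qualified_booking(raw_pct: float, qualification_pct: float) -> float:
    """Qualified booking rate implied by raw rate / qualification rate.

    Assumes every booked call was also scored as qualified.
    """
    return 100 * raw_pct / qualification_pct


IMPLIED = {
    provider: implied_qualified_booking(RAW_BOOKING[provider], QUALIFICATION[provider])
    for provider in QUALIFICATION
}

for provider, implied in IMPLIED.items():
    reported = REPORTED_QUALIFIED_BOOKING[provider]
    print(f"{provider}: implied {implied:.1f}% vs reported {reported:.1f}%")
```

The implied and reported values agree to within 0.2 percentage points for all four providers, which is what you would expect from rounding. It also makes the Broccoli result concrete: a smaller denominator (41.9% qualified) lifts the qualified rate even though the raw rate is lower than Craft's.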

Caller Engagement

Did callers actually stick around for a real conversation? We defined this as the transcript being over 500 characters, which roughly means the caller stayed on long enough for an actual back-and-forth. If someone calls and hangs up in the first few seconds, the AI never gets a chance.
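As a rough sketch, the engagement check amounts to a length threshold on the transcript. The names below are ours, and the sketch assumes the count is taken over the raw transcript string, speaker labels and whitespace included; the report does not spell out that detail.

```python
from typing import Iterable

ENGAGEMENT_MIN_CHARS = 500  # threshold stated in this report


def is_engaged(transcript: str) -> bool:
    """True when the transcript is strictly longer than 500 characters."""
    return len(transcript) > ENGAGEMENT_MIN_CHARS


def engagement_rate(transcripts: Iterable[str]) -> float:
    """Percentage of transcripts that clear the engagement threshold."""
    flags = [is_engaged(t) for t in transcripts]
    if not flags:
        return 0.0
    return 100 * sum(flags) / len(flags)


# Made-up transcripts of varying length, for illustration only.
SAMPLE = ["x" * 501, "AI: Hello? Customer: [hangs up]", "y" * 600, ""]
print(f"{engagement_rate(SAMPLE):.1f}% engaged")  # 50.0% engaged
```

A character threshold is a blunt proxy: it cannot tell a long voicemail from a real back-and-forth, but it is cheap, deterministic, and applied identically to every provider.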

Three of the four providers landed in a similar range, between 82% and 86%. Netic was the clear outlier at 72.2%, meaning more than 1 in 4 callers hung up before any real conversation happened.

This matters a lot because it puts a hard cap on everything else. You can't qualify or book someone who already hung up. The likely culprits are things like how the AI sounds when it first picks up, how fast it responds, and whether it sounds natural enough that people don't immediately bail.

Summary

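| Metric | Craft | Avoca | Broccoli | Netic |
| --- | --- | --- | --- | --- |
| Calls Analyzed | 2,500 | 2,500 | 2,500 | 2,500 |
| Lead Qualification Rate | 58.3% | 45.1% | 41.9% | 42.2% |
| Booking Rate (Raw) | 29.6% | 20.7% | 22.4% | 15.8% |
| Booking Rate (Qualified) | 50.8% | 46.0% | 53.4% | 37.4% |
| Caller Engagement | 86.2% | 83.5% | 82.2% | 72.2% |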

Discussion

There are real, measurable differences between these four providers, even when you run them all through the same evaluation. The gaps aren't small either. On raw booking rate, the top provider booked nearly twice as many appointments as the bottom one.

Qualification rate had a big spread too, and it's probably the most upstream number. If your AI agent isn't identifying qualified leads, it doesn't really matter how good it is at everything else.

One interesting thing is how qualified booking rate tells a different story than raw booking rate. A provider can have a lower raw rate but higher conversion on qualified calls, which usually just means they're qualifying fewer calls overall. Both numbers matter, but raw booking rate is the one that directly translates to jobs on the board.

Engagement is the one that's easy to overlook. If callers are hanging up before the AI even gets going, you've lost them. The providers that keep people on the line longer just have more shots at qualifying and booking. Pretty simple math.

Scoring Definitions

Lead Qualification. A call counts as qualified if the caller is a homeowner (or decision-maker) in the service area with a real need the company can handle. We excluded appointment check-ins, sales/vendor calls, and wrong numbers.
Booking Rate (Raw). An appointment got booked during the call, with date, time, and service details confirmed. Calculated as a percentage of all calls.
Booking Rate (Qualified). Same as above, but calculated as a percentage of qualified calls only.
Caller Engagement. The transcript was over 500 characters, meaning the caller stayed on long enough for an actual conversation instead of hanging up right away.
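To make the four definitions concrete, here is a minimal sketch of how per-call verdicts could roll up into the summary metrics. The `ScoredCall` record and `summarize` function are hypothetical, and the sample data is made up. The sketch counts a booking toward the qualified rate only when the same call was also scored as qualified, which is our reading of "a percentage of qualified calls only".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredCall:
    qualified: bool  # verdict from the lead qualification prompt
    booked: bool     # verdict from the booking determination prompt
    engaged: bool    # transcript longer than 500 characters


def pct(numerator: int, denominator: int) -> float:
    """Percentage that falls back to 0.0 when the denominator is zero."""
    return 100 * numerator / denominator if denominator else 0.0


def summarize(calls: List[ScoredCall]) -> Dict[str, float]:
    """Compute the four headline metrics for one provider's calls."""
    total = len(calls)
    qualified = sum(c.qualified for c in calls)
    booked = sum(c.booked for c in calls)
    booked_and_qualified = sum(c.booked and c.qualified for c in calls)
    engaged = sum(c.engaged for c in calls)
    return {
        "lead_qualification_rate": pct(qualified, total),
        "booking_rate_raw": pct(booked, total),
        "booking_rate_qualified": pct(booked_and_qualified, qualified),
        "caller_engagement": pct(engaged, total),
    }


# Made-up sample: 4 calls, 2 qualified, 1 booked, 3 engaged.
SAMPLE_CALLS = [
    ScoredCall(qualified=True, booked=True, engaged=True),
    ScoredCall(qualified=True, booked=False, engaged=True),
    ScoredCall(qualified=False, booked=False, engaged=True),
    ScoredCall(qualified=False, booked=False, engaged=False),
]

SUMMARY = summarize(SAMPLE_CALLS)
for metric, value in SUMMARY.items():
    print(f"{metric}: {value:.1f}%")
```

Note that only the qualified booking rate uses a different denominator; the other three metrics are all percentages of total calls.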

2,500 calls were evaluated per provider, for 10,000 in total. All evaluations were run in February 2026.

Appendix

The exact prompts used to evaluate each transcript are published below for full transparency. These were run as-is on every call.

Transcription Prompt

This is a customer support call between an AI agent and a customer of a HVAC/plumbing/electrical company.

Mandatory: Transcribe any overlapping speech across channels including crosstalk.

Non-negotiable: Preserve all disfluencies exactly as spoken including verbal hesitations, restarts, and self-corrections (um, uh, I—I mean).

Occasionally the AI will transfer the call to the human. When this happens label the human CSR separately as "Human CSR".

Label the speaker as "AI" or "Human CSR" or "Customer".

Lead Qualification Prompt

INBOUND CALL QUALIFIER

TASK CONTEXT
A call center conversation was recorded and transcribed between a customer service representative from a residential HVAC, plumbing, electrical company, and a caller.

TASK ASSIGNMENT
Your job is to analyze a transcript of the phone call and determine whether the phone call represents a qualified customer or not.

QUALIFICATION CRITERIA
The caller is a homeowner or decision maker in the company's service area who has a qualified need for a new service that the company offers.

Here's a general outline for how to make qualification decisions:
For a call to be qualified, typically, the following will be true:
- This is a potential customer, who appears to have a qualified need that the company could solve
- They do NOT already have an existing appointment or service booked
- They own their home or have decision-making authority for the property
- They are in the service area
- The customer has a need for a service which the company provides, but that need cannot be met purely due to lack of immediate or adequate availability from the company, then it is still qualified.
- A customer objecting to schedule an appointment, for whatever reason, on its own, is not disqualifying

DISQUALIFICATION CRITERIA
- The caller already has an appointment or service scheduled and is calling to discuss it
- The caller is calling to accept an open estimate and schedule work
- The caller is calling to discuss their membership / maintenance agreement
- The caller is unsatisfied with a recent service from the company and is calling to complain or schedule another visit to fix the issue
- The caller is calling to pay or discuss a bill
- The caller is trying to market or sell something to the company
- The caller is a job applicant
- The caller called the wrong number on accident
- The call hang up before the caller said anything

Booking Determination Prompt

BOOKING DETERMINATION

TASK CONTEXT
A call center conversation was recorded and transcribed between an HVAC, plumbing, electrical company and a caller. The call was answered by an AI Agent on behalf of the company. It may have been transferred at some point along the way.

TASK ASSIGNMENT
Your job is to analyze a transcript of the phone call and determine whether the call resulted in a booked appointment or not.

BOOKING CRITERIA
The agent must gather necessary details and book a specific appointment on the call. If it is not in the transcript it does not count.

NON-BOOKINGS
The following are not considered bookings:
- Taking a message for someone else to follow up
- Handing off to an on call technician or emergency services to follow up

How We Acquired the Data

Craft's platform works across the call center and in the home, so many of our customers also run a competing voice agent alongside us. That's how we got the recordings. The companies we work with gave us permission to use their call data for this research. Every recording is a real production call between a real customer and the AI. Nothing was staged or simulated.

About Craft Labs

Craft Labs is an AI research and development organization focused on conversational AI for service industries. The team includes engineers who worked on self-driving cars at Tesla, researchers from MIT, and machine learning engineers from leading technology companies.

Craft's AI agents drive revenue growth across the entire customer journey from initial contact to in-home to follow up.

This research was conducted and published by Craft Labs.