Metrics: Volume, Outcomes, and Hiring Intelligence

Track pipeline volume, completion rates, and the key question — do your AI scores predict who you actually hire?

Written by Crismin Joy Lagamayo

The Metrics section goes deeper than the Snapshot. While the Snapshot tells you "what's happening right now," Metrics answers "is this working?" — specifically, whether your AI interviews are generating useful hiring signals and where the pipeline is leaking.

Metrics has two tabs: General (operational health) and Outcome Analysis (hiring effectiveness).


General tab

The General tab tracks operational health — call volume, completion rates, dropout patterns, and how candidates move downstream after their interview.

Calls Started

Same definition as the Snapshot — interviews where a candidate connected. The key addition here is the trend comparison (e.g., "+28% vs prev period"), which tells you whether volume is growing, stable, or declining.
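
Under the hood this is just a period-over-period percent change. A minimal sketch of the arithmetic (the helper name and sample counts are invented for illustration, not Talkpush's implementation):

```python
# Hypothetical helper: period-over-period trend, e.g. "+28% vs prev period".
def trend_vs_previous(current: int, previous: int) -> str:
    """Return a signed percent change comparing two period counts."""
    if previous == 0:
        return "n/a"  # no baseline period to compare against
    change = (current - previous) / previous * 100
    return f"{change:+.0f}%"

print(trend_vs_previous(320, 250))  # -> +28%
```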

Volume / Completion chart

A timeline (hourly or daily) showing two series:

  • Volume (blue) — How many interviews started in each time bucket.

  • Completed (green) — How many of those reached the "finished" stage.

The gap between the two lines is your dropout rate. If the gap widens during certain hours, something about those time slots may be causing problems — phone connections, candidate demographics, or agent configuration changes.
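
If you export call data and want to reproduce the gap yourself, the per-bucket arithmetic is simple. A sketch assuming a minimal record shape with started_at and completed fields (not the real export schema):

```python
from collections import Counter
from datetime import datetime

# Invented sample records; a real export will carry more fields.
calls = [
    {"started_at": datetime(2024, 5, 1, 9, 12), "completed": True},
    {"started_at": datetime(2024, 5, 1, 9, 40), "completed": False},
    {"started_at": datetime(2024, 5, 1, 12, 5), "completed": False},
]

# Bucket by hour of day; a daily view would key on the date instead.
started = Counter(c["started_at"].hour for c in calls)
completed = Counter(c["started_at"].hour for c in calls if c["completed"])

for hour in sorted(started):
    dropout = 1 - completed[hour] / started[hour]
    print(f"{hour:02d}:00  started={started[hour]}  dropout={dropout:.0%}")
```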

Tip: Switch between Hourly and Daily views depending on what you're investigating. Hourly is useful for spotting time-of-day patterns (e.g., dropout spikes during lunch). Daily is better for week-over-week trends.

Call Status Distribution

A breakdown of every call attempt, including calls that were never answered:

  • 🔇 Not Taken — No one picked up. The call was attempted, but the candidate did not answer.

  • ✅ Completed — Candidate finished the full interview through the closing stage.

  • ⚠️ Unfinished — Candidate connected but left before completing all stages.

  • 📅 Rescheduled — The interview was postponed to a later time.

  • ❌ Declined — Candidate explicitly refused to proceed.

Important: This chart includes Not Taken calls, while the "Calls Started" KPI does not. That's why the total here may be larger than Calls Started. "Calls Started" measures demand (candidates who engaged), while "Call Status Distribution" measures all outreach attempts.

A high Not Taken rate (>50%) usually means candidates aren't expecting the call, the timing is wrong, or the phone number looks unfamiliar. A high Declined rate suggests candidates are opting out — check whether the opening message is clear about what the call is and how long it takes.
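
To see why the two totals differ, here is an illustrative sketch (status labels follow the list above; the attempt records are made up):

```python
from collections import Counter

# Every outreach attempt, including unanswered calls.
attempts = ["not_taken", "completed", "unfinished", "not_taken",
            "completed", "rescheduled", "declined", "not_taken"]

distribution = Counter(attempts)  # what Call Status Distribution shows
calls_started = sum(n for status, n in distribution.items()
                    if status != "not_taken")  # what Calls Started counts

print(dict(distribution))
print(f"Calls Started: {calls_started} of {len(attempts)} attempts "
      f"({distribution['not_taken'] / len(attempts):.0%} not taken)")
```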

Candidate Status Distribution

Shows how candidates are distributed across your recruitment pipeline statuses after their AI interview (e.g., Scheduled Recruiter Interview, Rejected, Pending Rejection). This tells you what happens to candidates downstream of TalkScore — useful for understanding whether scores are driving the right next-step decisions.

Duration histogram

Distribution of completed call lengths in time buckets (0–30s, 30–60s, 1–2m, 2–3m, 3–5m, 5–7m, 7–10m, 10m+).

What to look for:

  • A cluster at 0–30s usually means technical failures or immediate hangups — not real interviews.

  • If most calls are under 3 minutes, interviews may be too short to generate reliable scores. The agent might be rushing through questions.

  • If calls are consistently 10+ minutes, the agent may be too verbose or the question set may be too long for the role.
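
If you're rebuilding this histogram from exported durations, the bucketing might look like the sketch below (assumes durations arrive in seconds; the sample values are invented):

```python
import bisect

edges = [30, 60, 120, 180, 300, 420, 600]  # bucket upper bounds, in seconds
labels = ["0-30s", "30-60s", "1-2m", "2-3m", "3-5m", "5-7m", "7-10m", "10m+"]

durations = [12, 95, 240, 240, 610, 25, 410]  # completed-call lengths
counts = dict.fromkeys(labels, 0)
for d in durations:
    # bisect_left finds the first bucket whose upper bound is >= d
    counts[labels[bisect.bisect_left(edges, d)]] += 1

for label in labels:
    print(f"{label:>6}: {counts[label]}")
```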


Outcome Analysis tab

This is where you answer the most important question: do AI scores predict who gets hired?

Outcome Analysis compares candidates who were ultimately hired against those who weren't, using their AI interview scores as the basis. For this to work, candidates need to have been classified (marked as hired or not hired) in the system.

Use at least 30 days of data for meaningful results. Small sample sizes (especially on the hired side) mean conclusions should be drawn cautiously.

Classified Candidates

Total candidates who have a hiring outcome recorded. The split (e.g., "40 hired / 1520 not hired") shows you the sample size.

Hired Percentage

What percentage of classified candidates were hired. This gives you the base rate against which all other predictiveness metrics are measured.

Score Predicts Hiring Outcome

The key predictiveness metric. It's calculated as:

(% of hired candidates scoring 4–5) minus (% of not-hired candidates scoring 4–5)

  • Positive (e.g., +18%) — High-scoring candidates are more likely to be hired. The score is adding predictive value. The higher the number, the stronger the signal.

  • Near zero — Hired and not-hired candidates score similarly. The score isn't differentiating.

  • Negative — Not-hired candidates actually score higher than hired ones. The rubric may be measuring the wrong things — or hiring decisions are based on criteria the AI doesn't evaluate.
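
To make the formula concrete, here is a minimal sketch applied to invented score lists (the 4–5 threshold follows the definition above; everything else is sample data):

```python
hired_scores = [5, 4, 3, 5, 4]
not_hired_scores = [2, 3, 4, 1, 3, 2, 5, 3]

def pct_high(scores, threshold=4):
    """Share of candidates scoring in the 4-5 band."""
    return sum(s >= threshold for s in scores) / len(scores)

gap = pct_high(hired_scores) - pct_high(not_hired_scores)
print(f"Score predicts hiring outcome: {gap:+.0%}")  # positive = useful signal
```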

CEFR Predicts Hiring Outcome

The same concept applied to language proficiency. Are candidates with higher CEFR levels more likely to be hired?

Hired at B2 or Above

The percentage of hired candidates who achieved CEFR B2 (upper-intermediate) or higher. If this is low for a role that requires strong English, there may be a misalignment between the language threshold and your actual hiring standards.

Fluency Gap — Hired vs Not

The difference in average language fluency scores between hired and not-hired candidates. A wider gap means language skills are a meaningful hiring factor.

Biggest Score Gap

The soft skill dimension with the largest average score difference between hired and not-hired candidates. This is your strongest hiring signal — the skill that most differentiates who gets an offer.

Smallest Gap — Weakest Signal

The dimension where hired and not-hired candidates score most similarly. This skill isn't helping differentiate candidates and may be worth de-emphasizing or reconsidering in the rubric.
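
The Fluency Gap and the biggest/smallest score gaps are all the same calculation: a difference of group means, applied per dimension. A sketch with invented dimension names and scores (the real dimensions come from your agent's scoring configuration):

```python
from statistics import mean

hired = {"communication": [4.2, 4.5, 3.9], "empathy": [3.8, 4.0, 3.7],
         "problem_solving": [4.1, 3.6, 4.4]}
not_hired = {"communication": [2.9, 3.1, 3.4], "empathy": [3.6, 3.9, 3.5],
             "problem_solving": [3.0, 3.3, 2.8]}

# Average score difference between groups, per dimension.
gaps = {dim: mean(hired[dim]) - mean(not_hired[dim]) for dim in hired}
print(f"Biggest gap (strongest signal): {max(gaps, key=gaps.get)}")
print(f"Smallest gap (weakest signal):  {min(gaps, key=gaps.get)}")
```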

TalkScore Distribution — Hired vs Not Hired

A side-by-side histogram showing the score distribution for each group. Ideally, you want clear separation: hired candidates clustering at 4–5 and not-hired candidates spread lower. If the two distributions overlap heavily, the score isn't a useful filter.
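
"Overlap heavily" can be eyeballed on the chart, but if you want a number, one option (our suggestion, not a built-in product metric) is the overlap coefficient of the two normalized histograms: 0 means complete separation, 1 means identical distributions.

```python
from collections import Counter

hired = [5, 4, 5, 4, 3]               # invented sample scores
not_hired = [2, 3, 3, 1, 4, 2, 3, 2]

def normalized_hist(scores):
    """Fraction of the group at each score 1-5."""
    counts = Counter(scores)
    return {s: counts[s] / len(scores) for s in range(1, 6)}

h, n = normalized_hist(hired), normalized_hist(not_hired)
overlap = sum(min(h[s], n[s]) for s in range(1, 6))
print(f"Distribution overlap: {overlap:.0%}")  # lower = better separation
```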


Filters available

Both the General and Outcome Analysis tabs support these filters, which update all charts and metrics dynamically:

  • Date range — Last 7 days, Last 30 days, or a custom date range

  • Assessment — Filter to a specific AI agent or view all assessments

  • Campaign — Filter to a specific recruitment campaign or view all campaigns


Common workflows

Validating your scoring rubric

  1. Go to Metrics → Outcome Analysis and set the time range to at least 30 days.

  2. Check Score Predicts Hiring Outcome. Is it positive and meaningful (ideally >10%)?

  3. Look at the TalkScore Distribution chart. Is there visible separation between hired and not-hired?

  4. Check the Biggest Score Gap — this is the skill your rubric is best at measuring.

  5. Check the Smallest Gap — consider whether this skill should remain in the rubric or be replaced with something more predictive.

  6. If scores aren't predicting outcomes, contact your Talkpush representative with your findings — they can review and refine the scoring criteria.

Investigating low completion

  1. Go to Metrics → General.

  2. Check the Duration histogram — are many calls ending in under 60 seconds? Those may be technical failures, not real interviews.

  3. Look at the Volume / Completion chart — do completion dips correlate with specific times of day?

  4. Check Call Status Distribution — is the ratio of Unfinished to Completed unusually high?

  5. Go to Assessments → [agent] → Overview → Dropout by Stage to see exactly where candidates leave.

  6. Open a few incomplete candidate reports in Reports to read the transcripts and understand why candidates dropped out.


See also

For scoring consistency and rubric drift, see Score Calibration.

