The Metrics section goes deeper than the Snapshot. While the Snapshot tells you "what's happening right now," Metrics answers "is this working?" — specifically, whether your AI interviews are generating useful hiring signals and where the pipeline is leaking.
Metrics has two tabs: General (operational health) and Outcome Analysis (hiring effectiveness).
General tab
The General tab tracks operational health — call volume, completion rates, dropout patterns, and how candidates move downstream after their interview.
Calls Started
Same definition as the Snapshot — interviews where a candidate connected. The key addition here is the trend comparison (e.g., "+28% vs prev period"), which tells you whether volume is growing, stable, or declining.
Volume / Completion chart
A timeline (hourly or daily) showing two series:
Volume (blue) — How many interviews started in each time bucket.
Completed (green) — How many of those reached the "finished" stage.
The gap between the two lines is your dropout rate. If the gap widens during certain hours, something about those time slots may be causing problems — phone connections, candidate demographics, or agent configuration changes.
Tip: Switch between Hourly and Daily views depending on what you're investigating. Hourly is useful for spotting time-of-day patterns (e.g., dropout spikes during lunch). Daily is better for week-over-week trends.
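The gap between the two series can also be expressed as a per-bucket dropout rate. A minimal sketch in Python, using hypothetical hourly counts (the dashboard computes this for you; the numbers here are made up for illustration):

```python
# Illustrative only: per-bucket dropout rate from (started, completed) counts,
# mirroring the gap between the Volume and Completed lines.
buckets = {
    "09:00": (40, 34),
    "12:00": (55, 38),  # a lunch-hour dip in completions
    "15:00": (48, 41),
}

for hour, (started, completed) in buckets.items():
    dropout = (started - completed) / started  # the "gap" as a rate
    print(f"{hour}: dropout {dropout:.0%}")
```

A widening rate in one bucket (here, 12:00) is the kind of time-of-day pattern the Hourly view is meant to surface.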
Call Status Distribution
A breakdown of every call attempt, including calls that were never answered:
| Status | Meaning |
| --- | --- |
| 🔇 Not Taken | No one picked up. The call was attempted, but the candidate did not answer. |
| ✅ Completed | Candidate finished the full interview through the closing stage. |
| ⚠️ Unfinished | Candidate connected but left before completing all stages. |
| 📅 Rescheduled | The interview was postponed to a later time. |
| ❌ Declined | Candidate explicitly refused to proceed. |
Important: This chart includes Not Taken calls, while the "Calls Started" KPI does not. That's why the total here may be larger than Calls Started. "Calls Started" measures demand (candidates who engaged), while "Call Status Distribution" measures all outreach attempts.
A high Not Taken rate (>50%) usually means candidates aren't expecting the call, the timing is wrong, or the phone number looks unfamiliar. A high Declined rate suggests candidates are opting out — check whether the opening message is clear about what the call is and how long it takes.
Candidate Status Distribution
Shows how candidates are distributed across your recruitment pipeline statuses after their AI interview (e.g., Scheduled Recruiter Interview, Rejected, Pending Rejection). This tells you what happens to candidates downstream of TalkScore — useful for understanding whether scores are driving the right next-step decisions.
Duration histogram
Distribution of completed call lengths in time buckets (0–30s, 30–60s, 1–2m, 2–3m, 3–5m, 5–7m, 7–10m, 10m+).
What to look for:
A cluster at 0–30s usually means technical failures or immediate hangups — not real interviews.
If most calls are under 3 minutes, interviews may be too short to generate reliable scores. The agent might be rushing through questions.
If calls are consistently 10+ minutes, the agent may be too verbose or the question set may be too long for the role.
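The histogram's buckets follow simple duration cutoffs. A minimal sketch of the bucketing logic, with hypothetical call lengths (the bucket labels match the chart; the assignment function is an assumption about how the edges are applied):

```python
# Illustrative only: assign call durations (in seconds) to the histogram's
# buckets. Upper edges are exclusive; anything past 10 minutes falls in "10m+".
EDGES = [(30, "0-30s"), (60, "30-60s"), (120, "1-2m"), (180, "2-3m"),
         (300, "3-5m"), (420, "5-7m"), (600, "7-10m")]

def bucket(seconds: float) -> str:
    for upper, label in EDGES:
        if seconds < upper:
            return label
    return "10m+"

durations = [12, 95, 240, 610]  # hypothetical completed-call lengths
print([bucket(d) for d in durations])  # ['0-30s', '1-2m', '3-5m', '10m+']
```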
Outcome Analysis tab
This is where you answer the most important question: do AI scores predict who gets hired?
Outcome Analysis compares candidates who were ultimately hired against those who weren't, using their AI interview scores as the basis. For this to work, candidates need to have been classified (marked as hired or not hired) in the system.
Use at least 30 days of data for meaningful results. Small sample sizes (especially on the hired side) mean conclusions should be drawn cautiously.
Classified Candidates
Total candidates who have a hiring outcome recorded. The split (e.g., "40 hired / 1520 not hired") shows you the sample size.
Hired Percentage
The percentage of classified candidates who were hired. This is the base rate against which all other predictiveness metrics are measured.
Score Predicts Hiring Outcome
The key predictiveness metric. It's calculated as:
(% of hired candidates scoring 4–5) minus (% of not-hired candidates scoring 4–5)
| Result | What it means |
| --- | --- |
| Positive (e.g., +18%) | High-scoring candidates are more likely to be hired. The score is adding predictive value. The higher the number, the stronger the signal. |
| Near zero | Hired and not-hired candidates score similarly. The score isn't differentiating. |
| Negative | Not-hired candidates actually score higher than hired ones. The rubric may be measuring the wrong things, or hiring decisions are based on criteria the AI doesn't evaluate. |
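The calculation itself is a simple difference of percentages. A minimal sketch with hypothetical score lists for each group (the dashboard computes this automatically):

```python
# Illustrative only: the "Score Predicts Hiring Outcome" calculation.
hired_scores = [5, 4, 4, 3, 5]
not_hired_scores = [2, 3, 4, 2, 1, 3, 4, 2]

def pct_high(scores):
    """Share of candidates scoring 4 or 5."""
    return sum(s >= 4 for s in scores) / len(scores)

gap = pct_high(hired_scores) - pct_high(not_hired_scores)
print(f"{gap:+.0%}")  # a positive gap means high scorers are hired more often
```

With these made-up numbers, 80% of hired candidates score 4–5 against 25% of not-hired candidates, giving a gap of +55%.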
CEFR Predicts Hiring Outcome
The same concept applied to language proficiency. Are candidates with higher CEFR levels more likely to be hired?
Hired at B2 or Above
The percentage of hired candidates who achieved CEFR B2 (upper-intermediate) or higher. If this is low for a role that requires strong English, there may be a misalignment between the language threshold and your actual hiring standards.
Fluency Gap — Hired vs Not
The difference in average language fluency scores between hired and not-hired candidates. A wider gap means language skills are a meaningful hiring factor.
Biggest Score Gap
The soft skill dimension with the largest average score difference between hired and not-hired candidates. This is your strongest hiring signal — the skill that most differentiates who gets an offer.
Smallest Gap — Weakest Signal
The dimension where hired and not-hired candidates score most similarly. This skill isn't helping differentiate candidates and may be worth de-emphasizing or reconsidering in the rubric.
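Both of these metrics come from the same comparison: the per-skill difference in average scores between the two groups. A minimal sketch, with hypothetical skill names and averages (not the actual rubric dimensions):

```python
# Illustrative only: biggest and smallest per-skill score gaps between
# hired and not-hired candidates. Skill names and averages are hypothetical.
avg_hired = {"communication": 4.2, "problem_solving": 3.9, "empathy": 3.5}
avg_not_hired = {"communication": 3.1, "problem_solving": 3.7, "empathy": 3.4}

gaps = {skill: avg_hired[skill] - avg_not_hired[skill] for skill in avg_hired}
biggest = max(gaps, key=gaps.get)   # strongest hiring signal
smallest = min(gaps, key=gaps.get)  # weakest signal
print(biggest, smallest)
```

Here "communication" (gap 1.1) would be the Biggest Score Gap and "empathy" (gap 0.1) the weakest signal.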
TalkScore Distribution — Hired vs Not Hired
A side-by-side histogram showing the score distribution for each group. Ideally, you want clear separation: hired candidates clustering at 4–5 and not-hired candidates spread lower. If the two distributions overlap heavily, the score isn't a useful filter.
Filters available
Both the General and Outcome Analysis tabs support these filters, which update all charts and metrics dynamically:
Date range — Last 7 days, Last 30 days, or a custom date range
Assessment — Filter to a specific AI agent or view all assessments
Campaign — Filter to a specific recruitment campaign or view all campaigns
Common workflows
Validating your scoring rubric
1. Go to Metrics → Outcome Analysis and set the time range to at least 30 days.
2. Check Score Predicts Hiring Outcome. Is it positive and meaningful (ideally >10%)?
3. Look at the TalkScore Distribution chart. Is there visible separation between hired and not-hired?
4. Check the Biggest Score Gap — this is the skill your rubric is best at measuring.
5. Check the Smallest Gap — consider whether this skill should remain in the rubric or be replaced with something more predictive.
If scores aren't predicting outcomes, contact your Talkpush representative with your findings — they can review and refine the scoring criteria.
Investigating low completion
1. Go to Metrics → General.
2. Check the Duration histogram — are many calls ending in under 60 seconds? Those may be technical failures, not real interviews.
3. Look at the Volume / Completion chart — do completion dips correlate with specific times of day?
4. Check Call Status Distribution — is the ratio of Unfinished to Completed unusually high?
5. Go to Assessments → [agent] → Overview → Dropout by Stage to see exactly where candidates leave.
6. Open a few incomplete candidate reports in Reports to read the transcripts and understand why candidates dropped out.
See also
For scoring consistency and rubric drift, see Score Calibration.