The Insights tab is your quality control center — it checks whether the interviews themselves were good. The Reports tab is where you review individual candidates. This article walks through both.
Insights Tab
The Insights tab contains four sub-tabs: Health, Feedback, Quality, and Alerts. Together, they give you a complete picture of agent behavior, scoring accuracy, and systemic issues.
Health tab
Health gives you the big-picture view of quality across all your AI agents. Three headline metrics sit at the top:
Metric | What it tells you |
With Issues (%) | Percentage of conversations that triggered at least one quality flag. One conversation can trigger multiple flags. |
With Bias (%) | Percentage of conversations where a bias or fairness flag was detected. Tracked separately because bias issues carry higher risk. |
Agents with Issues | How many active agents had at least one flag. If most agents are flagged, the problem is likely systemic. If only 1–2, it's specific to those agents. |
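The three headline metrics are simple aggregations over flagged conversations, which is worth keeping in mind when a conversation carries multiple flags. A minimal sketch of the arithmetic, assuming an illustrative record shape (the field names and flag labels below are not the actual export schema):

```python
# Hypothetical conversation records; field names and flag labels are illustrative only.
conversations = [
    {"agent": "CSR Screener", "flags": ["Hallucination", "Repetition"]},
    {"agent": "CSR Screener", "flags": []},
    {"agent": "Sales Screener", "flags": ["Gender Bias"]},
    {"agent": "Night Shift Screener", "flags": []},
]

# Illustrative subset of the nine bias types that are tracked separately.
BIAS_FLAGS = {"Gender Bias", "Racial Bias", "Age Bias"}

total = len(conversations)
# A conversation with several flags still counts once toward "With Issues."
with_issues = sum(1 for c in conversations if c["flags"])
with_bias = sum(1 for c in conversations if BIAS_FLAGS.intersection(c["flags"]))
agents_with_issues = len({c["agent"] for c in conversations if c["flags"]})

print(f"With Issues: {with_issues / total:.0%}")
print(f"With Bias: {with_bias / total:.0%}")
print(f"Agents with Issues: {agents_with_issues}")
```

Note that "With Issues" counts conversations, not flags, so it can stay flat even while the total flag count rises.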
Below these, a ranked list shows every flag type with its count and category. The full flag taxonomy is at the end of this article — skip down to Flag taxonomy reference when you need to look one up.
Flags are assigned one of three severity levels:
Severity | What it means | What to do |
🚨 Critical | Requires immediate action. | Contact your Talkpush representative immediately. Do not dismiss without investigation. |
⚠️ Warning | Needs monitoring and investigation. | Review flagged candidate reports and monitor for recurrence. |
ℹ️ Info | A new or infrequent flag has appeared. | Monitor for recurrence. Escalate if the flag appears across multiple candidates. |
Important: Bias flags are always Critical severity. The nine bias types monitored are gender, racial, age, socioeconomic, disability, cultural/nationality, language proficiency, accent/dialect, and interview format bias. Even a single bias flag requires professional review. Do not assume it is a false positive. Contact your Talkpush representative immediately.
When you see an alert — do's and don'ts
✅ Do | ❌ Don't |
Read alert details and review flagged reports | Dismiss Critical alerts without investigation |
Note alert type, severity, affected agent, time period | Assume bias flags are false positives |
Contact your Talkpush representative for Critical alerts | Tell candidates about quality flags or alerts |
Report recurring Warning or Info alerts | Attempt to change agent configuration yourself — contact your representative instead |
Below the flags, an Agent Performance table shows per-agent calls, completion rate, average score, std. deviation, average duration, and last call time. Use this to identify which specific agents are generating the most issues.
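If you pull per-call scores out of the platform, the average score and standard deviation shown in the Agent Performance table are easy to recompute and sanity-check. A sketch with made-up numbers (a very low standard deviation over many calls can hint at score compression):

```python
import statistics

# Hypothetical per-call overall scores grouped by agent; all values are illustrative.
calls = {
    "CSR Screener": [3.2, 4.0, 2.8, 3.6],
    "Sales Screener": [4.5, 4.4, 4.6, 4.5],
}

# Mean and population standard deviation per agent.
agent_stats = {
    agent: (statistics.mean(scores), statistics.pstdev(scores))
    for agent, scores in calls.items()
}

for agent, (avg, sd) in agent_stats.items():
    print(f"{agent}: avg={avg:.2f}, std dev={sd:.2f}")
```

Here the second agent's near-zero spread is the kind of pattern the Agent Performance table helps you spot.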
Feedback tab
Feedback captures human reviews of AI-generated scores. When a recruiter or QA reviewer looks at a candidate report and disagrees with (or confirms) the score, they can submit a Score Opinion.
Each entry shows the reviewer name, assessment, candidate, label (e.g., "Score Opinion"), comment (e.g., "overscored: Professionalism and Courtesy"), date, and status (New or Reviewed). Click Report to jump directly to the candidate's full assessment.
Why this matters: Human feedback provides the ground truth for AI scoring. If reviewers consistently flag a particular skill as overscored, that's a strong signal to tighten the rubric criteria. If reviews consistently confirm scores, you have validation that the rubric is working well. Share any patterns you spot with your Talkpush representative.
Quality tab
The Quality tab is a searchable, filterable log of every individual quality flag raised across all conversations. While Health shows aggregated counts, Quality lets you drill into specifics.
Filters available: time period, assessment (specific AI agent), campaign, category (Agent, Scoring, Technical, Candidate), and issue type (Hallucination, Repetition, etc.). Each row shows the date, candidate name and ID, campaign, category, issue type, and a plain-language explanation of what happened.
Use it to:
- Investigate a specific flag type — filter by Hallucination to see all instances. Read the explanations to understand the pattern (placeholder values read aloud? fabricated job details?).
- Audit a specific agent — filter by assessment to see all flags for one agent. Repeated flags point to a systemic configuration issue.
- Track improvement after a fix — filter by date to confirm a flag stopped appearing after a system prompt or rubric change.
Alerts tab
Alerts surfaces automated findings across Agent, Scoring, Health, and Quality categories. Alerts are generated automatically based on configurable thresholds — for example, when hallucination rates exceed a set percentage, or when bias flags appear above a defined frequency.
To receive immediate email notifications for Critical and Warning flags, go to Settings → Notifications and enable Critical Flag Alerts.
Reports Tab
The Reports tab shows every candidate who has been through a TalkScore AI interview. You can search and filter the list, export to CSV, and click into any candidate to see their complete assessment.
Quick stats
Above the table, summary cards give you an at-a-glance read of the candidates matching your filters:
Card | What it shows |
Total Reports | Count of reports matching your filters. |
Avg Score | Mean overall score across listed candidates. |
Score 4–5 | Count and percentage of high-performing candidates. |
Score 0–2 | Count of candidates who scored low (also shows how many scored exactly 3). |
Browsing the report list
Filter / Control | What it does |
Live Data toggle | Updates the list in real time as new interviews come in. Useful for monitoring live campaigns. |
Time period | Filter by Last 7 days, Last 30 days, or a custom date range. |
Status filter | Show only completed calls, or filter by score range or CEFR level. |
Assessment / Campaign filters | Narrow results to a specific AI agent or recruitment campaign. |
Test Calls toggle | Show or hide internal test calls. Off by default so only real candidate data shows. |
Export CSV | Download the full report list as a spreadsheet for offline analysis or sharing. |
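The exported CSV works with any spreadsheet tool, and a few lines of code can reproduce the summary cards offline. A sketch using Python's standard library (the column names here are assumptions, not the actual export schema):

```python
import csv
import io

# Illustrative export contents; real column names may differ.
exported = """candidate_id,name,overall_score
C-1001,Ana Reyes,4.2
C-1002,Liu Wei,1.8
C-1003,Sam Ortiz,3.0
"""

rows = list(csv.DictReader(io.StringIO(exported)))
scores = [float(r["overall_score"]) for r in rows]

avg_score = sum(scores) / len(scores)
high = sum(1 for s in scores if s >= 4)        # mirrors the "Score 4-5" card
low = sum(1 for s in scores if s <= 2)         # mirrors the "Score 0-2" card
exactly_three = sum(1 for s in scores if s == 3)

print(f"Total: {len(scores)}, Avg: {avg_score:.1f}, "
      f"4-5: {high}, 0-2: {low}, exactly 3: {exactly_three}")
```

In practice you would read the downloaded file with `open(...)` instead of the inline string.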
Reading a candidate report
Click any candidate row to open their full assessment. The report is organized into sections:
Section | What it contains |
Header | Name, assessment, overall score, date, duration, completion status, candidate ID, email, phone number, assessment agent. |
Interview Recording | Full MP3 audio player. Listen to the actual conversation alongside the transcript. |
Transcript | Complete conversation with speaker labels (agent name and candidate initials), timestamps, and turn count. This is the ground truth — always check it when reviewing or questioning a score. |
Per-Dimension Scores | Each soft skill scored 0–5 with a paragraph of AI reasoning that cites specific transcript evidence. The AI doesn't just assign a number — it explains its reasoning with direct references to what the candidate said. |
Data Extraction | Structured information automatically pulled from the conversation: candidate feedback summary, eligibility answers (work authorization, age, drug screen consent), rehire status, onsite training preference, and whether the candidate rejected the AI interview at any point. |
Sentiment Analysis | Overall sentiment summary describing how engagement evolved, per-question sentiment scores (1–5 per question showing tone shifts), and an overall sentiment shift (more positive, more negative, or neutral). Sentiment can surface concerns that scores miss. |
Candidate Questions | Questions the candidate asked during the interview, with timestamp, stage (e.g., "closing"), exact question text, and context. Often a signal of engagement. |
Agent Quality Analysis | AI review of the agent's behavior in this specific call — overall assessment plus each individual flag with severity, timestamp, exact transcript quote, and a plain-language explanation. |
Interpreting the TalkScore
The overall TalkScore is the average of individual soft skill dimension scores on a 0–5 scale. Score thresholds vary by client and role — confirm with your Talkpush representative if you're unsure what constitutes a pass for your specific assessment.
Score Range | General interpretation |
4–5 | Strong match. Candidate performed well across configured criteria. |
3–4 | Partial match. Review the per-dimension breakdown and transcript before deciding. |
0–3 | Did not meet criteria. Check for call quality issues (very short call, technical problems) before concluding. |
Note: Do not rely on the overall score alone. The per-dimension breakdown, sentiment analysis, Agent Quality Analysis, and transcript together give a much richer picture of each candidate.
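As a sketch of the arithmetic: the overall score is a plain mean of the per-dimension scores, and the general interpretation bands reduce to simple cutoffs. The cutoffs below only mirror the general table; actual pass thresholds vary by client and role.

```python
def talkscore(dimension_scores):
    """Overall TalkScore: mean of per-dimension soft skill scores (0-5 scale)."""
    return sum(dimension_scores.values()) / len(dimension_scores)

def interpret(score):
    # Illustrative cutoffs from the general interpretation table;
    # real thresholds are configured per client and role.
    if score >= 4:
        return "Strong match"
    if score >= 3:
        return "Partial match: review dimensions and transcript"
    return "Did not meet criteria: check call quality first"

# Illustrative dimension names and scores.
dims = {"Professionalism": 4, "Communication": 5, "Empathy": 3, "Problem Solving": 4}
overall = talkscore(dims)
print(overall, "->", interpret(overall))
```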
CEFR in reports
If your assessment includes language evaluation, each candidate receives a CEFR level from A1 (beginner) to C2 (near-native). Each report includes per-dimension language scores for Grammar, Fluency, Vocabulary, Pronunciation, and Comprehension (each on a 0–10 scale), plus a plain-language explanation of the candidate's English proficiency. CEFR scores are evaluated independently from soft skill scores.
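For reference, a sketch of how the five language sub-scores can be averaged when eyeballing a report. Note that the CEFR level itself is assigned by the system's language evaluation, not derived from a simple average; the numbers below are illustrative.

```python
LANG_DIMENSIONS = ("Grammar", "Fluency", "Vocabulary", "Pronunciation", "Comprehension")

# Illustrative per-dimension scores on the 0-10 scale described above.
language_scores = {
    "Grammar": 7, "Fluency": 6, "Vocabulary": 8, "Pronunciation": 6, "Comprehension": 7,
}

avg_language = sum(language_scores[d] for d in LANG_DIMENSIONS) / len(LANG_DIMENSIONS)
print(f"Average language sub-score: {avg_language:.1f}/10")
```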
Submitting score feedback
If you believe a score is incorrect after reviewing the full report, use the Feedback feature to submit a Score Opinion. Note which dimension seems wrong and why. Feedback appears in Insights → Feedback and helps the Talkpush team identify patterns for rubric refinement.
Common workflows
Investigating a hallucination spike
1. Go to Insights → Health and note the hallucination count.
2. Switch to the Quality tab and filter by Issue Type: Hallucination.
3. Read the explanations for the most recent flags. Look for the pattern:
   - Placeholder values read aloud (e.g., "This is a None, None Remote CSR role") — the agent's data fields aren't populated correctly.
   - Fabricated details (e.g., inventing a salary or schedule) — the agent's configuration needs stricter guardrails.
   - Misquoted job details — the agent's configuration contains outdated information.
4. Share your findings with your Talkpush representative, including the affected agent name, time period, and the pattern you identified.
5. Monitor the Quality log over the next few days to confirm hallucination flags stop appearing after the fix.
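If you have transcript text exported, a quick pre-screen for placeholder values read aloud can speed up triage. A sketch; the placeholder tokens (`None`, `N/A`, `{{...}}`) are assumptions about what an unpopulated field might look like, not a documented list:

```python
import re

# Tokens that often indicate an unpopulated data field read aloud (assumed, not exhaustive).
PLACEHOLDER = re.compile(r"\bNone\b|\bN/A\b|\{\{[^}]*\}\}")

# Illustrative transcript lines.
transcript_lines = [
    "Agent: This is a None, None Remote CSR role.",
    "Candidate: Sorry, could you repeat the location?",
    "Agent: The schedule is {{shift_pattern}}.",
]

suspects = [ln for ln in transcript_lines if PLACEHOLDER.search(ln)]
for ln in suspects:
    print("possible placeholder:", ln)
```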
Weekly quality review
1. Go to Insights → Health, set the time filter to "Last 7 days."
2. Check the three headline metrics. Is "With Issues" trending up or down?
3. Look at the top 3 flag types. Are they the same as last week, or are new issues emerging?
4. Go to Feedback and check for new Score Opinions. Any patterns in what's being overscored or underscored?
5. If any agent has a disproportionate number of flags, open its assessment to review the configuration.
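Week-over-week comparison of flag counts is easy to script once you note the counts down each review. A minimal sketch with made-up numbers:

```python
# Made-up weekly flag counts noted from Insights -> Health.
last_week = {"Hallucination": 12, "Repetition": 5, "Abrupt End": 3}
this_week = {"Hallucination": 4, "Repetition": 6, "Leading Questions": 2}

report = {}
for flag in sorted(set(last_week) | set(this_week)):
    now, before = this_week.get(flag, 0), last_week.get(flag, 0)
    if flag not in last_week:
        trend = "new"          # an emerging issue worth watching
    elif now > before:
        trend = "up"
    elif now < before:
        trend = "down"
    else:
        trend = "flat"
    report[flag] = (now, trend)
    print(f"{flag}: {now} ({trend})")
```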
Reviewing a flagged candidate
1. Navigate from Insights → Quality (or from a notification email) to the candidate's report.
2. Read the Agent Quality Analysis section first to understand what was flagged.
3. Check the timestamp on the flag and find that moment in the Transcript.
4. Listen to the Recording at that timestamp to hear the actual exchange.
5. Review the Per-Dimension Scores — did the quality issue affect scoring? (e.g., if the agent hallucinated job details, the candidate may have responded based on wrong information.)
6. Decide whether the candidate needs to be re-interviewed or the score manually adjusted.
Using reports for rubric refinement
1. Go to Reports and sort or filter to find candidates who scored at the extremes (0–2 or 5).
2. Open 3–5 reports from each extreme.
3. Read the AI reasoning for each per-dimension score. Does it make sense? Is a "5" truly excellent, or is the bar too low?
4. Compare candidates who scored 4 vs. 5 — can you tell the difference from the transcript? If not, the rubric criteria for those levels may need to be more specific.
5. Share findings with your Talkpush representative — they can refine the scoring rubric based on the patterns you've identified.
Flag taxonomy reference
All quality flags fall into three categories. Use this as a lookup when you see a flag and want to understand exactly what it means.
Agent Quality flags (12)
Issues with the AI agent's behavior during the conversation.
Flag | What it means |
Hallucination | The agent stated something factually incorrect or read placeholder values (like "None") as real job details. The most serious agent flag. |
Repetition | The agent repeated the same question or phrase multiple times. |
Intent Misunderstood | The agent misinterpreted the candidate's response and replied with something irrelevant. |
Abrupt End | The conversation ended suddenly without proper closing. |
Vague Answers | The agent accepted vague candidate responses without probing for detail. |
Technical Error | A system issue occurred during the interview (audio, connection, processing). |
Question Ignored | The agent skipped or didn't address a question the candidate asked. |
User Confusion | The candidate appeared confused by the agent's instructions or questions. |
Off-Topic Deviation | The agent strayed from the interview script to discuss irrelevant topics. |
Leading Questions | The agent asked questions that suggested the desired answer. |
Incomplete Evaluation | The agent ended the interview without covering all required assessment areas. |
Unprofessional Tone | The agent used informal, rude, or unprofessional language. |
Scoring flags (3)
Contradictions or gaps in scoring data.
Flag | What it means |
Score Completion Mismatch | High score on an incomplete call, or low score on a completed one. |
High Score Incomplete | High overall score despite the call not being completed. |
Missing Score Completed | A completed interview has no scores — the scoring pipeline may have failed. |
Candidate Behavior flags (2)
Flag | What it means |
Requested Escalation | The candidate asked to speak to a human recruiter. |
Used Profanity | The candidate used inappropriate language. |
Specific alert types
Alert type | What it means |
Bias flag spike | Multiple bias flags detected in a short period (e.g., 6 incidents in 7 days). Always Critical — contact your Talkpush representative immediately. |
Score compression | High percentage of candidates receiving the same or very similar scores. A calibration issue, not a candidate quality issue. |
Score completion mismatch | Scores were generated for calls that did not fully complete, or completed calls are missing scores. |
Agent loop / Repetition | The agent got stuck repeating the same question or phrase. |
Lack of empathy | The agent failed to acknowledge or respond appropriately to a candidate's emotional state. |
Poor tone / Unprofessional tone | The agent used language that was informal, dismissive, or inappropriate for a professional interview. |
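As a rough illustration of what a score compression check can look like (the half-point bucketing and 60% threshold here are assumptions for the sketch, not the product's actual detection logic):

```python
from collections import Counter

def most_common_bucket_share(scores, bucket=0.5):
    """Fraction of candidates whose score falls in the single most common bucket."""
    counts = Counter(round(s / bucket) * bucket for s in scores)
    return counts.most_common(1)[0][1] / len(scores)

# Illustrative overall scores: most candidates clustered near 3.5.
scores = [3.5, 3.5, 3.6, 3.4, 3.5, 3.5, 4.8, 1.2]
share = most_common_bucket_share(scores)
if share > 0.6:  # assumed alert threshold for the sketch
    print(f"possible score compression: {share:.0%} of candidates share one bucket")
```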
Need to troubleshoot an issue?
For step-by-step help with missing reports, score concerns, Critical alerts, score compression, exporting data, and other common questions, see the FAQ and Troubleshooting article.