Can You Tell It's AI? Human Perception of Synthetic Voices in Vishing Scenarios
- URL: http://arxiv.org/abs/2602.20061v1
- Date: Mon, 23 Feb 2026 17:17:53 GMT
- Title: Can You Tell It's AI? Human Perception of Synthetic Voices in Vishing Scenarios
- Authors: Zoha Hayat Bhatti, Bakhtawar Ahtisham, Seemal Tausif, Niklas George, Nida ul Habib Bajwa, Mobin Javed
- Abstract summary: Large Language Models and commercial speech synthesis systems now enable highly realistic AI-generated voice scams (vishing). Yet it remains unclear whether individuals can reliably distinguish AI-generated speech from human-recorded voices in realistic scam contexts. We conducted a controlled online study in which 22 participants evaluated 16 vishing-style audio clips and classified each as human or AI.
- Score: 3.2976205772213123
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models and commercial speech synthesis systems now enable highly realistic AI-generated voice scams (vishing), raising urgent concerns about deception at scale. Yet it remains unclear whether individuals can reliably distinguish AI-generated speech from human-recorded voices in realistic scam contexts and what perceptual strategies underlie their judgments. We conducted a controlled online study in which 22 participants evaluated 16 vishing-style audio clips (8 AI-generated, 8 human-recorded) and classified each as human or AI while reporting confidence. Participants performed poorly: mean accuracy was 37.5%, below chance in a binary classification task. At the stimulus level, misclassification was bidirectional: 75% of AI-generated clips were majority-labeled as human, while 62.5% of human-recorded clips were majority-labeled as AI. Signal Detection Theory analysis revealed near-zero discriminability (d' approx 0), indicating inability to reliably distinguish synthetic from human voices rather than simple response bias. Qualitative analysis of 315 coded excerpts revealed reliance on paralinguistic and emotional heuristics, including pauses, filler words, vocal variability, cadence, and emotional expressiveness. However, these surface-level cues traditionally associated with human authenticity were frequently replicated by AI-generated samples. Misclassifications were often accompanied by moderate to high confidence, suggesting perceptual miscalibration rather than uncertainty. Together, our findings demonstrate that authenticity judgments based on vocal heuristics are unreliable in contemporary vishing scenarios. We discuss implications for security interventions, user education, and AI-mediated deception mitigation.
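For readers unfamiliar with the Signal Detection Theory measure cited above, the sketch below shows the standard equal-variance computation of d' from a hit rate and a false-alarm rate (treating "correctly labeling an AI-generated clip as AI" as a hit). This is an illustrative sketch, not the authors' analysis code, and the rates in the example are hypothetical rather than the study's data.

```python
from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate, eps=1e-3):
    """Equal-variance Gaussian d' = z(hit rate) - z(false-alarm rate)."""
    # Clip rates away from 0 and 1 so the inverse-normal transform stays finite.
    hit_rate = min(max(hit_rate, eps), 1 - eps)
    false_alarm_rate = min(max(false_alarm_rate, eps), 1 - eps)
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

# Hypothetical listener: labels AI clips "AI" 40% of the time, but also labels
# human clips "AI" 40% of the time -- no discriminability, only a label preference.
print(d_prime(0.40, 0.40))  # 0.0, i.e. d' close to 0 as reported in the paper
```

Because d' measures only the separation between responses to the two stimulus classes, a value near zero means participants' judgments carried essentially no information about whether a clip was synthetic, independent of any overall tendency to answer "human" or "AI".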
Related papers
- ADEPT: RL-Aligned Agentic Decoding of Emotion via Evidence Probing Tools -- From Consensus Learning to Ambiguity-Driven Emotion Reasoning [67.22219034602514]
We introduce ADEPT (Agentic Decoding of Emotion via Evidence Probing Tools), a framework that reframes emotion recognition as a multi-turn inquiry process. ADEPT transforms an SLLM into an agent that maintains an evolving candidate emotion set and adaptively invokes dedicated semantic and acoustic probing tools. We show that ADEPT improves primary emotion accuracy in most settings while substantially improving minor emotion characterization.
arXiv Detail & Related papers (2026-02-13T08:33:37Z) - Do AI Voices Learn Social Nuances? A Case of Politeness and Speech Rate [0.0]
This study investigates whether state-of-the-art text-to-speech systems have the human tendency to reduce speech rate to convey politeness. We prompted 22 synthetic voices from two leading AI platforms to read a fixed script under both "polite and formal" and "casual and informal" conditions. Across both AI platforms, the polite prompt produced slower speech than the casual prompt with very large effect sizes.
arXiv Detail & Related papers (2025-11-12T07:44:42Z) - Hallucination vs interpretation: rethinking accuracy and precision in AI-assisted data extraction for knowledge synthesis [0.9898534984111934]
We developed an extraction platform using large language models (LLMs) to automate data extraction. We compared AI to human responses across 187 publications and 17 extraction questions from a published scoping review. Findings suggest AI variability depends more on interpretability than hallucination.
arXiv Detail & Related papers (2025-08-13T03:33:30Z) - AI Debate Aids Assessment of Controversial Claims [73.8907110799657]
We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial factuality claims. In Study I, debate consistently improves human judgment accuracy and confidence calibration, outperforming consultancy. In Study II, AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%). These findings highlight AI debate as a promising path toward scalable, bias-resilient oversight in contested domains.
arXiv Detail & Related papers (2025-06-02T19:01:53Z) - Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing [55.2480439325792]
This study systematically evaluates twelve state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation dataset. Our findings reveal that detectors frequently flag even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models.
arXiv Detail & Related papers (2025-02-21T18:45:37Z) - People are poorly equipped to detect AI-powered voice clones [12.3166714008126]
We report on the realism of AI-generated voices in terms of identity matching and naturalness. We find human participants cannot consistently identify recordings of AI-generated voices.
arXiv Detail & Related papers (2024-10-03T21:26:58Z) - Human Bias in the Face of AI: Examining Human Judgment Against Text Labeled as AI Generated [48.70176791365903]
This study explores how bias shapes the perception of AI- versus human-generated content. We investigated how human raters respond to labeled and unlabeled content.
arXiv Detail & Related papers (2024-09-29T04:31:45Z) - As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli [0.0]
We conducted a perceptual study with 1276 participants to assess how capable people were at distinguishing between authentic and synthetic media. We find that on average, people struggled to distinguish between synthetic and authentic media, with mean detection performance close to the chance level of 50%. We also find that accuracy rates worsen when the stimuli contain any degree of synthetic content, feature foreign languages, or consist of a single modality.
arXiv Detail & Related papers (2024-03-25T13:39:33Z) - Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images [66.20578637253831]
There is a growing concern that the advancement of artificial intelligence (AI) technology may produce fake photos.
This study aims to comprehensively evaluate agents for distinguishing state-of-the-art AI-generated visual content.
arXiv Detail & Related papers (2023-04-25T17:51:59Z) - Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how a lack of AI fairness can deepen biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If these issues persist, they could be reinforced by interactions with other risks and have severe implications for society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z) - Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap [56.611702960809644]
We benchmark AI's ability to imitate humans in three language tasks and three vision tasks. Next, we conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges. Imitation ability showed minimal correlation with conventional AI performance metrics.
arXiv Detail & Related papers (2022-11-23T16:16:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.