People are poorly equipped to detect AI-powered voice clones
- URL: http://arxiv.org/abs/2410.03791v2
- Date: Sat, 25 Jan 2025 01:18:23 GMT
- Title: People are poorly equipped to detect AI-powered voice clones
- Authors: Sarah Barrington, Emily A. Cooper, Hany Farid
- Abstract summary: We report on the realism of AI-generated voices in terms of identity matching and naturalness. We find human participants cannot consistently identify recordings of AI-generated voices.
- Score: 12.3166714008126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As generative artificial intelligence (AI) continues its ballistic trajectory, everything from text to audio, image, and video generation continues to improve at mimicking human-generated content. Through a series of perceptual studies, we report on the realism of AI-generated voices in terms of identity matching and naturalness. We find human participants cannot consistently identify recordings of AI-generated voices. Specifically, participants perceived the identity of an AI-voice to be the same as its real counterpart approximately 80% of the time, and correctly identified a voice as AI generated only about 60% of the time.
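As a rough back-of-the-envelope check (not from the paper), the reported 60% rate of correctly identifying a voice as AI-generated can be compared against 50% chance guessing via a binomial tail probability. The trial count of 100 below is hypothetical, chosen only for illustration:

```python
# Sketch: probability of scoring >= 60% correct on a two-alternative task
# purely by guessing, under a hypothetical 100 trials per participant.
from math import comb

def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# ~0.028: 60/100 correct is unlikely under pure guessing, yet still far
# from the near-perfect detection one might hope for.
print(binom_tail(100, 60))
```

This is only a framing aid: 60% accuracy is statistically above chance at plausible sample sizes, while remaining poor in absolute terms.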
Related papers
- Can You Tell It's AI? Human Perception of Synthetic Voices in Vishing Scenarios [3.2976205772213123]
Large Language Models and commercial speech synthesis systems now enable highly realistic AI-generated voice scams (vishing). Yet it remains unclear whether individuals can reliably distinguish AI-generated speech from human-recorded voices in realistic scam contexts. We conducted a controlled online study in which 22 participants evaluated 16 vishing-style audio clips and classified each as human or AI.
arXiv Detail & Related papers (2026-02-23T17:17:53Z) - Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning [66.51617619673587]
We present Skyra, a multimodal large language model (MLLM) that identifies human-perceivable visual artifacts in AI-generated videos. To support this objective, we construct ViF-CoT-4K for Supervised Fine-Tuning (SFT), which represents the first large-scale AI-generated video dataset with fine-grained human annotations. We then develop a two-stage training strategy that systematically enhances our model's spatio-temporal artifact perception, explanation capability, and detection accuracy.
arXiv Detail & Related papers (2025-12-17T18:48:26Z) - Do AI Voices Learn Social Nuances? A Case of Politeness and Speech Rate [0.0]
This study investigates whether state-of-the-art text-to-speech systems have the human tendency to reduce speech rate to convey politeness. We prompted 22 synthetic voices from two leading AI platforms to read a fixed script under both "polite and formal" and "casual and informal" conditions. Across both AI platforms, the polite prompt produced slower speech than the casual prompt, with very large effect sizes.
arXiv Detail & Related papers (2025-11-12T07:44:42Z) - Assessing GPTZero's Accuracy in Identifying AI vs. Human-Written Essays [0.0]
GPTZero is one of the most widely used AI detectors. The vast majority of AI-generated papers were detected accurately (91-100% judged AI-generated), while results on human-authored essays fluctuated. These findings suggest that although GPTZero is effective at detecting purely AI-generated content, its reliability in distinguishing human-authored texts is limited.
arXiv Detail & Related papers (2025-06-30T04:53:27Z) - Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing [55.2480439325792]
Misclassification can lead to false plagiarism accusations and misleading claims about AI prevalence in online content.
We systematically evaluate eleven state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation dataset.
Our findings reveal that detectors frequently misclassify even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models.
arXiv Detail & Related papers (2025-02-21T18:45:37Z) - "It was 80% me, 20% AI": Seeking Authenticity in Co-Writing with Large Language Models [97.22914355737676]
We examine whether and how writers want to preserve their authentic voice when co-writing with AI tools.
Our findings illuminate conceptions of authenticity in human-AI co-creation.
Readers' responses showed less concern about human-AI co-writing.
arXiv Detail & Related papers (2024-11-20T04:42:32Z) - Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - AI-rays: Exploring Bias in the Gaze of AI Through a Multimodal Interactive Installation [7.939652622988465]
We introduce AI-rays, an interactive installation where AI generates speculative identities from participants' appearance.
It uses speculative X-ray visions to contrast reality with AI-generated assumptions, metaphorically highlighting AI's scrutiny and biases.
arXiv Detail & Related papers (2024-10-03T18:44:05Z) - Human Bias in the Face of AI: The Role of Human Judgement in AI Generated Text Evaluation [48.70176791365903]
This study explores how bias shapes the perception of AI versus human generated content.
We investigated how human raters respond to labeled and unlabeled content.
arXiv Detail & Related papers (2024-09-29T04:31:45Z) - Measuring Human Contribution in AI-Assisted Content Generation [66.06040950325969]
This study raises the research question of measuring human contribution in AI-assisted content generation.
By calculating mutual information between human input and AI-assisted output relative to self-information of AI-assisted output, we quantify the proportional information contribution of humans in content generation.
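The ratio described above can be illustrated on toy discrete data. The sketch below is not the paper's implementation; it just computes I(X; Y) / H(Y) for small symbol sequences, where X stands in for human input and Y for AI-assisted output:

```python
# Toy sketch of "human contribution" as mutual information between human
# input X and AI-assisted output Y, relative to the entropy of Y.
from collections import Counter
from math import log2

def entropy(ys):
    n = len(ys)
    return -sum((c / n) * log2(c / n) for c in Counter(ys).values())

def mutual_information(xs, ys):
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

def human_contribution(xs, ys):
    """I(X; Y) / H(Y): fraction of the output's information explained by X."""
    h = entropy(ys)
    return mutual_information(xs, ys) / h if h > 0 else 0.0

# Hypothetical example: when Y is fully determined by X, the ratio is 1.0;
# when Y is independent of X, it is 0.0.
print(human_contribution(["a", "b", "a", "b"], ["A", "B", "A", "B"]))  # 1.0
```

In practice the paper works with text generation, where these quantities would be estimated from model probabilities rather than counted from short symbol sequences.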
arXiv Detail & Related papers (2024-08-27T05:56:04Z) - Self-Directed Turing Test for Large Language Models [56.64615470513102]
The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations.
Traditional Turing tests adopt a rigid dialogue format where each participant sends only one message each time.
This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format.
arXiv Detail & Related papers (2024-08-19T09:57:28Z) - Navigating AI Fallibility: Examining People's Reactions and Perceptions of AI after Encountering Personality Misrepresentations [7.256711790264119]
Hyper-personalized AI systems profile people's characteristics to provide personalized recommendations.
These systems are not immune to errors when making inferences about people's most personal traits.
We present two studies to examine how people react and perceive AI after encountering personality misrepresentations.
arXiv Detail & Related papers (2024-05-25T21:27:15Z) - Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes [49.81915942821647]
This paper aims to evaluate the human ability to discern deepfake videos through a subjective study.
We present our findings by comparing human observers to five state-of-the-art audiovisual deepfake detection models.
We found that all AI models performed better than humans when evaluated on the same 40 videos.
arXiv Detail & Related papers (2024-05-07T07:57:15Z) - Generation Z's Ability to Discriminate Between AI-generated and Human-Authored Text on Discord [0.32885740436059047]
Discord enables AI integrations, making their primarily "Generation Z" userbase particularly exposed to AI-generated content.
We surveyed Generation Z aged individuals to evaluate their proficiency in discriminating between AI-generated and human-authored text.
We find that Generation Z individuals are unable to discern between AI and human-authored text.
arXiv Detail & Related papers (2023-12-31T11:52:15Z) - The Self 2.0: How AI-Enhanced Self-Clones Transform Self-Perception and Improve Presentation Skills [9.495191491787908]
This study explores the impact of AI-generated digital self-clones on improving online presentation skills.
We compared self-recorded videos (control) with self-clone videos (AI group) for English presentation practice.
Our findings recommend the ethical employment of digital self-clones to enhance the emotional and cognitive facets of skill development.
arXiv Detail & Related papers (2023-10-23T17:20:08Z) - Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion [4.251500966181852]
This study consists of real human speech from eight well-known figures and their speech converted to one another using Retrieval-based Voice Conversion.
It is found that the Extreme Gradient Boosting model can achieve an average classification accuracy of 99.3% and can classify speech in real-time, at around 0.004 milliseconds given one second of speech.
arXiv Detail & Related papers (2023-08-24T12:26:15Z) - Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images [66.20578637253831]
There is a growing concern that the advancement of artificial intelligence (AI) technology may produce fake photos.
This study aims to comprehensively evaluate agents for distinguishing state-of-the-art AI-generated visual content.
arXiv Detail & Related papers (2023-04-25T17:51:59Z) - Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap [56.611702960809644]
We benchmark AI's ability to imitate humans in three language tasks and three vision tasks. We conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges. Imitation ability showed minimal correlation with conventional AI performance metrics.
arXiv Detail & Related papers (2022-11-23T16:16:52Z) - Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.