Careless Whisper: Speech-to-Text Hallucination Harms
- URL: http://arxiv.org/abs/2402.08021v2
- Date: Fri, 3 May 2024 02:18:20 GMT
- Title: Careless Whisper: Speech-to-Text Hallucination Harms
- Authors: Allison Koenecke, Anna Seo Gyeong Choi, Katelyn X. Mei, Hilke Schellmann, Mona Sloane
- Abstract summary: We evaluate OpenAI's Whisper, a state-of-the-art automated speech recognition service.
We find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences.
We thematically analyze the Whisper-hallucinated content, finding that 38% of hallucinations include explicit harms.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Speech-to-text services aim to transcribe input audio as accurately as possible. They increasingly play a role in everyday life, for example in personal voice assistants or in customer-company interactions. We evaluate OpenAI's Whisper, a state-of-the-art automated speech recognition service outperforming industry competitors, as of 2023. While many of Whisper's transcriptions were highly accurate, we find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences which did not exist in any form in the underlying audio. We thematically analyze the Whisper-hallucinated content, finding that 38% of hallucinations include explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority. We then study why hallucinations occur by observing the disparities in hallucination rates between speakers with aphasia (who have a lowered ability to express themselves using speech and voice) and a control group. We find that hallucinations disproportionately occur for individuals who speak with longer shares of non-vocal durations -- a common symptom of aphasia. We call on industry practitioners to ameliorate these language-model-based hallucinations in Whisper, and to raise awareness of potential biases amplified by hallucinations in downstream applications of speech-to-text models.
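The evaluation described in the abstract rests on two measurements: whether a transcript contains phrases with no counterpart in the underlying audio, and what share of each recording is non-vocal. The sketch below is a minimal illustration of both, not the authors' pipeline; it assumes the open-source openai-whisper package, placeholder file names (sample.wav with an accompanying reference.txt ground-truth transcript), a crude word-level alignment, and an arbitrary four-word threshold for flagging inserted runs.

```python
# Minimal sketch (not the authors' pipeline): flag candidate hallucinated phrases by
# aligning a Whisper transcript against a human reference transcript, and compute a
# rough proxy for the share of non-vocal time from Whisper's segment timestamps.
import difflib
import whisper  # open-source openai-whisper package (assumed installed)

model = whisper.load_model("base")
result = model.transcribe("sample.wav")  # returns {"text": ..., "segments": [...], ...}

# Crude word-level comparison; punctuation and casing normalization are kept minimal here.
hypothesis = result["text"].lower().split()
reference = open("reference.txt").read().lower().split()  # ground-truth transcript

# Runs of words Whisper inserted with no counterpart in the reference are candidate
# hallucinated phrases; the 4-word threshold is an arbitrary illustrative choice.
matcher = difflib.SequenceMatcher(a=reference, b=hypothesis)
candidates = [
    " ".join(hypothesis[j1:j2])
    for tag, _, _, j1, j2 in matcher.get_opcodes()
    if tag == "insert" and (j2 - j1) >= 4
]
print("candidate hallucinated phrases:", candidates)

# Rough proxy for the non-vocal share: audio time not covered by any Whisper segment.
segments = result["segments"]
if segments:
    speech_time = sum(s["end"] - s["start"] for s in segments)
    total_time = segments[-1]["end"]
    print("approx. non-vocal share:", 1.0 - speech_time / total_time)
```

A more careful analysis would use a voice-activity detector or forced aligner for the non-vocal share, since Whisper's own segment boundaries can themselves be distorted by hallucinated content.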
Related papers
- Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio [15.878350948461646]
We investigate hallucinations of the Whisper ASR model induced by non-speech audio segments present during inference.
By inducing hallucinations with various types of sounds, we show that there exists a set of hallucinations that appear frequently.
We then study hallucinations caused by the augmentation of speech with such sounds.
arXiv Detail & Related papers (2025-01-20T10:14:52Z)
- Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models [51.50892380172863]
We show that most state-of-the-art MLLMs suffer from severe verb hallucination.
We propose a novel rich verb knowledge-based tuning method to mitigate verb hallucination.
arXiv Detail & Related papers (2024-12-06T10:53:47Z)
- Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations [9.740345290187307]
This research aims to understand the human perception of hallucinations by systematically varying the degree of hallucination.
We observed that warning improved the detection of hallucination without significantly affecting the perceived truthfulness of genuine content.
arXiv Detail & Related papers (2024-04-04T18:34:32Z)
- A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation [51.53917938874146]
We propose a possible solution for alleviating the hallucination in KGD by exploiting the dialogue-knowledge interaction.
Experimental results of our example implementation show that this method can reduce hallucination without disrupting other dialogue performance.
arXiv Detail & Related papers (2024-04-04T14:45:26Z)
- On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study sheds light on the reasons for LLMs' hallucinations about facts they know and, more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z)
- On the Audio Hallucinations in Large Audio-Video Language Models [2.303098021872002]
This paper identifies and analyzes audio hallucinations in large audio-video language models.
We gather 1,000 sentences by inquiring about audio information and annotate whether they contain hallucinations.
We tackle the task of audio hallucination classification using pre-trained audio-text models in the zero-shot and fine-tuning settings.
arXiv Detail & Related papers (2024-01-18T07:50:07Z)
- Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generating factual errors, which are often called hallucinations.
We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms.
We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z)
- Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models [11.492702369437785]
Hallucinations are semantically unrelated to the source utterance, yet still fluent and coherent.
We show that commonly used metrics, such as word error rates, cannot differentiate between hallucinatory and non-hallucinatory models (a toy illustration follows this list).
We devise a framework for identifying hallucinations by analysing their semantic connection with the ground truth and their fluency.
arXiv Detail & Related papers (2024-01-03T06:56:56Z)
- Using Mobile Data and Deep Models to Assess Auditory Verbal Hallucinations [3.676944894021643]
A common form of auditory hallucination is hearing voices in the absence of any speakers.
We study N=435 individuals, who experience hearing voices, to assess auditory verbal hallucination.
arXiv Detail & Related papers (2023-04-20T15:37:34Z)
- Probing Causes of Hallucinations in Neural Machine Translations [51.418245676894465]
We propose to use probing methods to investigate the causes of hallucinations from the perspective of model architecture.
We find that hallucination is often accompanied by a deficient encoder, especially deficient embeddings, and by vulnerable cross-attention.
arXiv Detail & Related papers (2022-06-25T01:57:22Z)
- On Hallucination and Predictive Uncertainty in Conditional Language Generation [76.18783678114325]
Higher predictive uncertainty corresponds to a higher chance of hallucination.
Epistemic uncertainty is more indicative of hallucination than aleatoric or total uncertainties.
The proposed beam search variant helps trade performance on standard metrics for less hallucination.
arXiv Detail & Related papers (2021-03-28T00:32:27Z)
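As a toy illustration of the point made in the "Hallucinations in Neural Automatic Speech Recognition" entry above, the sketch below compares two invented hypotheses that have exactly the same word error rate against a reference even though only one of them ends in a fluent fabricated clause; the sentences and the plain Levenshtein-based WER implementation are illustrative assumptions, not material from that paper.

```python
# Toy illustration (invented sentences, not from the cited paper): word error rate
# alone cannot separate an ordinary noisy transcript from a fluent hallucination.

def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by reference length."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[-1][-1] / len(r)

reference = "thank you for calling please hold the line"
noisy = "thanks you for calling please holds a line"             # three small misrecognitions
hallucinated = "thank you for calling please press charges now"  # three errors, fabricated clause

print(wer(reference, noisy))         # 0.375
print(wer(reference, hallucinated))  # 0.375 -- identical WER, very different downstream harm
```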