Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio
- URL: http://arxiv.org/abs/2501.11378v1
- Date: Mon, 20 Jan 2025 10:14:52 GMT
- Title: Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio
- Authors: Mateusz Barański, Jan Jasiński, Julitta Bartolewska, Stanisław Kacprzak, Marcin Witkowski, Konrad Kowalczyk
- Abstract summary: We investigate hallucinations of the Whisper ASR model induced by non-speech audio segments present during inference.
By inducing hallucinations with various types of sounds, we show that there exists a set of hallucinations that appear frequently.
We then study hallucinations caused by the augmentation of speech with such sounds.
- Score: 15.878350948461646
- Abstract: Hallucinations of deep neural models are among the key challenges in automatic speech recognition (ASR). In this paper, we investigate hallucinations of the Whisper ASR model induced by non-speech audio segments present during inference. By inducing hallucinations with various types of sounds, we show that there exists a set of hallucinations that appear frequently. We then study hallucinations caused by the augmentation of speech with such sounds. Finally, we describe the creation of a bag of hallucinations (BoH) that allows the effects of hallucinations to be removed through post-processing of the text transcriptions. The results of our experiments show that such post-processing is capable of reducing the word error rate (WER) and acts as a good safeguard against problematic hallucinations.
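As a rough illustration of the post-processing step described above, the sketch below filters an ASR transcription against a small bag of hallucinations and compares the word error rate before and after filtering. This is a minimal, hypothetical sketch rather than the authors' implementation: the example phrases, the `remove_hallucinations` helper, and the use of the `jiwer` package for WER are assumptions.

```python
# Minimal sketch of post-processing a transcription with a "bag of
# hallucinations" (BoH). The phrase list below is illustrative, not the
# set reported in the paper; `jiwer` is an assumed dependency for
# computing word error rate (pip install jiwer).
import re
from jiwer import wer

# Hypothetical BoH: phrases assumed to recur when non-speech audio is decoded.
BAG_OF_HALLUCINATIONS = [
    "thank you for watching",
    "subtitles by the amara.org community",
]

def remove_hallucinations(transcript: str, bag: list[str]) -> str:
    """Delete every BoH phrase found in the transcript (case-insensitive)."""
    cleaned = transcript
    for phrase in bag:
        cleaned = re.sub(re.escape(phrase), " ", cleaned, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by removed phrases.
    return re.sub(r"\s+", " ", cleaned).strip()

if __name__ == "__main__":
    reference = "the meeting starts at noon"
    hypothesis = "the meeting starts at noon thank you for watching"

    cleaned = remove_hallucinations(hypothesis, BAG_OF_HALLUCINATIONS)
    print("WER before post-processing:", wer(reference, hypothesis))  # 0.8
    print("WER after post-processing: ", wer(reference, cleaned))     # 0.0
```

Per the abstract, the actual BoH is built from hallucinations observed to appear frequently across non-speech inputs; the two phrases above are merely stand-ins.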
Related papers
- Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models [51.50892380172863]
We show that most state-of-the-art MLLMs suffer from severe verb hallucination.
We propose a novel tuning method based on rich verb knowledge to mitigate verb hallucination.
arXiv Detail & Related papers (2024-12-06T10:53:47Z) - A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation [51.53917938874146]
We propose a possible solution for alleviating the hallucination in KGD by exploiting the dialogue-knowledge interaction.
Experimental results of our example implementation show that this method can reduce hallucination without disrupting other dialogue performance.
arXiv Detail & Related papers (2024-04-04T14:45:26Z) - Careless Whisper: Speech-to-Text Hallucination Harms [0.5242869847419834]
We evaluate OpenAI's Whisper, a state-of-the-art automated speech recognition service.
We find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences.
We thematically analyze the Whisper-hallucinated content, finding that 38% of hallucinations include explicit harms.
arXiv Detail & Related papers (2024-02-12T19:35:37Z) - Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generate factual errors, which are often called hallucinations.
We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms.
We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z) - Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models [11.492702369437785]
Hallucinations are semantically unrelated to the source utterance, yet still fluent and coherent.
We show that commonly used metrics, such as word error rates, cannot differentiate between hallucinatory and non-hallucinatory models.
We devise a framework for identifying hallucinations by analysing their semantic connection with the ground truth and their fluency (a rough illustrative sketch appears after this list).
arXiv Detail & Related papers (2024-01-03T06:56:56Z) - Alleviating Hallucinations of Large Language Models through Induced Hallucinations [67.35512483340837]
Large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information.
We propose a simple Induce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
arXiv Detail & Related papers (2023-12-25T12:32:49Z) - On Early Detection of Hallucinations in Factual Question Answering [4.76359068115052]
Hallucinations remain a major impediment to gaining user trust.
In this work, we explore if the artifacts associated with the model generations can provide hints that the generation will contain hallucinations.
Our results show that the distributions of these artifacts tend to differ between hallucinated and non-hallucinated generations.
arXiv Detail & Related papers (2023-12-19T14:35:04Z) - HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models [146.87696738011712]
Large language models (LLMs) are prone to generating hallucinations, i.e., content that conflicts with the source or cannot be verified against factual knowledge.
To understand what types of content, and to what extent, LLMs are apt to hallucinate, we introduce the Hallucination Evaluation benchmark for Large Language Models (HaluEval).
arXiv Detail & Related papers (2023-05-19T15:36:27Z) - Using Mobile Data and Deep Models to Assess Auditory Verbal Hallucinations [3.676944894021643]
A common form of auditory hallucination is hearing voices in the absence of any speakers.
We study N=435 individuals, who experience hearing voices, to assess auditory verbal hallucination.
arXiv Detail & Related papers (2023-04-20T15:37:34Z) - Probing Causes of Hallucinations in Neural Machine Translations [51.418245676894465]
We propose to use probing methods to investigate the causes of hallucinations from the perspective of model architecture.
We find that hallucination is often accompanied by a deficient encoder (especially its embeddings) and vulnerable cross-attentions.
arXiv Detail & Related papers (2022-06-25T01:57:22Z) - The Curious Case of Hallucinations in Neural Machine Translation [5.3180458405676205]
Hallucinations in Neural Machine Translation lie at an extreme end of the spectrum of NMT pathologies.
We consider hallucinations under corpus-level noise (without any source perturbation) and demonstrate that two prominent types of natural hallucinations could be generated and explained through specific corpus-level noise patterns.
We elucidate the phenomenon of hallucination amplification in popular data-generation processes such as Backtranslation and sequence-level Knowledge Distillation.
arXiv Detail & Related papers (2021-04-14T08:09:57Z)
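As a rough illustration of the detection idea from the "Hallucinations in Neural Automatic Speech Recognition" entry above (flagging outputs that are fluent yet semantically unrelated to the ground truth), the sketch below scores semantic similarity with sentence embeddings and uses a crude repetition-based fluency proxy. The embedding model, the 0.4 threshold, and the fluency check are assumptions, not the authors' framework.

```python
# Hypothetical hallucination flagger: fluent output, low semantic similarity
# to the reference. Model name, threshold, and fluency proxy are assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def is_hallucination(reference: str, hypothesis: str, sim_threshold: float = 0.4) -> bool:
    """Flag outputs that read fluently yet share little meaning with the reference."""
    emb = model.encode([reference, hypothesis], convert_to_tensor=True)
    semantic_sim = util.cos_sim(emb[0], emb[1]).item()
    # Crude fluency proxy: a fluent sentence rarely repeats the same token over and over.
    tokens = hypothesis.lower().split()
    fluent = len(tokens) > 0 and len(set(tokens)) > 0.5 * len(tokens)
    return fluent and semantic_sim < sim_threshold

print(is_hallucination("turn left at the next junction",
                       "thank you so much for watching this video"))  # likely True
```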
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.