Exploring the Potential of Lexical Paraphrases for Mitigating
Noise-Induced Comprehension Errors
- URL: http://arxiv.org/abs/2107.08337v1
- Date: Sun, 18 Jul 2021 01:16:33 GMT
- Title: Exploring the Potential of Lexical Paraphrases for Mitigating
Noise-Induced Comprehension Errors
- Authors: Anupama Chingacham, Vera Demberg, Dietrich Klakow
- Abstract summary: Speech can be masked by noise, which may lead to word misperceptions on the side of the listener.
We propose an alternate solution of choosing noise-robust lexical paraphrases to represent an intended meaning.
We evaluate the intelligibility of synonyms in context and find that choosing a lexical unit that is less likely to be misheard than its synonym introduced an average gain in comprehension of 37% at SNR -5 dB and 21% at SNR 0 dB for babble noise.
- Score: 17.486619771816123
- License: http://creativecommons.org/licenses/by/4.0/
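The SNR values quoted in the summary above describe how strongly the babble noise masks the speech. As a minimal sketch of what "mixing at SNR -5 dB" means in practice (the function name and signals are illustrative, not from the paper), the noise can be scaled so the speech-to-noise power ratio hits the target before the two are added:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to `speech`. Both arrays are assumed to share a sample
    rate, and `noise` must be at least as long as `speech`."""
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power from SNR_dB = 10 * log10(P_speech / P_noise)
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    return speech + noise * np.sqrt(target_p_noise / p_noise)
```

At SNR -5 dB the noise power ends up roughly 3.2 times the speech power, which is why comprehension drops so sharply in that condition.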
- Abstract: Listening in noisy environments can be difficult even for individuals with
normal hearing thresholds. The speech signal can be masked by noise, which may
lead to word misperceptions on the listener's side and overall difficulty in
understanding the message. To mitigate listeners' hearing difficulties, a
cooperative speaker utilizes voice modulation strategies such as Lombard speech
to generate noise-robust utterances, and similar solutions have been developed
for speech synthesis systems. In this work, we propose an alternate solution of
choosing noise-robust lexical paraphrases to represent an intended meaning. Our
results show that lexical paraphrases differ in their intelligibility in noise.
We evaluate the intelligibility of synonyms in context and find that choosing a
lexical unit that is less likely to be misheard than its synonym introduced an
average gain in comprehension of 37% at SNR -5 dB and 21% at SNR 0 dB for
babble noise.
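The paper's core proposal, choosing the synonym that carries the lowest risk of being misheard at a given SNR, can be sketched as a simple lookup-and-minimize step. The per-word misperception risks below are hypothetical placeholders; the paper estimates such risks empirically from listener data:

```python
def pick_robust_paraphrase(candidates, snr_db):
    """Return the synonym with the lowest estimated probability of
    being misheard at the given SNR (in dB).

    `candidates` maps each word to a dict of {snr_db: risk}."""
    return min(candidates, key=lambda word: candidates[word][snr_db])

# Hypothetical per-word misperception risks at SNR -5 dB and 0 dB.
synonyms = {
    "buy":      {-5: 0.62, 0: 0.31},
    "purchase": {-5: 0.28, 0: 0.12},
}
```

Under these made-up numbers, `pick_robust_paraphrase(synonyms, -5)` would select "purchase" as the safer lexical choice in loud babble noise.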
Related papers
- Human Speech Perception in Noise: Can Large Language Models Paraphrase to Improve It? [26.835947209927273]
Large Language Models (LLMs) can generate text while transferring style attributes such as formality, producing formal or informal text.
We conduct the first study to evaluate LLMs on a novel task of generating acoustically intelligible paraphrases for better human speech perception in noise.
Our approach resulted in a 40% relative improvement in human speech perception, by paraphrasing utterances that are highly distorted in a listening condition with babble noise at a signal-to-noise ratio (SNR) -5 dB.
arXiv Detail & Related papers (2024-08-07T18:24:23Z) - Large Language Models are Efficient Learners of Noise-Robust Speech
Recognition [65.95847272465124]
Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR)
In this work, we extend the benchmark to noisy conditions and investigate if we can teach LLMs to perform denoising for GER.
Experiments on various latest LLMs demonstrate our approach achieves a new breakthrough with up to 53.9% correction improvement in terms of word error rate.
arXiv Detail & Related papers (2024-01-19T01:29:27Z) - Sources of Noise in Dialogue and How to Deal with Them [63.02707014103651]
Training dialogue systems often entails dealing with noisy training examples and unexpected user inputs.
Despite its prevalence, there is currently no accurate survey of dialogue noise.
This paper addresses this gap by first constructing a taxonomy of noise encountered by dialogue systems.
arXiv Detail & Related papers (2022-12-06T04:36:32Z) - A Data-Driven Investigation of Noise-Adaptive Utterance Generation with
Linguistic Modification [25.082714256583422]
In noisy environments, speech can be hard to understand for humans.
We create a dataset of 900 paraphrases in babble noise, perceived by native English speakers with normal hearing.
We find that careful selection of paraphrases can improve intelligibility by 33% at SNR -5 dB.
arXiv Detail & Related papers (2022-10-19T02:20:17Z) - Optimization of a Real-Time Wavelet-Based Algorithm for Improving Speech
Intelligibility [1.0554048699217666]
The discrete-time speech signal is split into frequency sub-bands via a multi-level discrete wavelet transform.
The sub-band gains are adjusted while keeping the overall signal energy unchanged.
The speech intelligibility under various background interference and simulated hearing loss conditions is enhanced.
arXiv Detail & Related papers (2022-02-05T13:03:57Z) - Improving Noise Robustness of Contrastive Speech Representation Learning
with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z) - Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs
for Robust Speech Recognition [52.71604809100364]
We propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech.
Specifically, we feed original-noisy speech pairs simultaneously into the wav2vec 2.0 network.
In addition to the existing contrastive learning task, we switch the quantized representations of the original and noisy speech as additional prediction targets.
arXiv Detail & Related papers (2021-10-11T00:08:48Z) - PL-EESR: Perceptual Loss Based END-TO-END Robust Speaker Representation
Extraction [90.55375210094995]
Speech enhancement aims to improve the perceptual quality of the speech signal by suppression of the background noise.
We propose an end-to-end deep learning framework, dubbed PL-EESR, for robust speaker representation extraction.
arXiv Detail & Related papers (2021-10-03T07:05:29Z) - Adversarial Feature Learning and Unsupervised Clustering based Speech
Synthesis for Found Data with Acoustic and Textual Noise [18.135965605011105]
Attention-based sequence-to-sequence (seq2seq) speech synthesis has achieved extraordinary performance.
A studio-quality corpus with manual transcription is necessary to train such seq2seq systems.
We propose an approach to build high-quality and stable seq2seq based speech synthesis system using challenging found data.
arXiv Detail & Related papers (2020-04-28T15:32:45Z) - Speaker Diarization with Lexical Information [59.983797884955]
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy.
arXiv Detail & Related papers (2020-04-13T17:16:56Z)
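The wavelet-based intelligibility paper above adjusts sub-band gains "while keeping the overall signal energy unchanged." A minimal sketch of that energy-conservation step, assuming a hypothetical analysis/synthesis pair (e.g. a wavelet filter bank) whose sub-bands sum back to the signal:

```python
import numpy as np

def apply_energy_preserving_gains(subbands, gains):
    """Apply per-band gains, then rescale so the summed signal keeps
    the original overall energy. `subbands` is a list of equal-length
    arrays whose sum reconstructs the signal."""
    original = np.sum(subbands, axis=0)
    boosted = np.sum([g * b for g, b in zip(gains, subbands)], axis=0)
    # Renormalize total power back to the original level.
    scale = np.sqrt(np.mean(original ** 2) / np.mean(boosted ** 2))
    return boosted * scale
```

The renormalization means a boost in one band is paid for by attenuation elsewhere, redistributing rather than adding energy, which is what keeps playback loudness constant.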
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.