Adversarial Training For Low-Resource Disfluency Correction
- URL: http://arxiv.org/abs/2306.06384v1
- Date: Sat, 10 Jun 2023 08:58:53 GMT
- Authors: Vineet Bhat, Preethi Jyothi and Pushpak Bhattacharyya
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Disfluencies commonly occur in conversational speech. Speech with
disfluencies can result in noisy Automatic Speech Recognition (ASR)
transcripts, which affects downstream tasks like machine translation. In this
paper, we propose an adversarially-trained sequence-tagging model for
Disfluency Correction (DC) that utilizes a small amount of labeled real
disfluent data in conjunction with a large amount of unlabeled data. We show
the benefit of our proposed technique, which crucially depends on synthetically
generated disfluent data, by evaluating it for DC in three Indian languages:
Bengali, Hindi, and Marathi (all from the Indo-Aryan family). Our technique
also performs well in removing stuttering disfluencies in ASR transcripts
introduced by speech impairments. We achieve an average improvement of 6.15
points in F1-score over competitive baselines across all three languages. To
the best of our knowledge, we are the first to utilize adversarial training for
DC and use it to correct stuttering disfluencies in English, establishing a new
benchmark for this task.
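The core idea, an encoder shared between a token-level disfluency tagger and an adversarially-trained discriminator, can be sketched with a gradient-reversal layer. This is an illustrative assumption about the setup, not the paper's exact architecture: the discriminator here tries to tell real from synthetic disfluent sentences, and the reversed gradients push the encoder toward representations on which synthetic training data transfers to real data.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None  # reversed gradient for the encoder

class AdversarialTagger(nn.Module):
    def __init__(self, vocab=1000, dim=64, tags=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.enc = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.tagger = nn.Linear(2 * dim, tags)  # fluent/disfluent per token
        self.disc = nn.Linear(2 * dim, 2)       # real vs. synthetic, per sentence

    def forward(self, tokens, lam=1.0):
        h, _ = self.enc(self.emb(tokens))            # (B, T, 2*dim)
        tag_logits = self.tagger(h)                  # token-level DC labels
        pooled = GradReverse.apply(h.mean(dim=1), lam)
        domain_logits = self.disc(pooled)            # adversarial head
        return tag_logits, domain_logits

model = AdversarialTagger()
tokens = torch.randint(0, 1000, (4, 12))
tag_logits, domain_logits = model(tokens)
loss = nn.functional.cross_entropy(tag_logits.reshape(-1, 2),
                                   torch.randint(0, 2, (4 * 12,)))
loss = loss + nn.functional.cross_entropy(domain_logits,
                                          torch.randint(0, 2, (4,)))
loss.backward()  # tagger and (reversed) discriminator gradients reach the encoder
```

In training, the tagging loss would come from the small labeled set while the domain loss uses both real and synthetically generated disfluent sentences.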
Related papers
- DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages
Disfluency correction (DC) is the process of removing disfluent elements like fillers, repetitions and corrections from spoken utterances to create readable and interpretable text.
We present a high-quality human-annotated DC corpus covering four important Indo-European languages: English, Hindi, German and French.
We show that DC leads to 5.65 points increase in BLEU scores on average when used in conjunction with a state-of-the-art Machine Translation (MT) system.
arXiv Detail & Related papers (2023-10-25T16:32:02Z)
- Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation
Zero-shot translation (ZST) aims to translate between unseen language pairs in training data.
The common practice to guide the zero-shot language mapping during inference is to deliberately insert the source and target language IDs.
Recent studies have shown that language IDs sometimes fail to guide the zero-shot language mapping, causing ZST to suffer from the off-target problem.
arXiv Detail & Related papers (2023-09-28T17:02:36Z)
- Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling
We propose a simple and effective modification of alignment graph construction using weighted Finite State Transducers.
The proposed weakly-supervised approach alleviates the need for verbatim transcription of speech disfluencies for forced alignment.
Our evaluation on a corrupted version of the TIMIT test set and the UCLASS dataset shows significant improvements.
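The alignment-graph modification can be illustrated with a toy dynamic-programming analogue (hypothetical costs, plain Python rather than an actual weighted FST library): cheap "disfluency" insertion arcs let extra observed phonemes, such as stuttered repetitions, be absorbed without transcribing them verbatim in the reference.

```python
def align(reference, observed, ins_cost=0.4, sub_cost=1.0, del_cost=1.0):
    """Minimal alignment cost between a clean reference phoneme sequence and
    observed phonemes, where insertions (disfluency arcs) are cheap."""
    R, O = len(reference), len(observed)
    INF = float("inf")
    dp = [[INF] * (O + 1) for _ in range(R + 1)]
    dp[0][0] = 0.0
    for i in range(R + 1):
        for j in range(O + 1):
            if dp[i][j] == INF:
                continue
            if i < R and j < O:  # match (free) or substitute
                cost = 0.0 if reference[i] == observed[j] else sub_cost
                dp[i + 1][j + 1] = min(dp[i + 1][j + 1], dp[i][j] + cost)
            if j < O:            # disfluency arc: extra observed phoneme
                dp[i][j + 1] = min(dp[i][j + 1], dp[i][j] + ins_cost)
            if i < R:            # deletion: reference phoneme left unspoken
                dp[i + 1][j] = min(dp[i + 1][j], dp[i][j] + del_cost)
    return dp[R][O]

# A stuttered "s s s ow p" aligns to reference "s ow p" via two cheap insertions
cost = align("s ow p".split(), "s s s ow p".split())
```

A real WFST implementation would compose such arcs into the decoding graph and use acoustic scores as arc weights; the DP above only shows why optional low-weight insertion arcs make verbatim disfluency transcription unnecessary.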
arXiv Detail & Related papers (2023-05-30T09:57:36Z)
- DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction
DisfluencyFixer is a tool that performs speech-to-speech disfluency correction in English and Hindi.
Our proposed system removes disfluencies from input speech and returns fluent speech as output.
arXiv Detail & Related papers (2023-05-26T14:13:38Z)
- Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition
Aphasia is a common speech and language disorder, typically caused by a brain injury or a stroke, that affects millions of people worldwide.
We propose an end-to-end pipeline using pre-trained Automatic Speech Recognition (ASR) models that share cross-lingual speech representations.
arXiv Detail & Related papers (2022-04-01T14:05:02Z)
- Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition
We propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech.
Specifically, we feed original-noisy speech pairs simultaneously into the wav2vec 2.0 network.
In addition to the existing contrastive learning task, we switch the quantized representations of the original and noisy speech as additional prediction targets.
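The swapped-target idea can be sketched as a toy NumPy objective, a simplified InfoNCE over full feature vectors rather than the real model's masked time-steps and quantizer codebooks; all shapes and noise levels here are illustrative assumptions:

```python
import numpy as np

def info_nce(context, targets, temp=0.1):
    # Cosine-similarity InfoNCE: the context at position i should match the
    # quantized target at position i among all targets in the batch.
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    sim = c @ t.T / temp                                     # (N, N) similarities
    logprob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logprob))

rng = np.random.default_rng(0)
T, D = 8, 16
q_orig = rng.normal(size=(T, D))                  # quantized reps, clean audio
q_noisy = q_orig + 0.1 * rng.normal(size=(T, D))  # quantized reps, noisy audio
c_orig = q_orig + 0.05 * rng.normal(size=(T, D))  # contextual reps (stand-ins)
c_noisy = q_noisy + 0.05 * rng.normal(size=(T, D))

standard = info_nce(c_orig, q_orig) + info_nce(c_noisy, q_noisy)
# the "switch": each view must also predict the *other* view's quantized targets,
# encouraging noise-invariant contextual representations
switched = info_nce(c_orig, q_noisy) + info_nce(c_noisy, q_orig)
total = standard + switched
```

Minimizing the switched terms forces the clean and noisy views toward a shared representation, which is the source of the noise robustness the paper reports.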
arXiv Detail & Related papers (2021-10-11T00:08:48Z)
- On the Language Coverage Bias for Neural Machine Translation
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- Improving low-resource ASR performance with untranscribed out-of-domain data
Semi-supervised training (SST) is a common approach to leverage untranscribed/unlabeled speech data.
We look to improve performance on conversational/telephony speech (target domain) using web resources.
arXiv Detail & Related papers (2021-06-02T15:23:34Z)
- Improved Robustness to Disfluencies in RNN-Transducer Based Speech Recognition
We investigate data selection and preparation choices aiming for improved robustness of RNN-T ASR to speech disfluencies.
We show that including a small amount of disfluent data in the training set improves recognition accuracy on test sets containing disfluencies and stuttering.
arXiv Detail & Related papers (2020-12-11T11:47:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.