DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction
- URL: http://arxiv.org/abs/2305.16957v1
- Date: Fri, 26 May 2023 14:13:38 GMT
- Title: DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction
- Authors: Vineet Bhat, Preethi Jyothi and Pushpak Bhattacharyya
- Abstract summary: DisfluencyFixer is a tool that performs speech-to-speech disfluency correction in English and Hindi.
Our proposed system removes disfluencies from input speech and returns fluent speech as output.
- Score: 50.51901599433536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conversational speech often consists of deviations from the speech plan,
producing disfluent utterances that affect downstream NLP tasks. Removing these
disfluencies is necessary to create fluent and coherent speech. This paper
presents DisfluencyFixer, a tool that performs speech-to-speech disfluency
correction in English and Hindi using a pipeline of Automatic Speech
Recognition (ASR), Disfluency Correction (DC) and Text-To-Speech (TTS) models.
Our proposed system removes disfluencies from input speech and returns fluent
speech as output along with its transcript, disfluency type and total
disfluency count in source utterance, providing a one-stop destination for
language learners to improve the fluency of their speech. We evaluate the
performance of our tool subjectively and receive scores of 4.26, 4.29 and 4.42
out of 5 in ASR performance, DC performance and ease-of-use of the system. Our
tool can be accessed openly at the following link.
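As a concrete illustration of the pipeline the abstract describes, the following is a minimal Python sketch of an ASR -> DC -> TTS flow. It is not the authors' implementation: the ASR and TTS stages are placeholder callables, and the DC stage is a toy rule-based filler remover, included only to show how fluent text and a disfluency count could be derived from a transcript.
```python
"""Minimal sketch of a speech-to-speech disfluency correction pipeline.

Not the authors' system: ASR and TTS are pluggable callables, and the
disfluency-correction (DC) stage is a toy filler remover used only to
illustrate the ASR -> DC -> TTS flow and the reported metadata.
"""
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical type aliases; a real system would pass audio arrays or file paths.
Audio = bytes
ASRModel = Callable[[Audio], str]   # speech -> transcript
TTSModel = Callable[[str], Audio]   # fluent text -> speech

# Toy filler lexicon; a trained DC model would also handle repetitions,
# repairs and false starts, not just fillers.
FILLERS = {"um", "uh", "erm", "hmm", "like"}


@dataclass
class CorrectionResult:
    fluent_text: str
    disfluent_tokens: List[str]
    disfluency_count: int
    fluent_audio: Optional[Audio] = None


def correct_disfluencies(transcript: str) -> CorrectionResult:
    """Rule-based stand-in for the DC model: drop filler tokens."""
    kept, dropped = [], []
    for tok in transcript.split():
        if re.sub(r"\W", "", tok).lower() in FILLERS:
            dropped.append(tok)
        else:
            kept.append(tok)
    return CorrectionResult(" ".join(kept), dropped, len(dropped))


def speech_to_speech_dc(audio: Audio, asr: ASRModel, tts: TTSModel) -> CorrectionResult:
    """Full pipeline: ASR -> DC -> TTS, returning fluent speech plus metadata."""
    transcript = asr(audio)
    result = correct_disfluencies(transcript)
    result.fluent_audio = tts(result.fluent_text)
    return result


if __name__ == "__main__":
    # Text-only demo of the DC stage (no real ASR/TTS models loaded).
    demo = "So um I was like thinking uh we could leave early"
    res = correct_disfluencies(demo)
    print(res.fluent_text)        # "So I was thinking we could leave early"
    print(res.disfluency_count)   # 3
```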
Related papers
- Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation [23.757896930482342]
This work explores speech unit selection through a study of downstream tasks.
Units that perform well in resynthesis do not necessarily correlate with those that enhance translation efficacy.
arXiv Detail & Related papers (2024-07-08T08:53:26Z)
- Adversarial Training For Low-Resource Disfluency Correction [50.51901599433536]
We propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC); a toy sketch of the tagging formulation appears after this list.
We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages.
Our technique also performs well in removing stuttering disfluencies in ASR transcripts introduced by speech impairments.
arXiv Detail & Related papers (2023-06-10T08:58:53Z)
- Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling [10.283092375534311]
We propose a simple and effective modification of alignment graph construction using weighted Finite State Transducers.
The proposed weakly-supervised approach alleviates the need for verbatim transcription of speech disfluencies for forced alignment.
Our evaluation on a corrupted version of the TIMIT test set and the UCLASS dataset shows significant improvements.
arXiv Detail & Related papers (2023-05-30T09:57:36Z)
- Code-Switching without Switching: Language Agnostic End-to-End Speech Translation [68.8204255655161]
We treat speech recognition and translation as one unified end-to-end speech translation problem.
By training LAST with both input languages, we decode speech into one target language, regardless of the input language.
arXiv Detail & Related papers (2022-10-04T10:34:25Z)
- VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion [54.29557210925752]
One-shot voice conversion can be effectively achieved by speech representation disentanglement.
We employ vector quantization (VQ) for content encoding and introduce mutual information (MI) as the correlation metric during training.
Experimental results reflect the superiority of the proposed method in learning effective disentangled speech representations.
arXiv Detail & Related papers (2021-06-18T13:50:38Z)
- UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data [54.733889961024445]
We propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data.
We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on the public CommonVoice corpus.
arXiv Detail & Related papers (2021-01-19T12:53:43Z)
- Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training [40.71155396456831]
Simultaneous speech-to-speech translation is widely useful but extremely challenging.
It needs to generate target-language speech concurrently with the source-language speech, with only a few seconds delay.
Current approaches accumulate latencies progressively when the speaker talks faster, and introduce unnatural pauses when the speaker talks slower.
We propose Self-Adaptive Translation (SAT) which flexibly adjusts the length of translations to accommodate different source speech rates.
arXiv Detail & Related papers (2020-10-20T06:02:15Z)
- End-to-End Speech Recognition and Disfluency Removal [15.910282983166024]
This paper investigates the task of end-to-end speech recognition and disfluency removal.
We show that end-to-end models do learn to directly generate fluent transcripts.
We propose two new metrics that can be used for evaluating integrated ASR and disfluency models.
arXiv Detail & Related papers (2020-09-22T03:11:37Z)
- UWSpeech: Speech to Speech Translation for Unwritten Languages [145.37116196042282]
We develop a translation system for unwritten languages, named UWSpeech, which converts target unwritten speech into discrete tokens with a converter.
We propose a method called XL-VAE, which enhances vector quantized variational autoencoder (VQ-VAE) with cross-lingual (XL) speech recognition.
Experiments on the Fisher Spanish-English conversation translation dataset show that UWSpeech outperforms direct translation and the VQ-VAE baseline by about 16 and 10 BLEU points, respectively.
arXiv Detail & Related papers (2020-06-14T15:22:12Z)
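As referenced in the adversarial-training entry above, here is a toy Python sketch of disfluency correction framed as token-level sequence tagging. It is not that paper's model: it only illustrates the data format such a tagger learns from, with binary keep/drop labels per token and a hypothetical generator that injects fillers and repetitions into fluent sentences to build synthetic training pairs.
```python
"""Sketch of disfluency correction as token-level sequence tagging.

Not the model from the adversarial-training paper above; this only shows
the data format such a tagger works with: each token gets a binary label
(1 = disfluent, drop it; 0 = fluent, keep it), and synthetic training
pairs can be built by injecting disfluencies into fluent sentences.
"""
import random

FILLERS = ["um", "uh", "you know"]


def apply_tags(tokens, tags):
    """Remove every token whose tag marks it as disfluent."""
    return [tok for tok, tag in zip(tokens, tags) if tag == 0]


def make_synthetic_pair(fluent_sentence, p_filler=0.2, p_repeat=0.1, seed=None):
    """Build a (disfluent tokens, tags) training pair from a fluent sentence
    by randomly inserting fillers and word repetitions."""
    rng = random.Random(seed)
    disfluent, tags = [], []
    for tok in fluent_sentence.split():
        if rng.random() < p_filler:
            disfluent.append(rng.choice(FILLERS))
            tags.append(1)                      # injected filler -> drop
        disfluent.append(tok)
        tags.append(0)                          # original word -> keep
        if rng.random() < p_repeat:
            disfluent.append(tok)
            tags.append(1)                      # repetition -> drop
    return disfluent, tags


if __name__ == "__main__":
    tokens, tags = make_synthetic_pair("we should leave early today", seed=7)
    print(tokens)
    print(tags)
    print(" ".join(apply_tags(tokens, tags)))   # recovers the fluent sentence
```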