Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for
Low-Resource Speech Recognition with Transducers
- URL: http://arxiv.org/abs/2305.13652v1
- Date: Tue, 23 May 2023 03:50:35 GMT
- Title: Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for
Low-Resource Speech Recognition with Transducers
- Authors: Jan Silovsky, Liuhui Deng, Arturo Argueta, Tresi Arvizo, Roger Hsiao,
Sasha Kuznietsov, Yiu-Chang Lin, Xiaoqiang Xiao, Yuanyuan Zhang
- Abstract summary: Cross-lingual knowledge transfer and iterative pseudo-labeling are two techniques that have been shown to be successful for improving the accuracy of ASR systems.
We show that the Transducer system trained using transcripts produced by the hybrid system achieves an 18% reduction in word error rate.
- Score: 6.017182111335404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Voice technology has become ubiquitous recently. However, accuracy, and hence user experience, varies significantly across languages, which makes the technology not equally inclusive. The availability of data for different languages is one of the key factors affecting accuracy, especially in the training of all-neural end-to-end automatic speech recognition systems.
Cross-lingual knowledge transfer and iterative pseudo-labeling are two techniques that have been shown to be successful for improving the accuracy of ASR systems, in particular for low-resource languages such as Ukrainian.
Our goal is to train an all-neural Transducer-based ASR system to replace a DNN-HMM hybrid system, with no manually annotated training data. We show that the Transducer system trained using transcripts produced by the hybrid system achieves an 18% reduction in word error rate. However, by combining cross-lingual knowledge transfer from related languages with iterative pseudo-labeling, we achieve a 35% reduction in error rate.
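The recipe in the abstract (seed a model without manual transcripts, then refine it by re-labeling the audio with each successive model) can be sketched as follows. This is a minimal illustration assuming hypothetical training and confidence-filtering callables; it is not the authors' published pipeline.

```python
from typing import Callable, List, Tuple

# Illustrative type aliases: an utterance is opaque audio, and a
# transcriber returns a hypothesis with a confidence score.
Audio = bytes
Transcriber = Callable[[Audio], Tuple[str, float]]  # (text, confidence)
Trainer = Callable[[List[Tuple[Audio, str]]], Transcriber]

def iterative_pseudo_labeling(
    unlabeled_audio: List[Audio],
    seed: Transcriber,          # hybrid system or cross-lingual seed model
    train_transducer: Trainer,  # hypothetical training routine
    num_rounds: int = 3,
    min_confidence: float = 0.9,
) -> Transcriber:
    """Bootstrap a Transducer with no manually annotated data: each
    round labels the audio with the current best model, filters out
    low-confidence hypotheses, and retrains."""
    teacher = seed
    for _ in range(num_rounds):
        # 1) Pseudo-label the untranscribed audio with the teacher.
        labeled = [(utt, *teacher(utt)) for utt in unlabeled_audio]
        # 2) Keep confident hypotheses so label noise does not compound.
        train_set = [(utt, text) for utt, text, conf in labeled
                     if conf >= min_confidence]
        # 3) Retrain; the student becomes the next round's teacher.
        teacher = train_transducer(train_set)
    return teacher
```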
Related papers
- Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes [3.198144010381572]
This work demonstrates a method of probing an ASR system to discover how it handles phonetic variation across a number of L2 Englishes.
It is demonstrated that the behaviour of the ASR is systematic and consistent across speakers with similar spoken varieties.
arXiv Detail & Related papers (2023-05-12T11:29:13Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
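A minimal sketch of the concatenation-based augmentation above, assuming utterances are stored as 1-D waveform arrays at a shared sample rate (my simplification, not the paper's exact recipe):

```python
import random
import numpy as np

def concat_augment(utterances, num_samples, seed=0):
    """Concatenate audio (and labels) drawn from different source
    languages to synthesize code-switched training examples.
    `utterances` maps a language code to a list of (waveform, label)
    pairs; waveforms are 1-D numpy arrays at a shared sample rate."""
    rng = random.Random(seed)
    languages = list(utterances)
    augmented = []
    for _ in range(num_samples):
        # Pick two different languages and one utterance from each.
        lang_a, lang_b = rng.sample(languages, 2)
        wav_a, text_a = rng.choice(utterances[lang_a])
        wav_b, text_b = rng.choice(utterances[lang_b])
        # Audio is concatenated in time; labels are joined in order.
        augmented.append((np.concatenate([wav_a, wav_b]),
                          f"{text_a} {text_b}"))
    return augmented
```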
- Cross-lingual Transfer Learning for Fake News Detector in a Low-Resource Language [0.8122270502556374]
Development of methods to detect fake news (FN) in low-resource languages has been impeded by a lack of training data.
In this study, we solve the problem by using only training data from a high-resource language.
Our FN-detection system enables this strategy by applying adversarial learning that transfers detection knowledge across languages.
arXiv Detail & Related papers (2022-08-26T07:41:27Z)
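Adversarial cross-lingual transfer of this kind is commonly built on a gradient reversal layer, which trains shared features to fool a language discriminator. The PyTorch sketch below shows that standard building block; whether the paper uses exactly this mechanism is an assumption, and this is not their code.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lambda in the
    backward pass, so shared features are pushed to become
    language-invariant while the discriminator tries to separate
    languages."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd: float = 1.0):
    return GradReverse.apply(x, lambd)

# Usage sketch: the language discriminator sees reversed gradients,
# while the task head backpropagates into the features normally.
#   lang_logits = discriminator(grad_reverse(features, 0.5))
#   loss = task_loss + lang_loss
```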
- Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval [66.69799641522133]
State-of-the-art neural (re)rankers are notoriously data hungry.
Current approaches typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders.
We show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer.
arXiv Detail & Related papers (2022-04-05T15:44:27Z)
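Adapters are typically small residual bottleneck modules trained while the backbone stays frozen. A generic PyTorch sketch of the common form (not necessarily the paper's exact configuration):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter: down-project, non-linearity,
    up-project, then add back the input. Only these few parameters are
    trained for the new language or task; the backbone stays frozen."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```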
- Reducing language context confusion for end-to-end code-switching automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
- Low Resource German ASR with Untranscribed Data Spoken by Non-native Children -- INTERSPEECH 2021 Shared Task SPAPL System [19.435571932141364]
This paper describes the SPAPL system for the INTERSPEECH 2021 Challenge: Shared Task on Automatic Speech Recognition for Non-Native Children's Speech in German.
5 hours of transcribed data and 60 hours of untranscribed data are provided to develop a German ASR system for children.
For the training of the transcribed data, we propose a non-speech state discriminative loss (NSDL) to mitigate the influence of long-duration non-speech segments within speech utterances.
Our system achieves a word error rate (WER) of 39.68% on the evaluation data.
arXiv Detail & Related papers (2021-06-18T07:36:26Z)
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer established a competitive result with a 22.2% character error rate (CER) and a 38.9% word error rate (WER).
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
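BPE-dropout (Provilkov et al.) stochastically skips merges during BPE segmentation, so the model sees multiple subword segmentations of the same word. A simplified re-implementation of the idea (my sketch, not the paper's code):

```python
import random

def bpe_dropout_encode(word, merges, p=0.1, seed=None):
    """Encode `word` with BPE, dropping each candidate merge with
    probability p. `merges` maps a symbol pair to its priority
    (lower = learned earlier). With p=0 this reduces to ordinary
    deterministic BPE over single characters."""
    rng = random.Random(seed)
    symbols = list(word)
    while len(symbols) > 1:
        # Collect applicable merges, randomly dropping each with prob p.
        pairs = [(merges[(a, b)], i)
                 for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
                 if (a, b) in merges and rng.random() >= p]
        if not pairs:
            break
        # Apply the highest-priority surviving merge, then re-scan.
        _, i = min(pairs)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

# Example (hypothetical merge table): repeated calls yield different
# segmentations of the same word when p > 0.
# bpe_dropout_encode("lower", {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2})
```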
- Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation [63.16500026845157]
We introduce speech-to-text translation as an auxiliary task to incorporate additional knowledge of the target language.
We show that training ST with human translations is not necessary.
Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to 8.9% WER reduction over direct transfer.
arXiv Detail & Related papers (2020-06-09T19:34:11Z)
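Using speech translation (ST) as an auxiliary task usually amounts to multi-task training of a shared speech encoder with separate ASR and ST heads and a weighted joint loss. The PyTorch sketch below is a schematic of that setup under my assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class MultiTaskASRST(nn.Module):
    """Shared speech encoder with separate ASR and ST decoder heads;
    the ST head is an auxiliary task that injects knowledge of the
    target language into the shared representation."""

    def __init__(self, encoder, asr_head, st_head, st_weight=0.3):
        super().__init__()
        self.encoder, self.asr_head, self.st_head = encoder, asr_head, st_head
        self.st_weight = st_weight

    def forward(self, speech, asr_targets, st_targets):
        enc = self.encoder(speech)
        asr_loss = self.asr_head(enc, asr_targets)  # e.g. transducer loss
        st_loss = self.st_head(enc, st_targets)     # e.g. cross-entropy
        # Weighted joint objective; the ST term is auxiliary.
        return asr_loss + self.st_weight * st_loss
```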
- Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition [12.354292498112347]
We present further improvements over our previous work by using domain adversarial learning to train task models.
Our proposed technique leads to reductions in Word Error Rates (WER) in monolingual and code-switched test sets across three language pairs.
arXiv Detail & Related papers (2020-06-09T13:45:30Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer-learn a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages and transfers this knowledge to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)