Learning from Multiple Noisy Augmented Data Sets for Better
Cross-Lingual Spoken Language Understanding
- URL: http://arxiv.org/abs/2109.01583v1
- Date: Fri, 3 Sep 2021 15:44:15 GMT
- Title: Learning from Multiple Noisy Augmented Data Sets for Better
Cross-Lingual Spoken Language Understanding
- Authors: Yingmei Guo and Linjun Shou and Jian Pei and Ming Gong and Mingxing Xu
and Zhiyong Wu and Daxin Jiang
- Abstract summary: Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages.
Various data augmentation approaches have been proposed to synthesize training data in low-resource target languages.
In this paper we focus on mitigating noise in augmented data.
- Score: 69.40915115518523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lack of training data presents a grand challenge to scaling out spoken
language understanding (SLU) to low-resource languages. Although various data
augmentation approaches have been proposed to synthesize training data in
low-resource target languages, the augmented data sets are often noisy, and
thus impede the performance of SLU models. In this paper we focus on mitigating
noise in augmented data. We develop a denoising training approach. Multiple
models are trained with data produced by various augmentation methods. Those
models provide supervision signals to each other. The experimental results show
that our method outperforms the existing state of the art by 3.05 and 4.24
percentage points on two benchmark datasets, respectively. The code will be
made open source on GitHub.
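The abstract describes the denoising scheme only at a high level, so the snippet below is a rough, hypothetical PyTorch sketch of one way such mutual supervision could look: K models, each trained on the output of a different augmentation method, exchange soft predictions so that peer agreement serves as a denoising signal. The KL-based peer term, the weight `alpha`, and all function names are assumptions rather than details from the paper; the authors' forthcoming GitHub release is the authoritative implementation.

```python
# Hypothetical sketch (not the authors' code): K classifiers, each trained on the
# data produced by a different augmentation method, supervise each other through
# their soft predictions to dampen the effect of noisy augmented labels.
import torch
import torch.nn.functional as F

def denoising_step(models, optimizers, batches, alpha=0.5):
    """One training step over K (model, optimizer, batch) triples.

    models:     list of K classifiers (e.g., intent classifiers on a multilingual encoder)
    batches:    list of K (inputs, noisy_labels) pairs, one per augmentation method
    alpha:      assumed weight of the peer-supervision term
    """
    for k, (model, opt) in enumerate(zip(models, optimizers)):
        x_k, y_k = batches[k]
        logits_k = model(x_k)

        # Standard cross-entropy on the (possibly noisy) augmented labels.
        ce = F.cross_entropy(logits_k, y_k)

        # Peer supervision: pull model k's predictions toward the soft
        # predictions of the other models on the same inputs.
        with torch.no_grad():
            peer_probs = [F.softmax(m(x_k), dim=-1)
                          for j, m in enumerate(models) if j != k]
        kl = sum(F.kl_div(F.log_softmax(logits_k, dim=-1), p, reduction="batchmean")
                 for p in peer_probs) / max(len(peer_probs), 1)

        loss = ce + alpha * kl
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In this sketch the peers are treated as fixed targets within a step (`torch.no_grad()`); whether the paper uses soft labels, hard pseudo-labels, or an alternating update schedule is not specified in the abstract.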
Related papers
- Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding [61.89781979702939]
This study quantitatively reveals an "impossible trinity" among data quantity, diversity, and quality in pre-training datasets.
Recent efforts use synthetic annotations to refine large-scale, diverse ASR datasets that suffer from low quality.
We introduce the Video DataFlywheel framework, which iteratively refines video annotations with improved noise control methods.
arXiv Detail & Related papers (2024-09-29T03:33:35Z)
- Less is More: Accurate Speech Recognition & Translation without Web-Scale Data [26.461185681285745]
Canary is a multilingual ASR and speech translation model.
It outperforms Whisper, OWSM, and Seamless-M4T in English, French, Spanish, and German.
arXiv Detail & Related papers (2024-06-28T06:22:23Z)
- Robustification of Multilingual Language Models to Real-world Noise with Robust Contrastive Pretraining [14.087882550564169]
Prior work that assesses the robustness of neural models on noisy data and suggests improvements is limited to the English language.
To benchmark the performance of pretrained multilingual models, we construct noisy datasets covering five languages and four NLP tasks.
We propose Robust Contrastive Pretraining (RCP) to boost the zero-shot cross-lingual robustness of multilingual pretrained models.
arXiv Detail & Related papers (2022-10-10T15:40:43Z)
- Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network [58.82343017711883]
This paper investigates how to learn directly from unpaired phone sequences and speech utterances.
GAN training is adopted in the first stage to find the mapping between unpaired speech and phone sequences.
In the second stage, a separate HMM is introduced and trained on the generator's output, which boosts performance.
arXiv Detail & Related papers (2022-07-29T09:29:28Z)
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled examples.
We show that NPC-LV outperforms supervised methods on image classification across all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- Augmenting Slot Values and Contexts for Spoken Language Understanding with Pretrained Models [45.477765875738115]
Spoken Language Understanding (SLU) is an essential step in building a dialogue system.
Due to the high cost of obtaining labeled data, SLU suffers from data scarcity.
We propose two augmentation strategies for the fine-tuning process: value-based and context-based.
arXiv Detail & Related papers (2021-08-19T02:52:40Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that the method not only outperforms baseline models on a real-world (noisy) corpus but also enhances robustness, producing high-quality results in a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances; a rough sketch of this idea follows the list.
arXiv Detail & Related papers (2020-04-29T04:07:12Z)
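As a companion to the last entry above, here is a rough, hypothetical sketch of utterance augmentation with a pretrained language model. The prompt format, the choice of GPT-2 via Hugging Face Transformers, and the decoding settings are illustrative assumptions, not details taken from that paper, which would typically also fine-tune the LM on the existing labeled utterances before sampling.

```python
# Hypothetical sketch of LM-based utterance augmentation for SLU; the model,
# prompt format, and decoding settings are assumptions, not the paper's setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def augment_utterances(intent, slots, n=5):
    """Sample n candidate utterances conditioned on an intent and its slot values."""
    prompt = f"intent: {intent}; slots: {slots}; utterance:"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,              # sampling increases the variability of the outputs
        top_p=0.9,
        max_new_tokens=30,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    return [tokenizer.decode(seq[prompt_len:], skip_special_tokens=True).strip()
            for seq in outputs]

# Example: augment_utterances("BookRestaurant", "cuisine=italian, time=tonight")
```

Generated candidates would normally be filtered (for example, by checking that the slot values still appear in the text) before being added to the training set.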