ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for
Improving ASR Robustness in Spoken Language Understanding
- URL: http://arxiv.org/abs/2311.11375v1
- Date: Sun, 19 Nov 2023 16:53:35 GMT
- Title: ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for
Improving ASR Robustness in Spoken Language Understanding
- Authors: Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian
Zou
- Abstract summary: We propose Mutual Learning and Large-Margin Contrastive Learning (ML-LMCL) to improve automatic speech recognition (ASR) robustness.
In fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively.
Experiments on three datasets show that ML-LMCL outperforms existing models and achieves new state-of-the-art performance.
- Score: 55.39105863825107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken language understanding (SLU) is a fundamental task in
task-oriented dialogue systems. However, the inevitable errors from automatic
speech recognition (ASR) usually impair the understanding performance and lead
to error propagation. Although there are some attempts to address this problem
through contrastive learning, they (1) treat clean manual transcripts and ASR
transcripts equally without discrimination in fine-tuning; (2) neglect the fact
that the semantically similar pairs are still pushed away when applying
contrastive learning; (3) suffer from the problem of Kullback-Leibler (KL)
vanishing. In this paper, we propose Mutual Learning and Large-Margin
Contrastive Learning (ML-LMCL), a novel framework for improving ASR robustness
in SLU. Specifically, in fine-tuning, we apply mutual learning and train two
SLU models on the manual transcripts and the ASR transcripts, respectively,
aiming to iteratively share knowledge between these two models. We also
introduce a distance polarization regularizer to avoid, as far as possible,
pushing apart intra-cluster pairs. Moreover, we use a cyclical annealing
schedule to mitigate the KL vanishing issue. Experiments on three datasets show
that ML-LMCL outperforms existing models and achieves new state-of-the-art
performance.
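To make the abstract's three components concrete, the following PyTorch-style sketch shows one way mutual learning with a cyclically annealed KL term, a large-margin contrastive loss, and a distance polarization regularizer could fit together. It is a minimal illustration under loose assumptions: the schedule shape, the margin placement, and the polarization penalty d*(1-d) are illustrative choices, not the paper's exact formulation, and all names (cyclical_beta, margin, tau, lam) are hypothetical.

```python
import torch
import torch.nn.functional as F

def cyclical_beta(step: int, cycle_len: int = 1000, ramp_frac: float = 0.5) -> float:
    # Cyclical annealing: within each cycle the KL weight ramps linearly
    # from 0 to 1 over the first `ramp_frac` of the cycle, then stays at 1;
    # restarting the ramp each cycle is what mitigates KL vanishing.
    pos = (step % cycle_len) / cycle_len
    return min(pos / ramp_frac, 1.0)

def mutual_learning_loss(logits_clean, logits_asr, step):
    # Symmetric KL between the two SLU models' output distributions (one
    # trained on manual transcripts, one on ASR transcripts), weighted by
    # the cyclical annealing coefficient.
    log_p = F.log_softmax(logits_clean, dim=-1)
    log_q = F.log_softmax(logits_asr, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return cyclical_beta(step) * (kl_pq + kl_qp)

def large_margin_contrastive_loss(z_clean, z_asr, margin=0.3, tau=0.1, lam=0.1):
    # Margin-subtracted InfoNCE over paired clean/ASR utterance embeddings,
    # plus a distance polarization term that pushes pairwise distances toward
    # the extremes of [0, 1], so semantically similar (intra-cluster) pairs
    # are not forced apart by the contrastive objective.
    z1 = F.normalize(z_clean, dim=-1)
    z2 = F.normalize(z_asr, dim=-1)
    cos = z1 @ z2.t()                                  # (B, B) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    logits = (cos - margin * torch.eye(z1.size(0), device=z1.device)) / tau
    contrastive = F.cross_entropy(logits, labels)      # diagonal = positive pairs
    dist = 0.5 * (1.0 - cos)                           # distances in [0, 1]
    polarization = (dist * (1.0 - dist)).mean()        # largest at dist = 0.5
    return contrastive + lam * polarization
```

In a full training loop, each model would presumably also minimize its own supervised SLU loss, with the two terms above added as weighted auxiliary objectives.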
Related papers
- Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning [50.1035273069458]
Spoken language understanding (SLU) is a core task in task-oriented dialogue systems.
We propose a multi-level multi-grained contrastive learning (MMCL) framework that applies contrastive learning at three levels: utterance, slot, and word.
Our framework achieves new state-of-the-art results on two public multi-intent SLU datasets.
arXiv Detail & Related papers (2024-05-31T14:34:23Z)
- Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding [1.07288078404291]
We propose a natural language understanding approach based on Automatic Speech Recognition (ASR).
We improve a noisy-channel model to handle transcription inconsistencies caused by ASR errors.
Experiments on four benchmark datasets show that Contrastive and Consistency Learning (CCL) outperforms existing methods.
arXiv Detail & Related papers (2024-05-23T23:10:23Z)
- Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z)
- Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding [28.441725610692714]
This paper focuses on learning utterance representations that are robust to ASR errors using a contrastive objective.
Experiments on three benchmark datasets demonstrate the effectiveness of our proposed approach.
arXiv Detail & Related papers (2022-05-02T07:21:21Z)
- Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
Our experiment results show that our method can reduce the WER on the British speech data from 14.55% to 10.36% compared to the baseline model trained with the US English data only.
arXiv Detail & Related papers (2021-12-10T20:47:58Z)
- Contrastive Instruction-Trajectory Learning for Vision-Language Navigation [66.16980504844233]
A vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction.
Previous works fail to discriminate the similarities and discrepancies across instruction-trajectory pairs and ignore the temporal continuity of sub-instructions.
We propose a Contrastive Instruction-Trajectory Learning framework that explores invariance across similar data samples and variance across different ones to learn distinctive representations for robust navigation.
arXiv Detail & Related papers (2021-12-08T06:32:52Z)
- Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding [22.652754839140744]
Spoken language understanding (SLU) systems extract transcriptions, as well as semantics such as intents or named entities, from speech.
We propose non-differentiable sequence losses based on SLU metrics as a proxy for semantic error and use the REINFORCE trick to train ASR and SLU models with this loss (see the sketch after this list).
We show that custom sequence loss training is state-of-the-art on open SLU datasets and leads to a 6% relative improvement in both ASR and NLU performance metrics.
arXiv Detail & Related papers (2021-02-12T20:09:08Z)
- Robust Spoken Language Understanding with RL-based Value Error Recovery [35.82890898452309]
Spoken Language Understanding (SLU) aims to extract structured semantic representations (e.g., slot-value pairs) from speech-recognized texts.
We propose a new robust SLU framework to guide the SLU input adaptation with a rule-based value error recovery module.
Experiments on the public CATSLU dataset show the effectiveness of our proposed approach.
arXiv Detail & Related papers (2020-09-07T13:32:07Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
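As noted in the sequence-loss entry above, here is a hedged sketch of the REINFORCE trick applied to a non-differentiable SLU sequence loss. The callable slu_metric_error is a hypothetical stand-in for whatever semantic error metric (e.g., slot-F1 error) supplies the reward; it is not an API from any of the papers listed.

```python
import torch

def reinforce_sequence_loss(log_probs, sampled_ids, references, slu_metric_error):
    # log_probs: (B, T, V) per-token log-probabilities under the model
    # sampled_ids: (B, T) token ids sampled from the model distribution
    # references: ground-truth annotations consumed by the metric
    token_lp = log_probs.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)  # (B, T)
    seq_lp = token_lp.sum(dim=-1)                                           # (B,)
    with torch.no_grad():
        # Negative metric error acts as the (non-differentiable) reward.
        reward = -torch.tensor(
            [slu_metric_error(s, r) for s, r in zip(sampled_ids, references)],
            dtype=seq_lp.dtype, device=seq_lp.device,
        )
        baseline = reward.mean()  # mean baseline for variance reduction
    # REINFORCE estimator: minimize -(reward - baseline) * log p(sequence).
    return (-(reward - baseline) * seq_lp).mean()
```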