Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning
- URL: http://arxiv.org/abs/2405.20852v1
- Date: Fri, 31 May 2024 14:34:23 GMT
- Title: Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning
- Authors: Xuxin Cheng, Wanshi Xu, Zhihong Zhu, Hongxiang Li, Yuexian Zou
- Abstract summary: Spoken language understanding (SLU) is a core task in task-oriented dialogue systems.
We propose MMCL, a multi-level multi-grained contrastive learning framework that applies contrastive learning at three levels: the utterance level, the slot level, and the word level.
Our framework achieves new state-of-the-art results on two public multi-intent SLU datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken language understanding (SLU) is a core task in task-oriented dialogue systems, which aims at understanding the user's current goal through constructing semantic frames. SLU usually consists of two subtasks: intent detection and slot filling. Although some SLU frameworks jointly model the two subtasks and achieve high performance, most of them still overlook the inherent relationships between intents and slots and fail to achieve mutual guidance between the two subtasks. To solve this problem, we propose MMCL, a multi-level multi-grained SLU framework that applies contrastive learning at three levels, including the utterance level, slot level, and word level, to enable intent and slot to mutually guide each other. At the utterance level, our framework implements coarse-grained and fine-grained contrastive learning simultaneously. Besides, we also apply a self-distillation method to improve the robustness of the model. Experimental results and further analysis demonstrate that our proposed model achieves new state-of-the-art results on two public multi-intent SLU datasets, obtaining a 2.6 overall accuracy improvement on the MixATIS dataset compared with previous best models.
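The abstract describes contrastive learning applied at three granularities and combined into one objective. As a rough illustration only (not the paper's actual implementation), a standard InfoNCE term can be computed per level and summed with per-level weights; all function and weight names below are hypothetical:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of `positives` is the positive for row i of
    `anchors`; every other row in the batch serves as an in-batch negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives on the diagonal

def multi_level_loss(utt, utt_pos, slot, slot_pos, word, word_pos,
                     weights=(1.0, 1.0, 1.0)):
    """Sum contrastive terms computed from utterance-, slot-, and
    word-level representation pairs."""
    terms = (info_nce(utt, utt_pos),
             info_nce(slot, slot_pos),
             info_nce(word, word_pos))
    return sum(w * t for w, t in zip(weights, terms))
```

The key design point this sketch captures is that each level contributes its own positive/negative structure, so intent-oriented (utterance-level) and slot-oriented (slot- and word-level) signals can shape the same encoder jointly.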
Related papers
- A Two-Stage Prediction-Aware Contrastive Learning Framework for Multi-Intent NLU [41.45522079026888]
Multi-intent natural language understanding (NLU) presents a formidable challenge due to the model confusion arising from multiple intents within a single utterance.
Previous works train the model contrastively to increase the margin between different multi-intent labels.
We introduce a two-stage Prediction-Aware Contrastive Learning framework for multi-intent NLU.
arXiv Detail & Related papers (2024-05-05T13:09:55Z)
- Co-guiding for Multi-intent Spoken Language Understanding [53.30511968323911]
We propose a novel model termed Co-guiding Net, which implements a two-stage framework achieving the mutual guidances between the two tasks.
For the first stage, we propose single-task supervised contrastive learning, and for the second stage, we propose co-guiding supervised contrastive learning.
Experiment results on multi-intent SLU show that our model outperforms existing models by a large margin.
arXiv Detail & Related papers (2023-11-22T08:06:22Z)
- I$^2$KD-SLU: An Intra-Inter Knowledge Distillation Framework for Zero-Shot Cross-Lingual Spoken Language Understanding [1.223779595809275]
We propose an Intra-Inter Knowledge Distillation framework for zero-shot cross-lingual spoken language understanding (I$^2$KD-SLU).
Specifically, we not only apply intra-knowledge distillation between intent predictions or slot predictions of the same utterance in different languages, but also apply inter-knowledge distillation between intent predictions and slot predictions of the same utterance.
Our experimental results demonstrate that our proposed framework significantly improves the performance compared with the strong baselines.
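The intra/inter distinction above (same prediction type across languages vs. intent vs. slot predictions of the same utterance) can be sketched with KL-divergence terms. This is an illustrative sketch, not the paper's implementation; in particular, the `proj` matrix that maps slot probabilities into the intent label space is a hypothetical stand-in for however the two label spaces are aligned:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-9):
    """KL(p || q) between probability rows, averaged over the batch."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def intra_kd_loss(intent_logits_en, intent_logits_xx):
    """Intra-KD: align intent distributions of the same utterance in two
    languages (symmetrised KL)."""
    p, q = softmax(intent_logits_en), softmax(intent_logits_xx)
    return 0.5 * (kl_div(p, q) + kl_div(q, p))

def inter_kd_loss(intent_logits, slot_logits, proj):
    """Inter-KD: align intent and slot predictions of the same utterance.
    `proj` (num_slots x num_intents, non-negative) is a hypothetical mapping
    from the slot distribution into the intent label space."""
    p_intent = softmax(intent_logits)
    p_slot = softmax(slot_logits) @ proj
    p_slot = p_slot / p_slot.sum(axis=-1, keepdims=True)  # renormalise
    return kl_div(p_intent, p_slot)
```

Both terms go to zero when the two distributions agree, which is what lets one prediction head act as a soft teacher for the other.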
arXiv Detail & Related papers (2023-10-04T05:45:23Z)
- Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of Semantics and Depth [83.94528876742096]
We tackle the multi-task learning (MTL) problem of two dense tasks, i.e., semantic segmentation and depth estimation, and present a novel attention module called Cross-Channel Attention Module (CCAM).
In a true symbiotic spirit, we then formulate a novel data augmentation for the semantic segmentation task using predicted depth called AffineMix, and a simple depth augmentation using predicted semantics called ColorAug.
Finally, we validate the performance gain of the proposed method on the Cityscapes dataset, which helps us achieve state-of-the-art results for a semi-supervised joint model based on depth and semantics.
arXiv Detail & Related papers (2022-06-21T17:40:55Z)
- Multi-level Contrastive Learning for Cross-lingual Spoken Language Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels.
We develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
arXiv Detail & Related papers (2022-05-07T13:44:28Z)
- Contrastive Instruction-Trajectory Learning for Vision-Language Navigation [66.16980504844233]
A vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction.
Previous works fail to discriminate the similarities and discrepancies across instruction-trajectory pairs and ignore the temporal continuity of sub-instructions.
We propose a Contrastive Instruction-Trajectory Learning framework that explores invariance across similar data samples and variance across different ones to learn distinctive representations for robust navigation.
arXiv Detail & Related papers (2021-12-08T06:32:52Z)
- Meta learning to classify intent and slot labels with noisy few shot examples [11.835266162072486]
Spoken language understanding (SLU) models are notorious for being data-hungry.
We propose a new SLU benchmarking task: few-shot robust SLU, where SLU comprises two core problems, intent classification (IC) and slot labeling (SL).
We show the model consistently outperforms the conventional fine-tuning baseline and another popular meta-learning method, Model-Agnostic Meta-Learning (MAML), in terms of achieving better IC accuracy and SL F1.
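MAML, the meta-learning baseline named above, adapts shared parameters to each task with a few inner gradient steps and updates the meta-parameters from post-adaptation performance. A minimal first-order sketch on a toy linear model (not the paper's setup; all names are illustrative):

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Squared-error loss of a linear model and its gradient w.r.t. w."""
    err = X @ w - y
    return 0.5 * np.mean(err ** 2), X.T @ err / len(y)

def maml_step(w, tasks, inner_lr=0.1, outer_lr=0.05):
    """One meta-update: adapt per task on its support set, then update the
    meta-parameters from the query-set gradients (first-order MAML, which
    drops the second-derivative term of the full algorithm)."""
    meta_grad = np.zeros_like(w)
    for X_sup, y_sup, X_qry, y_qry in tasks:
        _, g = loss_and_grad(w, X_sup, y_sup)
        w_task = w - inner_lr * g             # inner-loop adaptation
        _, g_qry = loss_and_grad(w_task, X_qry, y_qry)
        meta_grad += g_qry                    # outer-loop gradient
    return w - outer_lr * meta_grad / len(tasks)
```

In the few-shot SLU setting, the "support set" would be the handful of labeled IC/SL examples for a new domain, and the adapted parameters are what gets evaluated.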
arXiv Detail & Related papers (2020-11-30T18:53:30Z) - PIN: A Novel Parallel Interactive Network for Spoken Language
Understanding [68.53121591998483]
In the existing RNN-based approaches, the intent detection (ID) and slot filling (SF) tasks are often jointly modeled to utilize the correlation information between them.
The experiments on two benchmark datasets, i.e., SNIPS and ATIS, demonstrate the effectiveness of our approach.
More encouragingly, by using the feature embedding of the utterance generated by the pre-trained language model BERT, our method achieves the state-of-the-art among all comparison approaches.
arXiv Detail & Related papers (2020-09-28T15:59:31Z) - Dual Learning for Semi-Supervised Natural Language Understanding [29.692288627633374]
Natural language understanding (NLU) converts sentences into structured semantic forms.
We introduce a dual task of NLU, semantic-to-sentence generation (SSG).
We propose a new framework for semi-supervised NLU with the corresponding dual model.
arXiv Detail & Related papers (2020-04-26T07:17:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.