Co-guiding for Multi-intent Spoken Language Understanding
- URL: http://arxiv.org/abs/2312.03716v1
- Date: Wed, 22 Nov 2023 08:06:22 GMT
- Title: Co-guiding for Multi-intent Spoken Language Understanding
- Authors: Bowen Xing and Ivor W. Tsang
- Abstract summary: We propose a novel model termed Co-guiding Net, which implements a two-stage framework achieving the mutual guidances between the two tasks.
For the first stage, we propose single-task supervised contrastive learning, and for the second stage, we propose co-guiding supervised contrastive learning.
Experiment results on multi-intent SLU show that our model outperforms existing models by a large margin.
- Score: 53.30511968323911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent graph-based models for multi-intent SLU have obtained promising
results through modeling the guidance from the prediction of intents to the
decoding of slot filling. However, existing methods (1) only model the
unidirectional guidance from intent to slot, while there are bidirectional
inter-correlations between intent and slot; (2) adopt homogeneous graphs to
model the interactions between the slot semantics nodes and intent label nodes,
which limit the performance. In this paper, we propose a novel model termed
Co-guiding Net, which implements a two-stage framework achieving the mutual
guidances between the two tasks. In the first stage, the initial estimated
labels of both tasks are produced, and then they are leveraged in the second
stage to model the mutual guidances. Specifically, we propose two heterogeneous
graph attention networks working on the proposed two heterogeneous semantics
label graphs, which effectively represent the relations among the semantics
nodes and label nodes. Besides, we further propose Co-guiding-SCL Net, which
exploits the single-task and dual-task semantics contrastive relations. For the
first stage, we propose single-task supervised contrastive learning, and for
the second stage, we propose co-guiding supervised contrastive learning, which
considers the two tasks' mutual guidances in the contrastive learning
procedure. Experiment results on multi-intent SLU show that our model
outperforms existing models by a large margin, obtaining a relative improvement
of 21.3% over the previous best model on the MixATIS dataset in overall
accuracy. We also evaluate our model in the zero-shot cross-lingual scenario,
and the results show that our model relatively improves over the
state-of-the-art model by 33.5% on average in overall accuracy across all 9 languages.
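The abstract builds on supervised contrastive learning, where representations sharing a label are pulled together and others pushed apart. The following is a minimal illustrative sketch of a plain single-label supervised contrastive loss, not the paper's exact co-guiding formulation (which extends it with dual-task semantic relations); the function name and the temperature value are assumptions for illustration.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Sketch of a supervised contrastive (SupCon-style) loss.

    For each anchor i, the positives are all other samples with the
    same label; every other sample appears in the softmax denominator.
    """
    # L2-normalize so that dot products become cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    n = len(labels)
    total = 0.0
    for i in range(n):
        # Positives: other samples sharing anchor i's label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Log of the softmax denominator over all non-anchor samples.
        others = [j for j in range(n) if j != i]
        log_denom = np.log(np.sum(np.exp(sim[i, others])))
        # Average negative log-likelihood over the positives.
        total += -np.mean([sim[i, j] - log_denom for j in positives])
    return total / n
```

As a sanity check, embeddings that cluster by label should yield a lower loss than embeddings that mix the labels, since each anchor's positives then dominate its softmax denominator.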
Related papers
- Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning [50.1035273069458]
Spoken language understanding (SLU) is a core task in task-oriented dialogue systems.
We propose the MMCL framework, which applies contrastive learning at three levels: utterance level, slot level, and word level.
Our framework achieves new state-of-the-art results on two public multi-intent SLU datasets.
arXiv Detail & Related papers (2024-05-31T14:34:23Z) - S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z) - Robust Training of Federated Models with Extremely Label Deficiency [84.00832527512148]
Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency.
We propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data.
Our comprehensive experiments on four benchmark datasets provide substantial evidence that Twin-sight can significantly outperform state-of-the-art methods across various experimental settings.
arXiv Detail & Related papers (2024-02-22T10:19:34Z) - Joint Multiple Intent Detection and Slot Filling with Supervised
Contrastive Learning and Self-Distillation [4.123763595394021]
Multiple intent detection and slot filling are fundamental and crucial tasks in spoken language understanding.
Joint models that can detect intents and extract slots simultaneously are preferred.
We present a method for multiple intent detection and slot filling by addressing these challenges.
arXiv Detail & Related papers (2023-08-28T15:36:33Z) - A Dynamic Graph Interactive Framework with Label-Semantic Injection for
Spoken Language Understanding [43.48113981442722]
We propose a framework termed DGIF, which first leverages the semantic information of labels to give the model additional signals and enriched priors.
We propose a novel approach to construct the interactive graph based on the injection of label semantics, which can automatically update the graph to better alleviate error propagation.
arXiv Detail & Related papers (2022-11-08T05:57:46Z) - Co-guiding Net: Achieving Mutual Guidances between Multiple Intent
Detection and Slot Filling via Heterogeneous Semantics-Label Graphs [39.76268402567324]
We propose a novel model termed Co-guiding Net, which implements a two-stage framework achieving the mutual guidances between the two tasks.
Specifically, we propose two heterogeneous graph attention networks working on the proposed two heterogeneous semantics-label graphs.
Experiment results show that our model outperforms existing models by a large margin, obtaining a relative improvement of 19.3% over the previous best model on MixATIS dataset.
arXiv Detail & Related papers (2022-10-19T08:34:51Z) - On the Role of Bidirectionality in Language Model Pre-Training [85.14614350372004]
We study the role of bidirectionality in next token prediction, text infilling, zero-shot priming and fine-tuning.
We train models with up to 6.7B parameters, and find differences to remain consistent at scale.
arXiv Detail & Related papers (2022-05-24T02:25:05Z) - Bi-directional Joint Neural Networks for Intent Classification and Slot
Filling [5.3361357265365035]
We propose a bi-directional joint model for intent classification and slot filling.
Our model achieves state-of-the-art results on intent classification accuracy, slot filling F1, and significantly improves sentence-level semantic frame accuracy.
arXiv Detail & Related papers (2022-02-26T06:35:21Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.