A BiRGAT Model for Multi-intent Spoken Language Understanding with
Hierarchical Semantic Frames
- URL: http://arxiv.org/abs/2402.18258v1
- Date: Wed, 28 Feb 2024 11:39:26 GMT
- Title: A BiRGAT Model for Multi-intent Spoken Language Understanding with
Hierarchical Semantic Frames
- Authors: Hongshen Xu, Ruisheng Cao, Su Zhu, Sheng Jiang, Hanchong Zhang, Lu
Chen and Kai Yu
- Abstract summary: We first propose MIVS, a Multi-Intent dataset collected from a realistic in-Vehicle dialogue System.
The target semantic frame is organized in a 3-layer hierarchical structure to tackle the alignment and assignment problems in multi-intent cases.
We devise a BiRGAT model to encode the hierarchy of ontology items, the backbone of which is a dual relational graph attention network.
- Score: 30.200413352223347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous work on spoken language understanding (SLU) mainly focuses on
single-intent settings, where each input utterance merely contains one user
intent. This configuration significantly limits the surface form of user
utterances and the capacity of output semantics. In this work, we first propose
a Multi-Intent dataset which is collected from a realistic in-Vehicle dialogue
System, called MIVS. The target semantic frame is organized in a 3-layer
hierarchical structure to tackle the alignment and assignment problems in
multi-intent cases. Accordingly, we devise a BiRGAT model to encode the
hierarchy of ontology items, the backbone of which is a dual relational graph
attention network. Coupled with the 3-way pointer-generator decoder, our method
outperforms traditional sequence labeling and classification-based schemes by a
large margin.
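The listing gives no code, but the relational graph-attention idea behind the BiRGAT backbone can be sketched as a single layer over a toy ontology graph. Everything below (function names, shapes, the edge list, the scoring form) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def relational_gat_layer(h, edges, rel_emb, W, a):
    """One relational graph-attention layer (illustrative sketch only).

    h:       (N, d) node features, e.g. ontology items (domain/intent/slot)
    edges:   list of (src, dst, rel) triples; rel indexes into rel_emb
    rel_emb: (R, d) relation embeddings (e.g. parent-of, child-of)
    W:       (d, d) shared projection matrix
    a:       (3*d,) attention vector over [dst, src, relation]
    """
    z = h @ W
    out = z.copy()  # each node starts from its own projection (self-loop)
    for dst in range(h.shape[0]):
        nbrs = [(s, r) for (s, d_, r) in edges if d_ == dst]
        if not nbrs:
            continue
        # relation-aware attention score for each incoming edge
        scores = np.array([
            np.tanh(a @ np.concatenate([z[dst], z[s], rel_emb[r]]))
            for s, r in nbrs
        ])
        alpha = softmax(scores)
        out[dst] = out[dst] + sum(w * z[s] for w, (s, _) in zip(alpha, nbrs))
    return out

# Toy graph: node 0 = domain, node 1 = intent, node 2 = slot;
# the intent node receives messages from both, via two relation types.
rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=(3, d))
edges = [(0, 1, 0), (2, 1, 1)]
rel_emb = rng.normal(size=(2, d))
W = rng.normal(size=(d, d))
a = rng.normal(size=3 * d)
out = relational_gat_layer(h, edges, rel_emb, W, a)
print(out.shape)  # (3, 4)
```

A full BiRGAT stacks such layers bidirectionally and couples them with the 3-way pointer-generator decoder; the sketch above only shows the relation-conditioned attention step.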
Related papers
- A Generative Model for Joint Multiple Intent Detection and Slot Filling [3.060720241524644]
In task-oriented dialogue systems, spoken language understanding (SLU) is a critical component consisting of two sub-tasks: intent detection and slot filling. Most existing methods focus on single-intent SLU, where each utterance has only one intent. In this paper, we propose a generative framework to simultaneously address multiple intent detection and slot filling.
arXiv Detail & Related papers (2026-02-09T06:52:34Z) - 3SGen: Unified Subject, Style, and Structure-Driven Image Generation with Adaptive Task-specific Memory [54.056509629389915]
3SGen is a task-aware unified framework that performs all three conditioning modes within a single model. At its core, an Adaptive Task-specific Memory (ATM) module dynamically disentangles, stores, and retrieves condition-specific priors. We propose 3SGen-Bench, a unified image-driven generation benchmark with standardized metrics for evaluating cross-task fidelity and controllability.
arXiv Detail & Related papers (2025-12-22T11:07:27Z) - Plug-and-Play Clarifier: A Zero-Shot Multimodal Framework for Egocentric Intent Disambiguation [60.63465682731118]
The performance of egocentric AI agents is fundamentally limited by multimodal intent ambiguity. We introduce the Plug-and-Play Clarifier, a zero-shot and modular framework that decomposes the problem into discrete, solvable sub-tasks. Our framework improves the intent clarification performance of small language models by approximately 30%, making them competitive with significantly larger counterparts.
arXiv Detail & Related papers (2025-11-12T04:28:14Z) - Hierarchical Neural Semantic Representation for 3D Semantic Correspondence [72.8101601086805]
We design the hierarchical neural semantic representation (HNSR), which consists of a global semantic feature to capture high-level structure and multi-resolution local geometric features. We then design a progressive global-to-local matching strategy, which establishes coarse semantic correspondence using the global semantic feature. Finally, our framework is training-free and broadly compatible with various pre-trained 3D generative backbones, demonstrating strong generalization across diverse shape categories.
arXiv Detail & Related papers (2025-09-22T07:23:07Z) - Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding [11.244257545057508]
Prototype-Aware Multimodal Learning (PAML) is an innovative framework that addresses imperfect alignment between visual and linguistic modalities, insufficient cross-modal feature fusion, and ineffective utilization of semantic prototype information. Our framework shows competitive performance in standard scenes while achieving state-of-the-art results in open-vocabulary scenes.
arXiv Detail & Related papers (2025-09-08T02:27:10Z) - StyDeco: Unsupervised Style Transfer with Distilling Priors and Semantic Decoupling [5.12285618196312]
StyDeco is an unsupervised framework that learns text representations specifically tailored for the style transfer task. Our framework outperforms several existing approaches in both stylistic fidelity and structural preservation.
arXiv Detail & Related papers (2025-08-02T06:17:23Z) - All in One: Visual-Description-Guided Unified Point Cloud Segmentation [26.46051445945897]
VDG-Uni3DSeg is a novel framework that integrates pre-trained vision-language models and large language models. Our method incorporates rich multimodal cues, facilitating fine-grained class and instance separation.
arXiv Detail & Related papers (2025-07-07T17:22:00Z) - A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents [12.62162175115002]
This study addresses three critical tasks: extracting multiple intent spans from queries, detecting multiple intents, and developing a multi-lingual intent dataset.
We introduce a novel multi-label multi-class intent detection dataset (MLMCID-dataset) curated from existing benchmark datasets.
We also propose a pointer network-based architecture (MLMCID) to extract intent spans and detect multiple intents with coarse and fine-grained labels in the form of sextuplets.
arXiv Detail & Related papers (2024-10-29T19:10:12Z) - Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning [50.1035273069458]
Spoken language understanding (SLU) is a core task in task-oriented dialogue systems.
We propose MMCL, a multi-level multi-grained contrastive learning framework that applies contrastive learning at three levels: utterance level, slot level, and word level.
Our framework achieves new state-of-the-art results on two public multi-intent SLU datasets.
arXiv Detail & Related papers (2024-05-31T14:34:23Z) - Segment Any 3D Object with Language [58.471327490684295]
We introduce Segment any 3D Object with LanguagE (SOLE), a semantic- and geometric-aware visual-language learning framework with strong generalizability.
Specifically, we propose a multimodal fusion network to incorporate multimodal semantics in both backbone and decoder.
Our SOLE outperforms previous methods by a large margin on ScanNetv2, ScanNet200, and Replica benchmarks.
arXiv Detail & Related papers (2024-04-02T17:59:10Z) - Towards Realistic Zero-Shot Classification via Self Structural Semantic
Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Guiding the PLMs with Semantic Anchors as Intermediate Supervision:
Towards Interpretable Semantic Parsing [57.11806632758607]
We propose to combine current pretrained language models with a hierarchical decoder network.
By taking the first-principle structures as the semantic anchors, we propose two novel intermediate supervision tasks.
We conduct intensive experiments on several semantic parsing benchmarks and demonstrate that our approach can consistently outperform the baselines.
arXiv Detail & Related papers (2022-10-04T07:27:29Z) - Dialogue Meaning Representation for Task-Oriented Dialogue Systems [51.91615150842267]
We propose Dialogue Meaning Representation (DMR), a flexible and easily extendable representation for task-oriented dialogue.
Our representation contains a set of nodes and edges with an inheritance hierarchy to represent rich compositional semantics and task-specific concepts.
We propose two evaluation tasks to evaluate different machine learning based dialogue models, and further propose a novel coreference resolution model GNNCoref for the graph-based coreference resolution task.
arXiv Detail & Related papers (2022-04-23T04:17:55Z) - A Template-guided Hybrid Pointer Network for
Knowledge-based Task-oriented Dialogue Systems [15.654119998970499]
We propose a template-guided hybrid pointer network for the knowledge-based task-oriented dialogue system.
We design a memory pointer network model with a gating mechanism to fully exploit the semantic correlation between the retrieved answers and the ground-truth response.
arXiv Detail & Related papers (2021-06-10T15:49:26Z) - Recurrent Neural Networks with Mixed Hierarchical Structures for Natural
Language Processing [13.960152426268767]
Hierarchical structures exist in both linguistics and Natural Language Processing (NLP) tasks.
How to design RNNs to learn hierarchical representations of natural languages remains a long-standing challenge.
In this paper, we define two different types of boundaries referred to as static and dynamic boundaries, respectively, and then use them to construct a multi-layer hierarchical structure for document classification tasks.
arXiv Detail & Related papers (2021-06-04T15:50:42Z) - Automatic Intent-Slot Induction for Dialogue Systems [5.6195418981579435]
We propose a new task of automatic intent-slot induction and a novel domain-independent tool.
That is, we design a coarse-to-fine three-step procedure comprising Role-labeling, Concept-mining, And Pattern-mining (RCAP).
We show that our RCAP can generate satisfactory SLU schema and outperforms the state-of-the-art supervised learning method.
arXiv Detail & Related papers (2021-03-16T07:21:31Z) - AGIF: An Adaptive Graph-Interactive Framework for Joint Multiple Intent
Detection and Slot Filling [69.59096090788125]
In this paper, we propose an Adaptive Graph-Interactive Framework (AGIF) for joint multiple intent detection and slot filling.
We introduce an intent-slot graph interaction layer to model the strong correlation between slots and intents.
Such an interaction layer is applied to each token adaptively, which has the advantage to automatically extract the relevant intents information.
arXiv Detail & Related papers (2020-04-21T15:07:34Z) - MA-DST: Multi-Attention Based Scalable Dialog State Tracking [13.358314140896937]
In dialog state tracking (DST), dialog agents provide a natural language interface for users to complete their goals.
To enable accurate multi-domain DST, the model needs to encode dependencies between past utterances and slot semantics.
We introduce a novel architecture for this task to encode the conversation history and slot semantics.
arXiv Detail & Related papers (2020-02-07T05:34:58Z)
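The slot-to-history attention mentioned in the MA-DST entry above can be illustrated with a minimal scaled dot-product attention step. This is a generic sketch under assumed shapes, not the paper's multi-attention architecture:

```python
import numpy as np

def slot_history_attention(history, slot):
    """Summarize encoded dialogue history conditioned on one slot.

    history: (T, d) encoded history tokens; slot: (d,) slot-name embedding.
    Returns a (d,) context vector a tracker could decode a slot value from.
    """
    scores = history @ slot / np.sqrt(history.shape[1])  # scaled dot product
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax over time
    return weights @ history

# With an all-ones history, every token gets equal weight, so the
# context vector equals any single token embedding.
history = np.ones((4, 3))
slot = np.array([0.5, -0.2, 0.1])
context = slot_history_attention(history, slot)
print(context)  # [1. 1. 1.]
```

A real tracker would run one such attention per slot (plus self-attention over the history) and feed the context vectors to a value decoder.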
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.