Towards Joint Intent Detection and Slot Filling via Higher-order
Attention
- URL: http://arxiv.org/abs/2109.08890v2
- Date: Wed, 22 Sep 2021 15:26:56 GMT
- Title: Towards Joint Intent Detection and Slot Filling via Higher-order
Attention
- Authors: Dongsheng Chen, Zhiqi Huang, Xian Wu, Shen Ge, Yuexian Zou
- Abstract summary: Intent detection (ID) and Slot filling (SF) are two major tasks in spoken language understanding (SLU).
We propose a BiLinear attention block, which exploits both the contextual and channel-wise bilinear attention distributions.
We show that our approach yields improvements compared with the state-of-the-art approach.
- Score: 47.78365472691051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intent detection (ID) and Slot filling (SF) are two major tasks in spoken
language understanding (SLU). Recently, attention mechanism has been shown to
be effective in jointly optimizing these two tasks in an interactive manner.
However, recent attention-based works have concentrated only on first-order
attention designs, ignoring the exploration of higher-order attention
mechanisms. In this paper, we propose a BiLinear attention block, which
leverages bilinear pooling to simultaneously exploit both the contextual and
channel-wise bilinear attention distributions to capture the second-order
interactions between the input intent or slot features. Higher- and even
infinite-order interactions are built by stacking multiple blocks and applying
an Exponential Linear Unit (ELU) to each block. Before the decoding stage, we
introduce the Dynamic Feature Fusion Layer to implicitly fuse intent and slot
information in a more effective way. Technically, instead of simply
concatenating intent and slot features, we first compute two correlation
matrices to weight the two features. Furthermore, we present the Higher-order
Attention Network for the SLU tasks. Experiments on two benchmark datasets show
that our approach yields improvements over the state-of-the-art approach. We
also provide a discussion demonstrating the effectiveness of the proposed
approach.
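The abstract does not come with code; the NumPy sketch below illustrates the general idea of a low-rank bilinear attention block (second-order interactions via an elementwise product of two projections, with contextual and channel-wise attention and an ELU) and a correlation-matrix fusion step. All function names, shapes, and projection choices here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def bilinear_attention_block(H, rng, d_hidden=32):
    """Illustrative low-rank bilinear attention over a feature sequence.

    H: (T, d) sequence of intent or slot features.
    Returns an attended feature vector of shape (d_hidden,).
    """
    T, d = H.shape
    # Low-rank bilinear pooling: the elementwise product of two linear
    # projections captures second-order (pairwise multiplicative)
    # interactions; the ELU enables higher orders when blocks are stacked.
    W1 = rng.standard_normal((d, d_hidden)) / np.sqrt(d)
    W2 = rng.standard_normal((d, d_hidden)) / np.sqrt(d)
    B = elu((H @ W1) * (H @ W2))           # (T, d_hidden)

    # Contextual attention: one weight per time step.
    w_ctx = rng.standard_normal(d_hidden) / np.sqrt(d_hidden)
    a_ctx = softmax(B @ w_ctx, axis=0)     # (T,)

    # Channel-wise attention: one gate per feature channel.
    w_ch = rng.standard_normal(d_hidden)
    a_ch = softmax(B.mean(axis=0) * w_ch)  # (d_hidden,)

    # Attend over time, then reweight channels.
    return (a_ctx @ B) * a_ch              # (d_hidden,)

def dynamic_feature_fusion(F_int, F_slot):
    """Illustrative correlation-based fusion (a sketch, not the exact layer).

    Instead of concatenating intent and slot features, each stream is
    reweighted by a correlation matrix computed against the other.
    """
    C = softmax(F_int @ F_slot.T, axis=-1)       # (T, T) correlations
    return F_int + C @ F_slot, F_slot + C.T @ F_int
```

A usage sketch: feed a (T, d) sequence through the block to get one attended vector, and fuse parallel intent/slot sequences without concatenation:

```python
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 16))
out = bilinear_attention_block(H, rng, d_hidden=8)   # shape (8,)
```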
Related papers
- Interactive Multi-Head Self-Attention with Linear Complexity [60.112941134420204]
We show that the interactions between cross-heads of the attention matrix enhance the information flow of the attention operation.
We propose an effective method to decompose the attention operation into query- and key-less components.
arXiv Detail & Related papers (2024-02-27T13:47:23Z)
- MISCA: A Joint Model for Multiple Intent Detection and Slot Filling with Intent-Slot Co-Attention [9.414164374919029]
Recent advanced approaches, i.e., joint models based on graphs, may still face two potential issues.
We propose a joint model named MISCA.
Our MISCA introduces an intent-slot co-attention mechanism and an underlying layer of label attention mechanism.
arXiv Detail & Related papers (2023-12-10T03:38:41Z)
- CAT: Learning to Collaborate Channel and Spatial Attention from Multi-Information Fusion [23.72040577828098]
We propose a plug-and-play attention module, which we term "CAT", activating the Collaboration between spatial and channel Attentions.
Specifically, we represent traits as trainable coefficients (i.e., colla-factors) to adaptively combine contributions of different attention modules.
Our CAT outperforms existing state-of-the-art attention mechanisms in object detection, instance segmentation, and image classification.
arXiv Detail & Related papers (2022-12-13T02:34:10Z)
- HAN: Higher-order Attention Network for Spoken Language Understanding [31.326152465734747]
We propose to replace the conventional attention with our proposed Bilinear attention block.
We conduct wide analysis to explore the effectiveness brought from the higher-order attention.
arXiv Detail & Related papers (2021-08-26T17:13:08Z)
- Context-Aware Interaction Network for Question Matching [51.76812857301819]
We propose a context-aware interaction network (COIN) to align two sequences and infer their semantic relationship.
Specifically, each interaction block includes (1) a context-aware cross-attention mechanism to effectively integrate contextual information, and (2) a gate fusion layer to flexibly interpolate aligned representations.
arXiv Detail & Related papers (2021-04-17T05:03:56Z)
- Online Multiple Object Tracking with Cross-Task Synergy [120.70085565030628]
We propose a novel unified model with synergy between position prediction and embedding association.
The two tasks are linked by temporal-aware target attention and distractor attention, as well as an identity-aware memory aggregation model.
arXiv Detail & Related papers (2021-04-01T10:19:40Z)
- A Co-Interactive Transformer for Joint Slot Filling and Intent Detection [61.109486326954205]
Intent detection and slot filling are two main tasks for building a spoken language understanding (SLU) system.
Previous studies either model the two tasks separately or only consider the single information flow from intent to slot.
We propose a Co-Interactive Transformer to consider the cross-impact between the two tasks simultaneously.
arXiv Detail & Related papers (2020-10-08T10:16:52Z)
- Attention improves concentration when learning node embeddings [1.2233362977312945]
Given nodes labelled with search query text, we want to predict links to related queries that share products.
Experiments with a range of deep neural architectures show that simple feedforward networks with an attention mechanism perform best for learning embeddings.
We propose an analytically tractable model of query generation, AttEST, that views both products and the query text as vectors embedded in a latent space.
arXiv Detail & Related papers (2020-06-11T21:21:12Z)
- X-Linear Attention Networks for Image Captioning [124.48670699658649]
We introduce a unified attention block, the X-Linear attention block, which fully employs bilinear pooling to selectively capitalize on visual information or perform multi-modal reasoning.
X-LAN integrates the X-Linear attention block into the image encoder and sentence decoder of an image captioning model to leverage higher-order intra- and inter-modal interactions.
Experiments on the COCO benchmark demonstrate that X-LAN obtains the best published CIDEr performance to date, 132.0%, on the COCO Karpathy test split.
arXiv Detail & Related papers (2020-03-31T10:35:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.