Encoding Syntactic Knowledge in Transformer Encoder for Intent Detection
and Slot Filling
- URL: http://arxiv.org/abs/2012.11689v1
- Date: Mon, 21 Dec 2020 21:25:11 GMT
- Title: Encoding Syntactic Knowledge in Transformer Encoder for Intent Detection
and Slot Filling
- Authors: Jixuan Wang, Kai Wei, Martin Radfar, Weiwei Zhang, Clement Chung
- Abstract summary: We propose a novel Transformer encoder-based architecture with syntactic knowledge encoded for intent detection and slot filling.
We encode syntactic knowledge into the Transformer encoder by jointly training it to predict syntactic parse ancestors and part-of-speech of each token via multi-task learning.
- Score: 6.234581622120001
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel Transformer encoder-based architecture with syntactic
knowledge encoded for intent detection and slot filling. Specifically, we
encode syntactic knowledge into the Transformer encoder by jointly training it
to predict syntactic parse ancestors and part-of-speech of each token via
multi-task learning. Our model is based on self-attention and feed-forward
layers and does not require external syntactic information to be available at
inference time. Experiments show that on two benchmark datasets, our models
with only two Transformer encoder layers achieve state-of-the-art results.
Compared to the previous best-performing model without pre-training, our
models achieve absolute improvements of 1.59% F1 score for slot filling and
0.85% accuracy for intent detection on the SNIPS dataset. On the ATIS dataset,
our models achieve absolute improvements of 0.1% F1 score and 0.34% accuracy
for slot filling and intent detection, respectively, over the previous
best-performing model. Furthermore, the visualization of the
self-attention weights illustrates the benefits of incorporating syntactic
information during training.
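A minimal PyTorch sketch of the multi-task setup the abstract describes; the layer sizes, head names, and the ancestor-prediction-as-position-classification formulation are illustrative assumptions, not the authors' released code:

```python
import torch
from torch import nn

class SyntaxAwareEncoder(nn.Module):
    """Two-layer Transformer encoder with auxiliary syntactic heads.

    The POS and parse-ancestor heads are used only as training signals;
    at inference time just the intent and slot heads are needed, so no
    external parser is required (matching the abstract's claim).
    """

    def __init__(self, vocab_size, d_model=128, n_intents=7, n_slots=72,
                 n_pos_tags=45, max_len=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.slot_head = nn.Linear(d_model, n_slots)      # per-token slot labels
        self.intent_head = nn.Linear(d_model, n_intents)  # utterance-level intent
        self.pos_head = nn.Linear(d_model, n_pos_tags)    # auxiliary: POS tags
        self.ancestor_head = nn.Linear(d_model, max_len)  # auxiliary: ancestor position

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))           # (B, T, d_model)
        return {
            "slots": self.slot_head(h),
            "intent": self.intent_head(h[:, 0]),          # first token summarizes the utterance
            "pos": self.pos_head(h),
            "ancestors": self.ancestor_head(h),
        }

# Multi-task training combines all four cross-entropy losses, e.g.
# loss = ce_slots + ce_intent + 0.5 * ce_pos + 0.5 * ce_ancestors
# (the 0.5 weights are illustrative).
```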
Related papers
- Improved Out-of-Scope Intent Classification with Dual Encoding and Threshold-based Re-Classification [6.975902383951604]
Current methodologies face difficulties with the unpredictable distribution of outliers.
We present the Dual Encoding and Threshold-Based Re-Classification (DETER) framework to address these challenges.
Our model outperforms previous benchmarks, improving F1 scores by up to 13% for known intents and 5% for unknown intents.
arXiv Detail & Related papers (2024-05-30T11:46:42Z)
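A minimal sketch of the threshold-based re-classification step, under the assumption that out-of-scope detection falls back on softmax confidence; the threshold value and function name are hypothetical:

```python
import torch

def reclassify_with_threshold(logits: torch.Tensor, threshold: float = 0.7):
    """Hypothetical sketch: utterances whose top softmax confidence falls
    below `threshold` are re-labelled as out-of-scope rather than being
    forced into a known intent class."""
    probs = logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)
    pred[conf < threshold] = -1  # -1 marks out-of-scope / unknown intent
    return pred
```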
- A Principled Hierarchical Deep Learning Approach to Joint Image
Compression and Classification [27.934109301041595]
This work proposes a three-step joint learning strategy to guide encoders to extract features that are compact, discriminative, and amenable to common augmentations/transformations.
Tests show that our proposed method achieves accuracy improvement of up to 1.5% on CIFAR-10 and 3% on CIFAR-100 over conventional E2E cross-entropy training.
arXiv Detail & Related papers (2023-10-30T15:52:18Z)
- Leveraging Pretrained ASR Encoders for Effective and Efficient
End-to-End Speech Intent Classification and Slot Filling [13.515248068374625]
We propose to use an encoder pretrained on automatic speech recognition (ASR) to initialize an end-to-end (E2E) Conformer-Transformer model.
Our model achieves new state-of-the-art results on the SLURP dataset, with 90.14% intent accuracy and 82.27% SLURP-F1.
arXiv Detail & Related papers (2023-07-13T20:50:19Z)
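A minimal sketch of the initialization step, assuming the ASR checkpoint is a flat state dict whose encoder parameters carry an "encoder." prefix (both assumptions, not the paper's actual checkpoint layout):

```python
import torch
from torch import nn

def init_from_asr_encoder(slu_model: nn.Module, ckpt_path: str) -> None:
    """Copy pretrained ASR encoder weights into an end-to-end SLU model;
    the decoder keeps its random initialization and is trained from scratch."""
    state = torch.load(ckpt_path, map_location="cpu")  # assumed: flat state_dict
    encoder_weights = {k: v for k, v in state.items() if k.startswith("encoder.")}
    # strict=False: only the matching encoder parameters are overwritten.
    slu_model.load_state_dict(encoder_weights, strict=False)
```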
- Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly
Detectors [117.61449210940955]
We propose an efficient abnormal event detection model based on a lightweight masked auto-encoder (AE) applied at the video frame level.
We introduce an approach to weight tokens based on motion gradients, thus shifting the focus from the static background scene to the foreground objects.
We generate synthetic abnormal events to augment the training videos, and task the masked AE model to reconstruct the original frames.
arXiv Detail & Related papers (2023-06-21T06:18:05Z)
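One plausible reading of the motion-gradient token weighting, sketched with simple frame differencing; the patch size and normalization are assumptions:

```python
import torch
import torch.nn.functional as F

def motion_gradient_weights(prev_frame, frame, patch=16):
    """Score each token (patch) by the mean absolute temporal gradient
    inside it, so reconstruction focuses on moving foreground objects
    rather than the static background. Inputs: (B, C, H, W) tensors."""
    motion = (frame - prev_frame).abs().mean(dim=1, keepdim=True)  # (B, 1, H, W)
    weights = F.avg_pool2d(motion, kernel_size=patch)              # one value per patch
    weights = weights.flatten(1)                                   # (B, num_tokens)
    return weights / (weights.sum(dim=1, keepdim=True) + 1e-8)     # normalize per frame
```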
- Convolutional Neural Networks for the classification of glitches in
gravitational-wave data streams [52.77024349608834]
We classify transient noise signals (i.e., glitches) and gravitational waves in data from the Advanced LIGO detectors.
We use models with a supervised learning approach, trained from scratch on the Gravity Spy dataset.
We also explore a self-supervised approach, pre-training models with automatically generated pseudo-labels.
arXiv Detail & Related papers (2023-03-24T11:12:37Z)
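A generic sketch of pseudo-label pretraining (the labeler, loader, and hyperparameters are hypothetical; the paper's actual pseudo-label generation is not reproduced here):

```python
import torch
from torch import nn

def pretrain_with_pseudo_labels(model, unlabeled_loader, labeler, epochs=5):
    """Pretrain a CNN classifier on automatically generated pseudo-labels
    before fine-tuning on the human-labelled Gravity Spy classes."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for spectrograms in unlabeled_loader:
            pseudo = labeler(spectrograms)  # hypothetical automatic labeler
            loss = loss_fn(model(spectrograms), pseudo)
            opt.zero_grad()
            loss.backward()
            opt.step()
```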
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side.
With gradient-based optimization, DecT can be trained within several seconds and requires only one pre-trained model (PTM) query per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
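A minimal sketch of the output-side idea: query the frozen pre-trained model once per sample, cache the representations, then fit a small decoder on top (decoder shape and training loop are illustrative assumptions):

```python
import torch
from torch import nn

@torch.no_grad()
def cache_ptm_outputs(ptm, loader):
    """One PTM query per sample: store frozen output representations once,
    so decoder training never touches the pre-trained model again."""
    feats, labels = [], []
    for x, y in loader:
        feats.append(ptm(x))  # assumed: ptm returns (B, d) representations
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)

def train_decoder(feats, labels, n_classes, d=768, steps=200):
    """Gradient-based optimization of a small task-specific decoder network
    on the cached outputs; fast because the PTM is out of the loop."""
    decoder = nn.Sequential(nn.Linear(d, d), nn.Tanh(), nn.Linear(d, n_classes))
    opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(decoder(feats), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder
```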
- Confidence-Guided Data Augmentation for Deep Semi-Supervised Training [0.9968241071319184]
We propose a new data augmentation technique for semi-supervised learning settings that emphasizes learning from the most challenging regions of the feature space.
We perform experiments on two benchmark RGB datasets: CIFAR-100 and STL-10, and show that the proposed scheme improves classification performance in terms of accuracy and robustness.
arXiv Detail & Related papers (2022-09-16T21:23:19Z)
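A minimal sketch of one way to target the most challenging regions: rank unlabeled samples by the model's confidence and augment the least confident ones (the selection rule and names are assumptions):

```python
import torch

def select_hard_samples(model, batch, k):
    """Return the k samples the model is least confident about; these are
    the candidates for (stronger) augmentation in the semi-supervised loop."""
    with torch.no_grad():
        conf = model(batch).softmax(dim=-1).max(dim=-1).values
    hard_idx = conf.argsort()[:k]  # lowest confidence = hardest region
    return batch[hard_idx]
```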
- Revisiting Classifier: Transferring Vision-Language Models for Video
Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
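A CLIP-style sketch of using language-model embeddings of class names as semantic targets; the encoders and the cosine-similarity readout are assumptions about the general recipe, not the paper's exact pipeline:

```python
import torch
import torch.nn.functional as F

def classify_by_text_targets(video_feats, class_names, text_encoder):
    """Embed class names with a pretrained language model and assign each
    video clip to the class with the most similar (cosine) embedding.
    Assumes text_encoder maps a string to a (d,) tensor."""
    text_feats = torch.stack([text_encoder(c) for c in class_names])  # (K, d)
    video_feats = F.normalize(video_feats, dim=-1)                    # (B, d)
    text_feats = F.normalize(text_feats, dim=-1)
    return (video_feats @ text_feats.T).argmax(dim=-1)                # class index per clip
```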
- E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language
Understanding and Generation [95.49128988683191]
Sequence-to-sequence (seq2seq) learning is a popular approach for large-scale pretraining of language models.
We propose an encoding-enhanced seq2seq pretraining strategy, namely E2S2.
E2S2 improves seq2seq models by integrating more efficient self-supervised information into the encoders.
arXiv Detail & Related papers (2022-05-30T08:25:36Z)
- Relaxed Attention: A Simple Method to Boost Performance of End-to-End
Automatic Speech Recognition [27.530537066239116]
We introduce the concept of relaxed attention, which is a gradual injection of a uniform distribution into the encoder-decoder attention weights during training.
We find that transformers trained with relaxed attention outperform the standard baseline models consistently during decoding with external language models.
On WSJ, we set a new benchmark for transformer-based end-to-end speech recognition with a word error rate of 3.65%, outperforming the previous state of the art (4.20%) by 13.1% relative.
arXiv Detail & Related papers (2021-07-02T21:01:17Z)
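The mechanism itself is simple enough to sketch directly: blend a uniform distribution into the cross-attention weights during training (the gamma value here is illustrative):

```python
import torch

def relaxed_attention(scores, gamma=0.1):
    """Blend a uniform distribution into encoder-decoder attention weights
    during training. scores: raw attention logits of shape (..., T_enc)."""
    attn = scores.softmax(dim=-1)
    uniform = torch.full_like(attn, 1.0 / attn.size(-1))
    return (1.0 - gamma) * attn + gamma * uniform  # rows still sum to 1
```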
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour for the learned representations, and the consequences of fixing it by introducing a notion of self-consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
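One way to read the self-consistency idea as a loss term, sketched with hypothetical encoder/decoder callables: re-encoding the model's own reconstruction should recover the original latent code.

```python
import torch
import torch.nn.functional as F

def self_consistency_loss(encoder, decoder, x):
    """Hypothetical sketch: penalize the encoder when the reconstruction
    decode(encode(x)) maps back to a different latent code than x did."""
    z = encoder(x)          # assumed: returns the latent (e.g. posterior mean)
    x_rec = decoder(z)
    z_rec = encoder(x_rec)
    return F.mse_loss(z_rec, z.detach())
```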