Guiding attention in Sequence-to-sequence models for Dialogue Act prediction
- URL: http://arxiv.org/abs/2002.08801v2
- Date: Wed, 26 Feb 2020 07:25:05 GMT
- Title: Guiding attention in Sequence-to-sequence models for Dialogue Act prediction
- Authors: Pierre Colombo, Emile Chapuis, Matteo Manica, Emmanuel Vignon, Giovanna Varni, Chloe Clavel
- Abstract summary: We leverage seq2seq approaches widely adopted in Neural Machine Translation to improve the modelling of tag sequentiality.
In this work, we introduce a seq2seq model tailored for DA classification using: a hierarchical encoder, a novel guided attention mechanism and beam search applied to both training and inference.
The proposed approach achieves an unmatched accuracy score of 85% on SwDA and a state-of-the-art accuracy score of 91.6% on MRDA.
- Score: 10.171893967006017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of predicting dialog acts (DA) based on conversational dialog is a
key component in the development of conversational agents. Accurately
predicting DAs requires a precise modeling of both the conversation and the
global tag dependencies. We leverage seq2seq approaches widely adopted in
Neural Machine Translation (NMT) to improve the modelling of tag sequentiality.
Seq2seq models are known to learn complex global dependencies while currently
proposed approaches using linear conditional random fields (CRF) only model
local tag dependencies. In this work, we introduce a seq2seq model tailored for
DA classification using: a hierarchical encoder, a novel guided attention
mechanism and beam search applied to both training and inference. Compared to
the state of the art, our model does not require handcrafted features and is
trained end-to-end. Furthermore, the proposed approach achieves an unmatched
accuracy score of 85% on SwDA and a state-of-the-art accuracy score of 91.6% on
MRDA.
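
As a concrete illustration of the architecture described in the abstract, below is a minimal PyTorch-style sketch of a hierarchical encoder paired with a decoder whose cross-attention is guided toward the utterance aligned with the tag currently being decoded. This is not the authors' implementation: the GRU cells, the module sizes, the additive guide_weight bias (a soft form of guided attention), and the teacher-forced decoding loop are assumptions made for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalEncoder(nn.Module):
    """Word-level GRU pooled per utterance, followed by an utterance-level GRU."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.word_rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.utt_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, dialog_tokens):
        # dialog_tokens: (batch, n_utterances, n_words) of token ids
        b, u, w = dialog_tokens.shape
        words = self.embedding(dialog_tokens.view(b * u, w))       # (b*u, w, emb)
        _, h_word = self.word_rnn(words)                           # (1, b*u, hid)
        utt_vectors = h_word.squeeze(0).view(b, u, -1)             # (b, u, hid)
        utt_states, _ = self.utt_rnn(utt_vectors)                  # (b, u, hid)
        return utt_states


class GuidedAttentionDecoder(nn.Module):
    """Predicts one DA tag per utterance; attention at step t is biased toward
    the encoder state of utterance t (the utterance aligned with that tag)."""

    def __init__(self, n_tags, hidden_dim=256, guide_weight=2.0):
        super().__init__()
        self.tag_embedding = nn.Embedding(n_tags, hidden_dim)
        self.rnn_cell = nn.GRUCell(2 * hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, n_tags)
        self.guide_weight = guide_weight  # assumed strength of the attention bias

    def forward(self, utt_states, prev_tags):
        # utt_states: (b, u, hid); prev_tags: (b, u) gold tags shifted right (teacher forcing)
        b, u, _ = utt_states.shape
        state = utt_states[:, -1, :]                               # initial decoder state
        logits = []
        for t in range(u):
            scores = torch.bmm(utt_states, state.unsqueeze(2)).squeeze(2)   # (b, u)
            # Guided attention: additively bias step t toward utterance t.
            guide = torch.zeros_like(scores)
            guide[:, t] = self.guide_weight
            attn = F.softmax(scores + guide, dim=1)                          # (b, u)
            context = torch.bmm(attn.unsqueeze(1), utt_states).squeeze(1)    # (b, hid)
            inp = torch.cat([context, self.tag_embedding(prev_tags[:, t])], dim=1)
            state = self.rnn_cell(inp, state)
            logits.append(self.out(state))
        return torch.stack(logits, dim=1)                          # (b, u, n_tags)


if __name__ == "__main__":
    # Tiny smoke test with made-up sizes: 2 dialogs, 5 utterances, 12 words each.
    enc = HierarchicalEncoder(vocab_size=1000)
    dec = GuidedAttentionDecoder(n_tags=42)  # tag-set size depends on the corpus/preprocessing
    states = enc(torch.randint(1, 1000, (2, 5, 12)))
    tag_logits = dec(states, torch.randint(0, 42, (2, 5)))
    print(tag_logits.shape)  # torch.Size([2, 5, 42])

At inference time, the teacher-forced loop above would be replaced by beam search over candidate tag sequences; per the abstract, the paper also applies beam search during training, which this sketch does not attempt to reproduce.
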
Related papers
- Pretraining Without Attention [114.99187017618408]
This work explores pretraining without attention by using recent advances in sequence routing based on state-space models (SSMs).
BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation.
arXiv Detail & Related papers (2022-12-20T18:50:08Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- Enhancing Pre-trained Models with Text Structure Knowledge for Question Generation [2.526624977753083]
We model text structure as answer position and syntactic dependency, and propose answer localness modeling and syntactic mask attention to address these limitations.
Experiments on SQuAD dataset show that our proposed two modules improve performance over the strong pre-trained model ProphetNet.
arXiv Detail & Related papers (2022-09-09T08:33:47Z)
- CUP: Curriculum Learning based Prompt Tuning for Implicit Event Argument Extraction [22.746071199667146]
Implicit event argument extraction (EAE) aims to identify arguments that may be scattered across the document.
We propose a Curriculum learning based Prompt tuning (CUP) approach, which resolves implicit EAE by four learning stages.
In addition, we integrate a prompt-based encoder-decoder model to elicit related knowledge from pre-trained language models.
arXiv Detail & Related papers (2022-05-01T16:03:54Z)
- A Gating Model for Bias Calibration in Generalized Zero-shot Learning [18.32369721322249]
Generalized zero-shot learning (GZSL) aims at training a model that can generalize to unseen class data by only using auxiliary information.
One of the main challenges in GZSL is a biased model prediction toward seen classes caused by overfitting on only available seen class data during training.
We propose a two-stream autoencoder-based gating model for GZSL.
arXiv Detail & Related papers (2022-03-08T16:41:06Z)
- DARER: Dual-task Temporal Relational Recurrent Reasoning Network for Joint Dialog Sentiment Classification and Act Recognition [39.76268402567324]
The task of joint dialog sentiment classification (DSC) and act recognition (DAR) aims to simultaneously predict the sentiment label and act label for each utterance in a dialog.
We put forward a new framework which models the explicit dependencies by integrating prediction-level interactions.
To implement our framework, we propose a novel model dubbed DARER, which first generates the context-, speaker- and temporal-sensitive utterance representations.
arXiv Detail & Related papers (2022-03-08T05:19:18Z)
- Complex Event Forecasting with Prediction Suffix Trees: Extended Technical Report [70.7321040534471]
Complex Event Recognition (CER) systems have become popular in the past two decades due to their ability to "instantly" detect patterns on real-time streams of events.
There is a lack of methods for forecasting when a pattern might occur before such an occurrence is actually detected by a CER engine.
We present a formal framework that attempts to address the issue of Complex Event Forecasting.
arXiv Detail & Related papers (2021-09-01T09:52:31Z)
- Hierarchical Modeling for Out-of-Scope Domain and Intent Classification [55.23920796595698]
This paper focuses on out-of-scope intent classification in dialog systems.
We propose a hierarchical multi-task learning approach based on a joint model to classify domain and intent simultaneously.
Experiments show that the model outperforms existing methods in terms of accuracy, out-of-scope recall and F1.
arXiv Detail & Related papers (2021-04-30T06:38:23Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Guiding Attention in Sequence-to-Sequence Models for Dialogue Act Prediction (original French title: "Guider l'attention dans les modeles de sequence a sequence pour la prediction des actes de dialogue") [10.171893967006017]
We leverage seq2seq approaches widely adopted in Neural Machine Translation to improve the modelling of tag sequentiality.
In this work, we introduce a seq2seq model tailored for DA classification using: a hierarchical encoder, a novel guided attention mechanism and beam search applied to both training and inference.
The proposed approach achieves an unmatched accuracy score of 85% on SwDA and a state-of-the-art accuracy score of 91.6% on MRDA.
arXiv Detail & Related papers (2020-02-21T17:09:19Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.