Hierarchical Transformer for Task Oriented Dialog Systems
- URL: http://arxiv.org/abs/2011.08067v3
- Date: Sun, 9 May 2021 10:25:13 GMT
- Title: Hierarchical Transformer for Task Oriented Dialog Systems
- Authors: Bishal Santra, Potnuru Anusha, Pawan Goyal
- Abstract summary: We show how a standard transformer can be morphed into any hierarchical encoder, including HRED and HIBERT like models, by using specially designed attention masks and positional encodings.
We demonstrate that Hierarchical Encoding helps achieve better natural language understanding of the contexts in transformer-based models for task-oriented dialog systems.
- Score: 11.743662338418867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models for dialog systems have gained much interest because of the
recent success of RNN and Transformer based models in tasks like question
answering and summarization. Although the task of dialog response generation is
generally seen as a sequence-to-sequence (Seq2Seq) problem, researchers in the
past have found it challenging to train dialog systems using the standard
Seq2Seq models. Therefore, to help the model learn meaningful utterance- and
conversation-level features, Sordoni et al. (2015b) and Serban et al. (2016)
proposed the Hierarchical RNN architecture, which was later adopted by several
other RNN-based dialog systems. With transformer-based models now dominating
seq2seq problems, the natural question is whether the notion of hierarchy is
also applicable to transformer-based dialog systems. In this paper,
we propose a generalized framework for Hierarchical Transformer Encoders and
show how a standard transformer can be morphed into any hierarchical encoder,
including HRED and HIBERT like models, by using specially designed attention
masks and positional encodings. We demonstrate that Hierarchical Encoding helps
achieve better natural language understanding of the contexts in
transformer-based models for task-oriented dialog systems through a wide range
of experiments.
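As a rough illustration of the core idea, the sketch below shows how an HRED-like two-level encoder can be emulated inside a standard transformer purely through a boolean attention mask and per-utterance position ids. This is not the authors' released code; helper names such as build_hierarchical_mask and the convention that the first token of each utterance serves as its utterance-level summary token are assumptions made here for illustration only.

# Minimal sketch (assumptions noted above), using only basic torch ops.
import torch

def build_hierarchical_mask(utt_lengths):
    """Return a [seq, seq] boolean mask (True = attention allowed).

    Tokens may attend to other tokens of the same utterance; the first token
    of every utterance additionally attends to the first tokens of all other
    utterances, giving a two-level (token-level + utterance-level) flow of
    information in the spirit of HRED/HIBERT-style encoders.
    """
    seq_len = sum(utt_lengths)
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)

    # Utterance start offsets (also the positions of the summary tokens).
    starts, offset = [], 0
    for n in utt_lengths:
        starts.append(offset)
        offset += n

    # 1) Token-level attention: block-diagonal over utterances.
    for s, n in zip(starts, utt_lengths):
        mask[s:s + n, s:s + n] = True

    # 2) Utterance-level attention: summary tokens see each other.
    summary = torch.tensor(starts)
    mask[summary.unsqueeze(1), summary.unsqueeze(0)] = True
    return mask

def hierarchical_position_ids(utt_lengths):
    """Token positions restart at 0 inside every utterance; a separate
    utterance index can be embedded and added, so the model sees both
    within-utterance order and turn order."""
    token_pos, utt_pos = [], []
    for turn, n in enumerate(utt_lengths):
        token_pos.extend(range(n))
        utt_pos.extend([turn] * n)
    return torch.tensor(token_pos), torch.tensor(utt_pos)

if __name__ == "__main__":
    # Example: a 3-turn dialog with utterances of 4, 3 and 5 tokens.
    lengths = [4, 3, 5]
    attn_mask = build_hierarchical_mask(lengths)
    tok_pos, utt_pos = hierarchical_position_ids(lengths)
    print(attn_mask.int())
    print(tok_pos.tolist())
    print(utt_pos.tolist())
    # After converting disallowed positions to -inf, `attn_mask` can be fed
    # to a standard multi-head attention layer (e.g. torch.nn.MultiheadAttention).

Different choices of which positions may attend across utterances (only a summary token versus every token, for example) correspond to different points in the HRED/HIBERT-like design space that the paper's generalized framework covers.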
Related papers
- Systematic Generalization and Emergent Structures in Transformers
Trained on Structured Tasks [6.525090891505941]
We show how a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions.
We show that two-layer transformers learn generalizable solutions to multi-level problems and develop signs of systematic task decomposition.
These results provide key insights into how transformer models may be capable of decomposing complex decisions into reusable, multi-level policies.
arXiv Detail & Related papers (2022-10-02T00:46:36Z) - ORCHARD: A Benchmark For Measuring Systematic Generalization of
Multi-Hierarchical Reasoning [8.004425059996963]
We show that Transformer and LSTM models surprisingly fail in systematic generalization.
We also show that with increased references between hierarchies, Transformer performs no better than random.
arXiv Detail & Related papers (2021-11-28T03:11:37Z) - Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
arXiv Detail & Related papers (2021-06-13T13:04:46Z) - Dual-decoder Transformer for Joint Automatic Speech Recognition and
Multilingual Speech Translation [71.54816893482457]
We introduce the dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST).
Our models are based on the original Transformer architecture but consist of two decoders, each responsible for one task (ASR or ST).
arXiv Detail & Related papers (2020-11-02T04:59:50Z) - End to End Dialogue Transformer [0.0019832631155284838]
We are inspired by the performance of the recurrent neural network-based model Sequicity.
We propose a dialogue system based on the Transformer architecture instead of Sequicity's RNN-based architecture.
arXiv Detail & Related papers (2020-08-24T12:43:08Z) - Dynamic Graph Representation Learning for Video Dialog via Multi-Modal
Shuffled Transformers [89.00926092864368]
We present a semantics-controlled multi-modal shuffled Transformer reasoning framework for the audio-visual scene aware dialog task.
We also present a novel dynamic scene graph representation learning pipeline that consists of an intra-frame reasoning layer producing semantic graph representations for every frame.
Our results demonstrate state-of-the-art performances on all evaluation metrics.
arXiv Detail & Related papers (2020-07-08T02:00:22Z) - Conversational Question Reformulation via Sequence-to-Sequence
Architectures and Pretrained Language Models [56.268862325167575]
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
We leverage PLMs to address the strong token-to-token independence assumption made by the standard maximum likelihood estimation objective for the CQR task.
We evaluate fine-tuned PLMs on the recently-introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-domain task.
arXiv Detail & Related papers (2020-04-04T11:07:54Z) - DSTC8-AVSD: Multimodal Semantic Transformer Network with Retrieval Style
Word Generator [61.70748716353692]
Audio Visual Scene-aware Dialog (AVSD) is the task of generating a response for a question with a given scene, video, audio, and the history of previous turns in the dialog.
Existing systems for this task employ transformer- or recurrent neural network-based architectures with an encoder-decoder framework.
We propose a Multimodal Semantic Transformer Network. It employs a transformer-based architecture with an attention-based word embedding layer that generates words by querying word embeddings.
arXiv Detail & Related papers (2020-04-01T07:10:08Z) - Variational Transformers for Diverse Response Generation [71.53159402053392]
Variational Transformer (VT) is a variational self-attentive feed-forward sequence model.
VT combines the parallelizability and global receptive field computation of the Transformer with the variational nature of the CVAE.
We explore two types of VT: 1) modeling the discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of fine-grained latent variables.
arXiv Detail & Related papers (2020-03-28T07:48:02Z) - EmpTransfo: A Multi-head Transformer Architecture for Creating
Empathetic Dialog Systems [4.41738804598711]
This paper presents EmpTransfo, a multi-head Transformer architecture for creating an empathetic dialog system.
We show that utilizing the history of emotions and other metadata can improve the quality of generated conversations.
arXiv Detail & Related papers (2020-03-05T23:09:24Z)