Effective Sequence-to-Sequence Dialogue State Tracking
- URL: http://arxiv.org/abs/2108.13990v1
- Date: Tue, 31 Aug 2021 17:27:59 GMT
- Title: Effective Sequence-to-Sequence Dialogue State Tracking
- Authors: Jeffrey Zhao, Mahdis Mahdieh, Ye Zhang, Yuan Cao, Yonghui Wu
- Abstract summary: We show that the choice of pre-training objective makes a significant difference to the state tracking quality.
We also explore using Pegasus, a span prediction-based pre-training objective for text summarization, for the state tracking model.
We found that pre-training for the seemingly distant summarization task works surprisingly well for dialogue state tracking.
- Score: 22.606650177804966
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequence-to-sequence models have been applied to a wide variety of NLP tasks,
but how to properly use them for dialogue state tracking has not been
systematically investigated. In this paper, we study this problem from the
perspectives of pre-training objectives as well as the formats of context
representations. We demonstrate that the choice of pre-training objective makes
a significant difference to the state tracking quality. In particular, we find
that masked span prediction is more effective than auto-regressive language
modeling. We also explore using Pegasus, a span prediction-based pre-training
objective for text summarization, for the state tracking model. We found that
pre-training for the seemingly distant summarization task works surprisingly
well for dialogue state tracking. In addition, we found that while the recurrent
state context representation also works reasonably well, the model may have a
hard time recovering from earlier mistakes. We conducted experiments on the
MultiWOZ 2.1-2.4 data sets with consistent observations.
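As a concrete illustration of the seq2seq framing studied here, the sketch below shows how a span-prediction pre-trained encoder-decoder such as T5 could be queried for dialogue state tracking: the dialogue context is serialized into a single input string and the dialogue state is decoded as a flat slot-value string. The task prefix, the two context formats (full history vs. previous state plus the latest turns), and the state serialization are illustrative assumptions, not the paper's exact formats.

```python
# Minimal sketch of sequence-to-sequence dialogue state tracking with a
# span-prediction pre-trained model (e.g. T5). The context/state formats
# below are illustrative assumptions, not the paper's exact serialization.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def full_history_input(turns):
    # "Full dialogue history" context: concatenate every user/system turn.
    return "track dialogue state: " + " ".join(
        f"[{speaker}] {utterance}" for speaker, utterance in turns
    )

def recurrent_state_input(prev_state, last_turns):
    # "Recurrent state" context: previously predicted state plus only the most
    # recent turns; errors in prev_state can propagate, which is the recovery
    # issue mentioned in the abstract.
    state_str = ", ".join(f"{slot}={value}" for slot, value in prev_state.items())
    history = " ".join(f"[{speaker}] {utterance}" for speaker, utterance in last_turns)
    return f"track dialogue state: [state] {state_str} {history}"

turns = [
    ("user", "I need a cheap hotel in the north."),
    ("system", "Sure, for how many nights?"),
    ("user", "Three nights, starting Friday."),
]
inputs = tokenizer(full_history_input(turns), return_tensors="pt")
outputs = model.generate(**inputs, max_length=64)
# A model fine-tuned for DST would decode something like:
# "hotel-pricerange=cheap, hotel-area=north, hotel-stay=3, hotel-day=friday"
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Swapping the T5 checkpoint for a Pegasus one (e.g. PegasusForConditionalGeneration with google/pegasus-large) is the kind of substitution involved when comparing pre-training objectives as the paper does.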
Related papers
- DeTra: A Unified Model for Object Detection and Trajectory Forecasting [68.85128937305697]
Our approach formulates the union of the two tasks as a trajectory refinement problem.
To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects.
In our experiments, we observe that our model outperforms the state-of-the-art on the Argoverse 2 Sensor and Open datasets.
arXiv Detail & Related papers (2024-06-06T18:12:04Z) - Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks [0.8889304968879164]
We apply a pre-trained language model to constrained arithmetic problems with hierarchical structure, to analyze their attention weight scores and hidden states.
The investigation reveals promising results, with the model addressing hierarchical problems in a moderately structured manner, similar to human problem-solving strategies.
The attention analysis allows us to hypothesize that the model can generalize to longer sequences in the ListOps dataset, a conclusion later confirmed through testing on sequences longer than those in the training set.
arXiv Detail & Related papers (2023-06-21T11:48:07Z) - Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a pretraining paradigm should be carried out in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z) - STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training [30.16501510589718]
We propose a pre-training framework that jointly models object and action information across spatial and temporal dimensions.
We design two auxiliary tasks to better incorporate both kinds of information into the pre-training process of the video-language model.
arXiv Detail & Related papers (2023-02-20T03:13:45Z) - Towards Out-of-Distribution Sequential Event Prediction: A Causal Treatment [72.50906475214457]
The goal of sequential event prediction is to estimate the next event based on a sequence of historical events.
In practice, the next-event prediction models are trained with sequential data collected at one time.
We propose a framework with hierarchical branching structures for learning context-specific representations.
arXiv Detail & Related papers (2022-10-24T07:54:13Z) - Leveraging Pre-Trained Language Models to Streamline Natural Language Interaction for Self-Tracking [25.28975864365579]
We propose a novel NLP task for self-tracking that extracts close- and open-ended information from a retrospective activity log.
To address the cold-start problem of bootstrapping a new tracking topic, the framework augments the prompt with synthetic samples, turning the task into 10-shot learning.
arXiv Detail & Related papers (2022-05-31T01:58:04Z) - A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in both few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z) - In-Context Learning for Few-Shot Dialogue State Tracking [55.91832381893181]
We propose an in-context (IC) learning framework for few-shot dialogue state tracking (DST).
A large pre-trained language model (LM) takes a test instance and a few annotated examples as input, and directly decodes the dialogue states without any parameter updates.
This makes the LM more flexible and scalable than prior few-shot DST work when adapting to new domains and scenarios; a minimal prompt-construction sketch in this spirit appears after this list.
arXiv Detail & Related papers (2022-03-16T11:58:24Z) - Evaluating Document Coherence Modelling [37.287725949616934]
We examine the performance of a broad range of pretrained LMs on a sentence intrusion detection task for English.
Our experiments show that pretrained LMs perform impressively in in-domain evaluation, but experience a substantial drop in the cross-domain setting.
arXiv Detail & Related papers (2021-03-18T10:05:06Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
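Following up on the In-Context Learning for Few-Shot Dialogue State Tracking entry above, the sketch below shows the general idea: a prompt is assembled from a few annotated examples plus the test dialogue, and a frozen LM decodes the state with no parameter updates. The prompt layout, the example format, and the stand-in checkpoint are illustrative assumptions rather than that paper's exact setup.

```python
# Sketch of in-context dialogue state tracking: build a prompt from a few
# annotated examples plus the test dialogue, then let a frozen LM decode the
# state. The prompt format and the stand-in checkpoint are assumptions.
from transformers import pipeline

examples = [
    {
        "dialogue": "[user] I want an expensive Italian restaurant in the centre.",
        "state": "restaurant-pricerange=expensive, restaurant-food=italian, restaurant-area=centre",
    },
    {
        "dialogue": "[user] Book a taxi to the station by 5pm.",
        "state": "taxi-destination=station, taxi-arriveby=17:00",
    },
]

def build_prompt(examples, test_dialogue):
    # Each in-context example pairs a dialogue snippet with its serialized state;
    # the test dialogue is appended with an empty "state:" slot for the LM to fill.
    parts = []
    for ex in examples:
        parts.append(f"dialogue: {ex['dialogue']}\nstate: {ex['state']}\n")
    parts.append(f"dialogue: {test_dialogue}\nstate:")
    return "\n".join(parts)

prompt = build_prompt(examples, "[user] Find me a cheap hotel in the north.")

# Stand-in generator; the few-shot DST setting assumes a much larger pre-trained
# LM, used as-is with no fine-tuning (no parameter updates).
generator = pipeline("text-generation", model="gpt2")
completion = generator(prompt, max_new_tokens=32, do_sample=False)[0]["generated_text"]
print(completion[len(prompt):])
```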