Stabilized In-Context Learning with Pre-trained Language Models for Few Shot Dialogue State Tracking
- URL: http://arxiv.org/abs/2302.05932v1
- Date: Sun, 12 Feb 2023 15:05:10 GMT
- Title: Stabilized In-Context Learning with Pre-trained Language Models for Few Shot Dialogue State Tracking
- Authors: Derek Chen, Kun Qian, Zhou Yu
- Abstract summary: Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks.
For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial.
We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
- Score: 57.92608483099916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt-based methods with large pre-trained language models (PLMs) have shown
impressive unaided performance across many NLP tasks. These models improve even
further with the addition of a few labeled in-context exemplars to guide output
generation. However, for more complex tasks such as dialogue state tracking
(DST), designing prompts that reliably convey the desired intent is nontrivial,
leading to unstable results. Furthermore, building in-context exemplars for
dialogue tasks is difficult because conversational contexts are long while
model input lengths are relatively short. To overcome these issues we first
adapt a meta-learning scheme to the dialogue domain which stabilizes the
ability of the model to perform well under various prompts. We additionally
design a novel training method to improve upon vanilla retrieval mechanisms to
find ideal in-context examples. Finally, we introduce a saliency model to limit
dialogue text length, allowing us to include more exemplars per query. In
effect, we are able to achieve highly competitive results for few-shot DST on
MultiWOZ.
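As a rough illustration of the interface the abstract describes, the sketch below packs retrieved in-context exemplars into a single DST prompt while a per-turn saliency scorer trims low-salience dialogue turns so more exemplars fit within the model's input budget. The function names, the turn-level scoring stand-in, and the prompt layout are assumptions for illustration only, not the paper's actual retriever, saliency model, or prompt format.

```python
# Minimal sketch: building a few-shot DST prompt with saliency-based truncation.
# All names and the scoring heuristic are illustrative assumptions, not the
# paper's implementation.
from typing import Callable, List, Tuple

Exemplar = Tuple[str, str]  # (dialogue context, gold dialogue state string)

def truncate_by_saliency(
    turns: List[str],
    score_fn: Callable[[str], float],  # hypothetical per-turn saliency scorer
    max_turns: int,
) -> List[str]:
    """Keep only the most salient turns, preserving their original order."""
    ranked = sorted(range(len(turns)), key=lambda i: score_fn(turns[i]), reverse=True)
    keep = sorted(ranked[:max_turns])
    return [turns[i] for i in keep]

def build_prompt(
    exemplars: List[Exemplar],
    test_turns: List[str],
    score_fn: Callable[[str], float],
    max_turns_per_dialogue: int = 6,
) -> str:
    """Concatenate truncated exemplars, then the truncated test dialogue."""
    blocks = []
    for context, state in exemplars:
        turns = truncate_by_saliency(context.split("\n"), score_fn, max_turns_per_dialogue)
        blocks.append("Dialogue:\n" + "\n".join(turns) + f"\nState: {state}")
    test = truncate_by_saliency(test_turns, score_fn, max_turns_per_dialogue)
    blocks.append("Dialogue:\n" + "\n".join(test) + "\nState:")
    return "\n\n".join(blocks)

if __name__ == "__main__":
    # A trivial length-based "saliency" stand-in for demonstration purposes.
    toy_score = lambda turn: float(len(turn.split()))
    exemplars = [("user: I need a cheap hotel in the north\nsystem: Sure, any stars?",
                  "hotel-pricerange=cheap, hotel-area=north")]
    test = ["user: Book me an Italian restaurant for 4 people"]
    print(build_prompt(exemplars, test, toy_score))
```

In the paper, both the exemplar retriever and the saliency model are learned components; the length-based lambda above only stands in for their interface.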
Related papers
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- Diverse Retrieval-Augmented In-Context Learning for Dialogue State Tracking [3.8073142980733]
We propose RefPyDST, which advances the state of the art with three advancements to in-context learning for dialogue state tracking.
First, we formulate DST as a Python programming task, explicitly modeling language coreference as variable reference in Python (a minimal illustrative sketch of this formulation appears after the related papers list).
Second, since in-context learning depends highly on the context examples, we propose a method to retrieve a diverse set of relevant examples to improve performance.
arXiv Detail & Related papers (2023-07-04T03:15:52Z)
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- A Study on Prompt-based Few-Shot Learning Methods for Belief State Tracking in Task-oriented Dialog Systems [10.024834304960846]
We tackle the Dialogue Belief State Tracking problem of task-oriented conversational systems.
Recent approaches to this problem leveraging Transformer-based models have yielded great results.
We explore prompt-based few-shot learning for Dialogue Belief State Tracking.
arXiv Detail & Related papers (2022-04-18T05:29:54Z)
- In-Context Learning for Few-Shot Dialogue State Tracking [55.91832381893181]
We propose an in-context (IC) learning framework for few-shot dialogue state tracking (DST).
A large pre-trained language model (LM) takes a test instance and a few annotated examples as input, and directly decodes the dialogue states without any parameter updates.
This makes the LM more flexible and scalable compared to prior few-shot DST work when adapting to new domains and scenarios.
arXiv Detail & Related papers (2022-03-16T11:58:24Z)
- Representation Learning for Conversational Data using Discourse Mutual Information Maximization [9.017156603976915]
We argue that the structure-unaware word-by-word generation is not suitable for effective conversation modeling.
We propose a structure-aware Mutual Information based loss-function DMI for training dialog-representation models.
Our models show the most promising performance on the dialog evaluation task DailyDialog++, in both random and adversarial negative scenarios.
arXiv Detail & Related papers (2021-12-04T13:17:07Z)
- Response Generation with Context-Aware Prompt Learning [19.340498579331555]
We present a novel approach for pre-trained dialogue modeling that casts the dialogue generation problem as a prompt-learning task.
Instead of fine-tuning on limited dialogue data, our approach, DialogPrompt, learns continuous prompt embeddings optimized for dialogue contexts.
Our approach significantly outperforms the fine-tuning baseline and the generic prompt-learning methods.
arXiv Detail & Related papers (2021-11-04T05:40:13Z)
- RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems [75.87418236410296]
We introduce the RADDLE benchmark, a collection of corpora and tools for evaluating the performance of models across a diverse set of domains.
RADDLE is designed to favor and encourage models with a strong generalization ability.
We evaluate recent state-of-the-art systems based on pre-training and fine-tuning, and find that grounded pre-training on heterogeneous dialog corpora performs better than training a separate model per domain.
arXiv Detail & Related papers (2020-12-29T08:58:49Z)
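To make the RefPyDST entry above more concrete, the following is an illustrative sketch of casting DST as a Python programming task: slot values become assignments, and coreference ("the same area as the hotel") becomes a variable reference rather than a repeated surface string. The class and field names are assumptions for illustration, not the authors' actual schema or prompt.

```python
# Illustrative sketch of a DST-as-Python formulation; names are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hotel:
    area: Optional[str] = None
    pricerange: Optional[str] = None

@dataclass
class Restaurant:
    area: Optional[str] = None
    food: Optional[str] = None

# User turn: "I want a cheap hotel in the north."
hotel = Hotel(area="north", pricerange="cheap")

# User turn: "Also find me a Korean restaurant in the same area as the hotel."
# Coreference is expressed as a variable reference instead of a copied value.
restaurant = Restaurant(area=hotel.area, food="korean")

# The predicted dialogue state is read off the resulting Python objects.
print(hotel, restaurant)
```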