TA&AT: Enhancing Task-Oriented Dialog with Turn-Level Auxiliary Tasks
and Action-Tree Based Scheduled Sampling
- URL: http://arxiv.org/abs/2401.15626v1
- Date: Sun, 28 Jan 2024 11:02:23 GMT
- Title: TA&AT: Enhancing Task-Oriented Dialog with Turn-Level Auxiliary Tasks
and Action-Tree Based Scheduled Sampling
- Authors: Longxiang Liu, Xiuxing Li, Yang Feng
- Abstract summary: Task-oriented dialog systems have witnessed substantial progress due to conversational pre-training techniques.
We propose turn-level multi-task objectives for the encoder.
For the decoder, we introduce an action tree-based scheduled sampling technique.
- Score: 16.77137239284608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Task-oriented dialog systems have witnessed substantial progress due to
conversational pre-training techniques. Yet, two significant challenges
persist. First, most systems primarily utilize the latest turn's state label
for the generator. This practice overlooks the comprehensive value of state
labels in boosting the model's understanding for generation in future turns. Second, an
overreliance on generated policy often leads to error accumulation, resulting
in suboptimal responses when adhering to incorrect actions. To combat these
challenges, we propose turn-level multi-task objectives for the encoder. With
the guidance of essential information from labeled intermediate states, we
establish a more robust representation for both understanding and generation.
For the decoder, we introduce an action tree-based scheduled sampling
technique. Specifically, we model the hierarchical policy as trees and utilize
the similarity between trees to sample negative policies based on scheduled
sampling, encouraging the model to generate invariant responses under such perturbations.
This method simulates potential pitfalls by sampling similar negative policies,
bridging the gap between task-oriented dialog training and inference. Among
methods without continual pre-training, our approach achieved state-of-the-art
(SOTA) performance on the MultiWOZ dataset series and was also competitive with
pre-trained SOTA methods.
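
The action-tree scheduled sampling described in the abstract can be pictured with a small sketch. The snippet below is an illustrative reading, not the paper's implementation: dialog acts are flattened into (domain, act, slot) triples, tree similarity is approximated with Jaccard overlap, and the probability of swapping the gold action for a similar negative one follows a simple linear schedule; all of these specific choices are assumptions.

```python
import random

def act_tree(action):
    """Flatten a hierarchical dialog act (domain -> act -> slots) into a set
    of (domain, act, slot) triples so two policies can be compared."""
    return {(d, a, s) for d, acts in action.items()
                      for a, slots in acts.items()
                      for s in (slots or ["none"])}

def tree_similarity(gold, candidate):
    """Jaccard overlap between the triple sets of two action trees
    (a stand-in for whatever tree similarity the paper actually uses)."""
    g, c = act_tree(gold), act_tree(candidate)
    return len(g & c) / max(len(g | c), 1)

def scheduled_sample_action(gold, candidate_pool, step, total_steps, k=3):
    """With a probability that grows over training, replace the gold action
    with a *similar* negative action, so the response decoder learns to stay
    robust when the predicted policy is slightly wrong at inference time."""
    p_perturb = min(1.0, step / total_steps)   # linear schedule (assumption)
    if random.random() > p_perturb:
        return gold                            # keep the oracle action
    scored = sorted(candidate_pool,
                    key=lambda c: tree_similarity(gold, c),
                    reverse=True)
    near_misses = [c for c in scored if c != gold][:k]
    return random.choice(near_misses) if near_misses else gold

# toy usage
gold = {"hotel": {"inform": ["area", "price"]}}
pool = [{"hotel": {"inform": ["area"]}},
        {"restaurant": {"request": ["food"]}},
        gold]
print(scheduled_sample_action(gold, pool, step=800, total_steps=1000))
```

The point of sampling *similar* negatives is that the decoder sees realistic near-miss policies during training, which mirrors the small policy errors it will actually face at inference.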
Related papers
- Hierarchical Orchestra of Policies [1.6574413179773757]
HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks.
HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous.
Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks.
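
As a rough illustration of the orchestration idea summarized above (not the paper's algorithm), the sketch below routes each observation to the stored policy whose remembered successful observations are most similar; the cosine similarity and the max-over-memories rule are assumptions.

```python
import numpy as np

class PolicyOrchestrator:
    """Minimal sketch of similarity-based policy selection: route the current
    observation to the policy whose remembered (successful) observations are
    most similar. Details here are illustrative, not HOP's exact formulation."""

    def __init__(self, policies):
        self.policies = policies                    # {name: callable obs -> action}
        self.memory = {name: [] for name in policies}

    def remember_success(self, name, obs):
        self.memory[name].append(np.asarray(obs, dtype=float))

    def _similarity(self, obs, stored):
        obs = np.asarray(obs, dtype=float)
        sims = [float(obs @ m / (np.linalg.norm(obs) * np.linalg.norm(m) + 1e-8))
                for m in stored]
        return max(sims) if sims else -1.0

    def act(self, obs):
        best = max(self.policies, key=lambda n: self._similarity(obs, self.memory[n]))
        return self.policies[best](obs)

# toy usage
orch = PolicyOrchestrator({"walk": lambda o: "step", "jump": lambda o: "hop"})
orch.remember_success("walk", [1.0, 0.0])
orch.remember_success("jump", [0.0, 1.0])
print(orch.act([0.9, 0.1]))   # -> "step"
```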
arXiv Detail & Related papers (2024-11-05T11:13:09Z) - ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z) - An Effective-Efficient Approach for Dense Multi-Label Action Detection [23.100602876056165]
It is necessary to simultaneously learn (i) temporal dependencies and (ii) co-occurrence action relationships.
Recent approaches model temporal information by extracting multi-scale features through hierarchical transformer-based networks.
We argue that combining this with multiple sub-sampling processes in hierarchical designs can lead to further loss of positional information.
arXiv Detail & Related papers (2024-06-10T11:33:34Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
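
A token-level, entropy-regularized policy-gradient loss along the lines described above might look like the following sketch; the exact objective, the advantage estimator, and the coefficient `beta` are assumptions rather than ETPO's published formulation.

```python
import torch

def entropy_regularized_token_loss(logits, actions, advantages, beta=0.01):
    """Sketch of an entropy-augmented, token-level policy-gradient loss:
    each generated token is treated as an action, and an entropy bonus
    keeps the per-token policy from collapsing."""
    dist = torch.distributions.Categorical(logits=logits)   # [batch, seq, vocab]
    log_probs = dist.log_prob(actions)                      # [batch, seq]
    entropy = dist.entropy()                                 # [batch, seq]
    # maximize advantage-weighted log-likelihood plus entropy -> minimize the negative
    return -(advantages * log_probs + beta * entropy).mean()

# toy usage: batch of 2 sequences, 5 tokens each, vocabulary of 100
logits = torch.randn(2, 5, 100, requires_grad=True)
actions = torch.randint(0, 100, (2, 5))
advantages = torch.randn(2, 5)
entropy_regularized_token_loss(logits, actions, advantages).backward()
```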
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Rethinking Object Saliency Ranking: A Novel Whole-flow Processing
Paradigm [22.038715439842044]
This paper proposes a new paradigm for saliency ranking, which aims to focus entirely on ranking salient objects by their "importance order".
The proposed approach outperforms existing state-of-the-art methods on the widely-used SALICON set.
arXiv Detail & Related papers (2023-12-06T01:51:03Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
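
The inference procedure described above (policy as generator, energy as trajectory-level quality measure) can be sketched as sample-then-score; the toy policy, the toy energy, and the best-of-N selection below are illustrative assumptions.

```python
import numpy as np

def sample_trajectories(policy_step, x0, horizon, num_candidates, rng):
    """Roll out several candidate time series with a (stochastic) transition
    policy. `policy_step(history, rng) -> next value` is a placeholder."""
    trajs = []
    for _ in range(num_candidates):
        traj = [x0]
        for _ in range(horizon - 1):
            traj.append(policy_step(traj, rng))
        trajs.append(np.array(traj))
    return trajs

def best_by_energy(trajs, energy_fn):
    """Use a trajectory-level energy as the quality measure: lower is better
    (selecting among candidates is an illustrative assumption)."""
    return min(trajs, key=energy_fn)

# toy usage
rng = np.random.default_rng(0)
policy_step = lambda hist, rng: hist[-1] + rng.normal(scale=0.1)   # toy autoregressive step
energy_fn = lambda traj: float(np.sum(np.diff(traj) ** 2))          # toy smoothness energy
candidates = sample_trajectories(policy_step, x0=0.0, horizon=20, num_candidates=8, rng=rng)
best = best_by_energy(candidates, energy_fn)
```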
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - Let Offline RL Flow: Training Conservative Agents in the Latent Space of
Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
arXiv Detail & Related papers (2022-11-20T21:57:10Z) - Prompt Conditioned VAE: Enhancing Generative Replay for Lifelong
Learning in Task-Oriented Dialogue [80.05509768165135]
Generative replay methods are widely employed to consolidate past knowledge with generated pseudo samples.
Most existing generative replay methods use only a single task-specific token to control their models.
We propose a novel method, prompt conditioned VAE for lifelong learning, to enhance generative replay by incorporating tasks' statistics.
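
For readers unfamiliar with generative replay, the sketch below shows the generic recipe in miniature: real data for the new task is mixed with pseudo samples generated for each previously seen task. The `ToyGenerator` merely stands in for the paper's prompt-conditioned VAE and is purely illustrative.

```python
class ToyGenerator:
    """Stand-in for a prompt-conditioned generator (e.g. a VAE decoder):
    sampling is conditioned on the task whose data it should imitate."""
    def sample(self, task_prompt, n):
        return [f"<pseudo {task_prompt} utterance {i}>" for i in range(n)]

def build_replay_batch(new_task_data, seen_tasks, generator, per_task=4):
    """Generative replay in a nutshell: mix the current task's real examples
    with pseudo samples generated for every previously learned task, so the
    model rehearses old skills while learning the new one."""
    batch = list(new_task_data)
    for task in seen_tasks:
        batch.extend(generator.sample(task_prompt=task, n=per_task))
    return batch

print(build_replay_batch(["book a hotel in cambridge"], ["restaurant", "taxi"], ToyGenerator()))
```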
arXiv Detail & Related papers (2022-10-14T13:12:14Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework
for Natural Language Generation [44.21363470798758]
ERNIE-GEN is an enhanced multi-flow sequence-to-sequence pre-training and fine-tuning framework.
It bridges the discrepancy between training and inference with an infilling generation mechanism and a noise-aware generation method.
It trains the model to predict semantically-complete spans consecutively rather than predicting word by word.
arXiv Detail & Related papers (2020-01-26T02:54:49Z)