KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning
- URL: http://arxiv.org/abs/2211.16773v5
- Date: Thu, 19 Oct 2023 19:16:12 GMT
- Title: KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning
- Authors: Xiao Yu, Qingyang Wu, Kun Qian, Zhou Yu
- Abstract summary: In task-oriented dialogs (TOD), reinforcement learning algorithms train a model to directly optimize responses for task-related metrics.
We investigate an approach to create a more efficient RL-based algorithm to improve TOD performance in an offline setting.
Experiments on the MultiWoZ dataset show our new training algorithm, Keywords Reinforcement Learning with Next-word Sampling (KRLS), achieves state-of-the-art performance.
- Score: 25.421649004269373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In task-oriented dialogs (TOD), reinforcement learning (RL) algorithms train
a model to directly optimize responses for task-related metrics. However, RL
needs to perform exploration, which can be time-consuming due to the slow
auto-regressive sequence generation process. We investigate an approach to
create a more efficient RL-based algorithm to improve TOD performance in an
offline setting. First, we use a faster generation procedure that samples from
independent next-word distributions after training the language model (LM) with
supervised learning. We then introduce a fine-grained reward function to help
the model focus on learning key information in a dialog, by measuring the
importance and semantic closeness of each generated token. Experiments on the
MultiWoZ dataset show our new training algorithm, Keywords Reinforcement
Learning with Next-word Sampling (KRLS), achieves state-of-the-art performance
on the end-to-end response generation task, with a 15% training time reduction
compared to a standard RL algorithm using auto-regressive generation.
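The two ingredients above, one-pass next-word sampling and a fine-grained per-token reward, can be combined into a minimal sketch. Everything below (the toy model, the keyword-weighted reward, and all names) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Stand-in language model: an embedding plus a linear head over a toy vocab."""
    def __init__(self, vocab_size=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, input_ids):
        return self.head(self.embed(input_ids))  # (batch, seq, vocab)

def token_reward(sampled, target, keyword_mask):
    # Hypothetical fine-grained reward: full credit for reproducing a keyword
    # token, partial credit for other correct tokens, zero otherwise.
    match = (sampled == target).float()
    return match * (0.2 + 0.8 * keyword_mask.float())

def krls_step(lm, optimizer, input_ids, target_ids, keyword_mask):
    logits = lm(input_ids)                # one teacher-forced pass, no rollout
    dist = torch.distributions.Categorical(logits=logits)
    sampled = dist.sample()               # independent next-word samples
    reward = token_reward(sampled, target_ids, keyword_mask)
    # REINFORCE-style update: raise log-probs of sampled tokens, scaled by reward.
    loss = -(dist.log_prob(sampled) * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

lm = TinyLM()
opt = torch.optim.Adam(lm.parameters(), lr=1e-3)
ctx = torch.randint(0, 100, (2, 10))    # toy dialog-context token ids
gold = torch.randint(0, 100, (2, 10))   # toy gold-response token ids
kw = torch.randint(0, 2, (2, 10))       # toy keyword mask over the gold response
print(krls_step(lm, opt, ctx, gold, kw))
```

Because every position is sampled from the distributions of a single teacher-forced pass, the RL phase needs no autoregressive rollout, which is where the reported training-time savings come from.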
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
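A minimal sketch of a critic over action sequences, consistent with the summary above: concatenate the state with a flattened window of actions and regress a single Q-value. All dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SequenceCritic(nn.Module):
    """Q(s, a_t, ..., a_{t+k-1}): one value for a state and a window of k actions."""
    def __init__(self, state_dim=16, action_dim=4, seq_len=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim * seq_len, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action_seq):
        # state: (batch, state_dim); action_seq: (batch, seq_len, action_dim)
        flat = action_seq.flatten(start_dim=1)
        return self.net(torch.cat([state, flat], dim=-1)).squeeze(-1)
```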
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
- Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning [62.984693936073974]
Value-based reinforcement learning can learn effective policies for a wide range of multi-turn problems.
Current value-based RL methods have proven particularly challenging to scale to the setting of large language models.
We propose a novel offline RL algorithm that addresses these drawbacks, casting Q-learning as a modified supervised fine-tuning problem.
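One plausible reading of "Q-learning as a modified supervised fine-tuning problem" is a per-token value-weighted cross-entropy, sketched below; the weights are assumed given (e.g., Monte-Carlo returns on the offline data), and this is not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def q_sft_loss(logits, target_ids, value_weights):
    """logits: (batch, seq, vocab); target_ids: (batch, seq);
    value_weights: (batch, seq) nonnegative per-token value estimates."""
    ce = F.cross_entropy(logits.transpose(1, 2), target_ids, reduction="none")
    return (value_weights * ce).mean()  # tokens with higher value weigh more
```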
arXiv Detail & Related papers (2024-11-07T21:36:52Z)
- QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
- Reinforcement Learning with Token-level Feedback for Controllable Text Generation [16.117006822479407]
We propose a novel reinforcement learning algorithm named TOLE which formulates TOken-LEvel rewards for controllable text generation.
Experimental results show that our algorithm can achieve superior performance on both single-attribute and multi-attribute control tasks.
arXiv Detail & Related papers (2024-03-18T08:18:37Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
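A hedged sketch of LLM guidance as a regularization factor in value-based RL: a standard TD loss plus a KL penalty keeping the softmax-of-Q policy near an LLM-suggested action distribution. The shapes and the coefficient lam are assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def regularized_q_loss(q_values, td_targets, actions, llm_probs, lam=0.1):
    """q_values: (batch, num_actions); td_targets: (batch,); actions: (batch,);
    llm_probs: (batch, num_actions) action distribution suggested by the LLM."""
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.mse_loss(q_taken, td_targets)     # standard TD regression
    log_pi = F.log_softmax(q_values, dim=-1)      # policy induced by Q
    kl = F.kl_div(log_pi, llm_probs, reduction="batchmean")  # KL(llm || pi)
    return td_loss + lam * kl
```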
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
- ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation [43.506732624371786]
We introduce two-stage sampling and dynamic sampling approaches to improve sampling efficiency when training sequence generation models with RL.
Experimental results show that the efficient sampling-based RL, referred to as ESRL, can outperform all baselines in terms of both training efficiency and memory consumption.
arXiv Detail & Related papers (2023-08-04T09:35:45Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
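A minimal sketch of the soft Q-learning view of generation: treat the LM logits as Q-values over next tokens and fit a soft Bellman target, where the next state's value is a temperature-scaled logsumexp. The reward shape and temperature are illustrative assumptions.

```python
import torch

def soft_q_target(reward, next_logits, alpha=1.0, gamma=1.0):
    """reward: (batch,) per-step reward; next_logits: (batch, vocab) Q-values
    at the next step. Returns the soft Bellman target for the current step."""
    soft_value = alpha * torch.logsumexp(next_logits / alpha, dim=-1)
    return reward + gamma * soft_value
```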
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- Self-Paced Deep Reinforcement Learning [42.467323141301826]
Curriculum reinforcement learning (CRL) improves the learning speed and stability of an agent by exposing it to a tailored series of tasks throughout learning.
Despite empirical successes, an open question in CRL is how to automatically generate a curriculum for a given reinforcement learning (RL) agent, avoiding manual design.
We propose an answer by interpreting the curriculum generation as an inference problem, where distributions over tasks are progressively learned to approach the target task.
This approach leads to automatic curriculum generation whose pace is controlled by the agent; it has solid theoretical motivation and integrates easily with deep RL algorithms.
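A toy sketch in the spirit of this summary: a Gaussian over a scalar task parameter drifts toward the target task only while the agent is succeeding, so the agent controls the pace. The interpolation rule and threshold below stand in for the paper's inference-based objective.

```python
import random

def pace_curriculum(mu, sigma, target_mu, success_rate, eta=0.1):
    """Nudge the task distribution toward the target, gated on agent competence."""
    if success_rate > 0.6:              # hypothetical competence gate
        mu += eta * (target_mu - mu)    # shift mean toward the target task
        sigma = max(0.9 * sigma, 0.05)  # gradually narrow the distribution
    return mu, sigma

def sample_task(mu, sigma):
    return random.gauss(mu, sigma)      # draw one task parameter to train on
```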
arXiv Detail & Related papers (2020-04-24T15:48:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.