Controlling Conditional Language Models with Distributional Policy
Gradients
- URL: http://arxiv.org/abs/2112.00791v1
- Date: Wed, 1 Dec 2021 19:24:05 GMT
- Title: Controlling Conditional Language Models with Distributional Policy
Gradients
- Authors: Tomasz Korbak and Hady Elsahar and German Kruszewski and Marc Dymetman
- Abstract summary: General-purpose pretrained generative models often fail to meet some of the downstream requirements.
This raises an important question of how to adapt pre-trained generative models to a new task without destroying their capabilities.
Recent work has suggested solving this problem by representing task-specific requirements through energy-based models.
In this paper, we extend this approach to conditional tasks by proposing Conditional DPG (CDPG).
- Score: 2.9176992922046923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning is shifting towards general-purpose pretrained generative
models, trained in a self-supervised manner on large amounts of data, which can
then be applied to solve a large number of tasks. However, due to their generic
training methodology, these models often fail to meet some of the downstream
requirements (e.g. hallucination in abstractive summarization or wrong format
in automatic code generation). This raises an important question of how to
adapt pre-trained generative models to a new task without destroying their
capabilities. Recent work has suggested solving this problem by representing
task-specific requirements through energy-based models (EBMs) and approximating
these EBMs using distributional policy gradients (DPG). Unfortunately, this
approach is limited to unconditional distributions, represented by
unconditional EBMs. In this paper, we extend this approach to conditional tasks
by proposing Conditional DPG (CDPG). We evaluate CDPG on three different
control objectives across two tasks: summarization with T5 and code generation
with GPT-Neo. Our results show that fine-tuning using CDPG robustly moves these
pretrained models closer towards meeting control objectives and -- in contrast
with baseline approaches -- does not result in catastrophic forgetting.
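As a rough illustration of the recipe the abstract describes (representing the target as an unnormalized energy-based model and pushing the policy towards it with importance-weighted gradients), the sketch below shows a DPG-style update on a toy conditional model. Everything here is a simplified assumption for illustration: the one-step "language model", the ebm_score constraint and the omission of the partition function are not the paper's actual implementation.

```python
# Toy sketch of a conditional DPG-style update. Illustrative only: the
# single-step "model" (one logit vector per condition), the ebm_score
# constraint and the update rule are simplified assumptions, not the
# paper's implementation (e.g. the partition function Z_c is ignored).
import torch

vocab_size, n_conditions = 8, 3
# "Conditional model" pi_theta(x|c): one row of logits per condition c.
logits = torch.zeros(n_conditions, vocab_size, requires_grad=True)
optimizer = torch.optim.Adam([logits], lr=0.1)

def ebm_score(c, x):
    """Unnormalized target P(x|c): the model's own (frozen) distribution
    with token 0 forbidden -- a stand-in for a task-specific constraint."""
    base = torch.softmax(logits.detach()[c], dim=-1)
    base = base.gather(-1, x.unsqueeze(-1)).squeeze(-1)
    return base * (x != 0).float()

for _ in range(200):
    c = torch.randint(n_conditions, (32,))                   # sample conditions
    log_q = torch.log_softmax(logits[c], dim=-1)
    x = torch.multinomial(log_q.detach().exp(), 1).squeeze(-1)   # x ~ pi_theta(.|c)
    log_q_x = log_q.gather(-1, x.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        w = ebm_score(c, x) / log_q_x.exp().clamp_min(1e-8)  # importance weight P/q
    loss = -(w * log_q_x).mean()                             # DPG-style surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the paper's setting the samples would be full sequences generated by a pretrained transformer such as T5 or GPT-Neo, and the choice of proposal distribution and normalization matters, but the importance-weighted log-likelihood update above captures the basic mechanism.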
Related papers
- TA&AT: Enhancing Task-Oriented Dialog with Turn-Level Auxiliary Tasks
and Action-Tree Based Scheduled Sampling [16.77137239284608]
Task-oriented dialog systems have witnessed substantial progress due to conversational pre-training techniques.
We propose turn-level multi-task objectives for the encoder.
For the decoder, we introduce an action tree-based scheduled sampling technique.
arXiv Detail & Related papers (2024-01-28T11:02:23Z)
- Deep Graph Reprogramming [112.34663053130073]
"Deep graph reprogramming" is a model-reusing task tailored for graph neural networks (GNNs).
We propose an innovative Data Reprogramming paradigm alongside a Model Reprogramming paradigm.
arXiv Detail & Related papers (2023-04-28T02:04:29Z)
- A multilevel reinforcement learning framework for PDE based control [0.2538209532048867]
Reinforcement learning (RL) is a promising method to solve control problems.
Model-free RL algorithms are sample-inefficient and require thousands, if not millions, of samples to learn optimal control policies.
We propose a multilevel RL framework that eases this cost by exploiting sublevel models corresponding to coarser-scale discretizations.
arXiv Detail & Related papers (2022-10-15T23:52:48Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs [63.936622239286685]
We find that interference among different tasks and modalities is the main factor behind the performance degradation observed in generalist models.
We introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models.
Code and pre-trained generalist models shall be released.
arXiv Detail & Related papers (2022-06-09T17:59:59Z)
- Training and Inference on Any-Order Autoregressive Models the Right Way [97.39464776373902]
A family of Any-Order Autoregressive Models (AO-ARMs) has shown breakthrough performance in arbitrary conditional tasks.
We identify significant improvements to be made to previous formulations of AO-ARMs.
Our method leads to improved performance with no compromises on tractability.
arXiv Detail & Related papers (2022-05-26T18:00:02Z)
- Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach [28.199348547856175]
This paper revisits a natural alternative that removes the requirement of prior knowledge about the minimum and maximum values a policy can attain.
It achieves state-of-the-art performance on a variety of challenging tasks.
arXiv Detail & Related papers (2022-04-21T16:44:47Z)
- DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization [75.72231742114951]
Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks.
These models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency.
We propose to jointly distill and quantize the model, where knowledge is transferred from the full-precision teacher model to the quantized and distilled low-precision student model (a toy sketch of this teacher-student pattern appears after this list).
arXiv Detail & Related papers (2022-03-21T18:04:25Z)
- Reinforcement Learning from Demonstrations by Novel Interactive Expert and Application to Automatic Berthing Control Systems for Unmanned Surface Vessel [12.453219390225428]
Two novel practical methods of Reinforcement Learning from Demonstration (RLfD) are developed and applied to automatic berthing control systems for Unmanned Surface Vessel.
A new expert data generation method, called Model Predictive Based Expert (MPBE), is developed to provide high quality supervision data for RLfD algorithms.
Another novel RLfD algorithm based on MP-DDPG, called Self-Guided Actor-Critic (SGAC), is presented; it can effectively leverage MPBE by continuously querying it to generate high-quality expert data online.
arXiv Detail & Related papers (2022-02-23T06:45:59Z)
- Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
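The DQ-BART entry above mentions transferring knowledge from a full-precision teacher into a quantized, distilled student. The following toy sketch shows that teacher-student pattern in isolation; the linear "models", the fake_quantize helper and the plain KL distillation loss are illustrative assumptions, not the paper's code.

```python
# Toy sketch of joint distillation + quantization: a low-precision student
# is trained to match a frozen full-precision teacher. All names and the
# linear "models" here are illustrative assumptions, not DQ-BART's code.
import torch
import torch.nn.functional as F

d_in, d_out = 16, 8
teacher = torch.nn.Linear(d_in, d_out)              # frozen full-precision teacher
student = torch.nn.Linear(d_in, d_out)              # student trained in low precision
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def fake_quantize(w, bits=8):
    """Simulate low-precision weights with a straight-through estimator."""
    scale = w.detach().abs().max() / (2 ** (bits - 1) - 1) + 1e-8
    q = torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return w + (q * scale - w).detach()              # forward: quantized, backward: identity

for _ in range(100):
    x = torch.randn(32, d_in)
    with torch.no_grad():
        teacher_logits = teacher(x)                  # teacher's soft targets
    w_q = fake_quantize(student.weight)
    student_logits = F.linear(x, w_q, student.bias)  # student forward with quantized weights
    # Distillation loss: make the student's distribution match the teacher's.
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1), reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

This only illustrates the joint quantize-and-distill training step, not the full DQ-BART method or its sequence-to-sequence setting.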
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.