Controlling Conditional Language Models with Distributional Policy
Gradients
- URL: http://arxiv.org/abs/2112.00791v1
- Date: Wed, 1 Dec 2021 19:24:05 GMT
- Title: Controlling Conditional Language Models with Distributional Policy
Gradients
- Authors: Tomasz Korbak and Hady Elsahar and German Kruszewski and Marc Dymetman
- Abstract summary: General-purpose pretrained generative models often fail to meet some of the downstream requirements.
This raises an important question of how to adapt pre-trained generative models to a new task without destroying their capabilities.
Recent work has suggested solving this problem by representing task-specific requirements through energy-based models.
In this paper, we extend this approach to conditional tasks by proposing Conditional DPG (CDPG).
- Score: 2.9176992922046923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning is shifting towards general-purpose pretrained generative
models, trained in a self-supervised manner on large amounts of data, which can
then be applied to solve a large number of tasks. However, due to their generic
training methodology, these models often fail to meet some of the downstream
requirements (e.g. hallucination in abstractive summarization or wrong format
in automatic code generation). This raises an important question of how to
adapt pre-trained generative models to a new task without destroying their
capabilities. Recent work has suggested solving this problem by representing
task-specific requirements through energy-based models (EBMs) and approximating
these EBMs using distributional policy gradients (DPG). Unfortunately, this
approach is limited to unconditional distributions, represented by
unconditional EBMs. In this paper, we extend this approach to conditional tasks
by proposing Conditional DPG (CDPG). We evaluate CDPG on three different
control objectives across two tasks: summarization with T5 and code generation
with GPT-Neo. Our results show that fine-tuning using CDPG robustly moves these
pretrained models closer towards meeting control objectives and -- in contrast
with baseline approaches -- does not result in catastrophic forgetting.
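As a rough illustration of the recipe the abstract describes (representing the target as an unnormalized energy-based model and pushing the policy towards it with importance-weighted gradients), the sketch below shows a DPG-style update on a toy conditional model. Everything here is a simplified assumption for illustration: the one-step "language model", the ebm_score constraint and the omission of the partition function are not the paper's actual implementation.

```python
# Toy sketch of a conditional DPG-style update. Illustrative only: the
# single-step "model" (one logit vector per condition), the ebm_score
# constraint and the update rule are simplified assumptions, not the
# paper's implementation (e.g. the partition function Z_c is ignored).
import torch

vocab_size, n_conditions = 8, 3
# "Conditional model" pi_theta(x|c): one row of logits per condition c.
logits = torch.zeros(n_conditions, vocab_size, requires_grad=True)
optimizer = torch.optim.Adam([logits], lr=0.1)

def ebm_score(c, x):
    """Unnormalized target P(x|c): the model's own (frozen) distribution
    with token 0 forbidden -- a stand-in for a task-specific constraint."""
    base = torch.softmax(logits.detach()[c], dim=-1)
    base = base.gather(-1, x.unsqueeze(-1)).squeeze(-1)
    return base * (x != 0).float()

for _ in range(200):
    c = torch.randint(n_conditions, (32,))                   # sample conditions
    log_q = torch.log_softmax(logits[c], dim=-1)
    x = torch.multinomial(log_q.detach().exp(), 1).squeeze(-1)   # x ~ pi_theta(.|c)
    log_q_x = log_q.gather(-1, x.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        w = ebm_score(c, x) / log_q_x.exp().clamp_min(1e-8)  # importance weight P/q
    loss = -(w * log_q_x).mean()                             # DPG-style surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the paper's setting the samples would be full sequences generated by a pretrained transformer such as T5 or GPT-Neo, and the choice of proposal distribution and normalization matters, but the importance-weighted log-likelihood update above captures the basic mechanism.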
Related papers
- TA&AT: Enhancing Task-Oriented Dialog with Turn-Level Auxiliary Tasks
and Action-Tree Based Scheduled Sampling [16.77137239284608]
Task-oriented dialog systems have witnessed substantial progress due to conversational pre-training techniques.
We propose turn-level multi-task objectives for the encoder.
For the decoder, we introduce an action tree-based scheduled sampling technique.
arXiv Detail & Related papers (2024-01-28T11:02:23Z)
- Deep Graph Reprogramming [112.34663053130073]
"Deep graph reprogramming" is a model-reusing task tailored for graph neural networks (GNNs).
We propose an innovative Data Reprogramming paradigm alongside a Model Reprogramming paradigm.
arXiv Detail & Related papers (2023-04-28T02:04:29Z)
- A multilevel reinforcement learning framework for PDE based control [0.2538209532048867]
Reinforcement learning (RL) is a promising method to solve control problems.
Model-free RL algorithms are sample-inefficient and require thousands, if not millions, of samples to learn optimal control policies.
We propose a multilevel RL framework that eases this cost by exploiting sublevel models corresponding to coarser-scale discretizations.
arXiv Detail & Related papers (2022-10-15T23:52:48Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs [63.936622239286685]
We find that interference among different tasks and modalities is the main factor behind the performance degradation observed in generalist models.
We introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models.
Code and pre-trained generalist models shall be released.
arXiv Detail & Related papers (2022-06-09T17:59:59Z)
- Training and Inference on Any-Order Autoregressive Models the Right Way [97.39464776373902]
A family of Any-Order Autoregressive Models (AO-ARMs) has shown breakthrough performance in arbitrary conditional tasks.
We identify significant improvements to be made to previous formulations of AO-ARMs.
Our method leads to improved performance with no compromises on tractability.
arXiv Detail & Related papers (2022-05-26T18:00:02Z)
- Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach [28.199348547856175]
This paper revisits a natural alternative that removes the requirement of prior knowledge about the minimum and maximum values a policy can attain.
It achieves state-of-the-art performance on a variety of challenging tasks.
arXiv Detail & Related papers (2022-04-21T16:44:47Z)
- DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization [75.72231742114951]
Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks.
These models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency.
We propose to jointly distill and quantize the model, where knowledge is transferred from the full-precision teacher model to the quantized and distilled low-precision student model (a toy sketch of this teacher-student pattern appears after this list).
arXiv Detail & Related papers (2022-03-21T18:04:25Z)
- Reinforcement Learning from Demonstrations by Novel Interactive Expert and Application to Automatic Berthing Control Systems for Unmanned Surface Vessel [12.453219390225428]
Two novel practical methods of Reinforcement Learning from Demonstration (RLfD) are developed and applied to automatic berthing control systems for Unmanned Surface Vessel.
A new expert data generation method, called Model Predictive Based Expert (MPBE), is developed to provide high quality supervision data for RLfD algorithms.
Another novel RLfD algorithm based on MP-DDPG, called Self-Guided Actor-Critic (SGAC), is presented; it can effectively leverage MPBE by continuously querying it to generate high-quality expert data online.
arXiv Detail & Related papers (2022-02-23T06:45:59Z)
- Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
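The DQ-BART entry above mentions transferring knowledge from a full-precision teacher into a quantized, distilled student. The following toy sketch shows that teacher-student pattern in isolation; the linear "models", the fake_quantize helper and the plain KL distillation loss are illustrative assumptions, not the paper's code.

```python
# Toy sketch of joint distillation + quantization: a low-precision student
# is trained to match a frozen full-precision teacher. All names and the
# linear "models" here are illustrative assumptions, not DQ-BART's code.
import torch
import torch.nn.functional as F

d_in, d_out = 16, 8
teacher = torch.nn.Linear(d_in, d_out)              # frozen full-precision teacher
student = torch.nn.Linear(d_in, d_out)              # student trained in low precision
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def fake_quantize(w, bits=8):
    """Simulate low-precision weights with a straight-through estimator."""
    scale = w.detach().abs().max() / (2 ** (bits - 1) - 1) + 1e-8
    q = torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return w + (q * scale - w).detach()              # forward: quantized, backward: identity

for _ in range(100):
    x = torch.randn(32, d_in)
    with torch.no_grad():
        teacher_logits = teacher(x)                  # teacher's soft targets
    w_q = fake_quantize(student.weight)
    student_logits = F.linear(x, w_q, student.bias)  # student forward with quantized weights
    # Distillation loss: make the student's distribution match the teacher's.
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1), reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

This only illustrates the joint quantize-and-distill training step, not the full DQ-BART method or its sequence-to-sequence setting.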
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.