Shattering the Agent-Environment Interface for Fine-Tuning Inclusive
Language Models
- URL: http://arxiv.org/abs/2305.11455v1
- Date: Fri, 19 May 2023 06:21:15 GMT
- Title: Shattering the Agent-Environment Interface for Fine-Tuning Inclusive
Language Models
- Authors: Wanqiao Xu, Shi Dong, Dilip Arumugam, Benjamin Van Roy
- Abstract summary: In this work, we adopt a novel perspective wherein a pre-trained language model is itself simultaneously a policy, reward function, and transition function.
An immediate consequence of this is that reward learning and language model fine-tuning can be performed jointly and directly, without requiring any further downstream policy optimization.
- Score: 24.107358120517336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A centerpiece of the ever-popular reinforcement learning from human feedback
(RLHF) approach to fine-tuning autoregressive language models is the explicit
training of a reward model to emulate human feedback, distinct from the
language model itself. This reward model is then coupled with policy-gradient
methods to dramatically improve the alignment between language model outputs
and desired responses. In this work, we adopt a novel perspective wherein a
pre-trained language model is itself simultaneously a policy, reward function,
and transition function. An immediate consequence of this is that reward
learning and language model fine-tuning can be performed jointly and directly,
without requiring any further downstream policy optimization. While this
perspective does indeed break the traditional agent-environment interface, we
nevertheless maintain that there can be enormous statistical benefits afforded
by bringing to bear traditional algorithmic concepts from reinforcement
learning. Our experiments demonstrate one concrete instance of this through
efficient exploration based on the representation and resolution of epistemic
uncertainty. In order to illustrate these ideas in a transparent manner, we
restrict attention to a simple didactic data generating process and leave for
future work extension to systems of practical scale.
Related papers
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z) - Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models [6.394084132117747]
We propose a technique that leverages counterfactual generation to evaluate the faithfulness of attribution methods for autoregressive language models.
Our technique generates fluent, in-distribution counterfactuals, making the evaluation protocol more reliable.
arXiv Detail & Related papers (2024-08-21T00:17:59Z) - Robust Latent Representation Tuning for Image-text Classification [9.789498730131607]
We propose a robust latent representation tuning method for large models.
Our approach introduces a modality latent translation module to maximize the correlation between modalities, resulting in a robust representation.
Within this framework, common semantics are refined during training, and robust performance is achieved even in the absence of one modality.
arXiv Detail & Related papers (2024-06-10T06:29:00Z) - Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF [82.7679132059169]
Reinforcement learning from human feedback has emerged as a central tool for language model alignment.
We propose a new algorithm for online exploration in RLHF, Exploratory Preference Optimization (XPO)
XPO enjoys the strongest known provable guarantees and promising empirical performance.
arXiv Detail & Related papers (2024-05-31T17:39:06Z) - ReCoRe: Regularized Contrastive Representation Learning of World Model [21.29132219042405]
We present a world model that learns invariant features using contrastive unsupervised learning and an intervention-invariant regularizer.
Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark.
arXiv Detail & Related papers (2023-12-14T15:53:07Z) - Expedited Training of Visual Conditioned Language Generation via
Redundancy Reduction [61.16125290912494]
$textEVL_textGen$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z) - Fine-Tune Language Models as Multi-Modal Differential Equation Solvers [14.181842691371935]
We present a transformation of in-context operator learning into a multi-modal paradigm.
In particular, we take inspiration from the recent success of large language models, and propose using "captions" to integrate human knowledge about the operator.
arXiv Detail & Related papers (2023-08-09T16:44:25Z) - BatGPT: A Bidirectional Autoregessive Talker from Generative Pre-trained
Transformer [77.28871523946418]
BatGPT is a large-scale language model designed and trained jointly by Wuhan University and Shanghai Jiao Tong University.
It is capable of generating highly natural and fluent text in response to various types of input, including text prompts, images, and audio.
arXiv Detail & Related papers (2023-07-01T15:10:01Z) - Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural
Language Generation [68.9440575276396]
This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation.
First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization.
Second, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models.
Third, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for
arXiv Detail & Related papers (2023-05-01T17:36:06Z) - Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.