Decision-Focused Model-based Reinforcement Learning for Reward Transfer
- URL: http://arxiv.org/abs/2304.03365v2
- Date: Mon, 1 Jan 2024 16:45:55 GMT
- Title: Decision-Focused Model-based Reinforcement Learning for Reward Transfer
- Authors: Abhishek Sharma, Sonali Parbhoo, Omer Gottesman, Finale Doshi-Velez
- Abstract summary: Decision-focused (DF) model-based reinforcement learning has recently been introduced as a powerful algorithm that can focus on learning the MDP dynamics that are most relevant for obtaining high returns.
We demonstrate that when the reward function is defined by preferences over multiple objectives, the DF model may be sensitive to changes in the objective preferences.
We develop the robust decision-focused (RDF) algorithm, which leverages the non-identifiability of DF solutions to learn models that maximize expected returns.
- Score: 30.47819337707417
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decision-focused (DF) model-based reinforcement learning has recently been
introduced as a powerful algorithm that can focus on learning the MDP dynamics
that are most relevant for obtaining high returns. While this approach
increases the agent's performance by directly optimizing the reward, it does so
by learning less accurate dynamics from a maximum likelihood perspective. We
demonstrate that when the reward function is defined by preferences over
multiple objectives, the DF model may be sensitive to changes in the objective
preferences. In this work, we develop the robust decision-focused (RDF)
algorithm, which leverages the non-identifiability of DF solutions to learn
models that maximize expected returns while simultaneously learning models that
transfer to changes in the preference over multiple objectives. We demonstrate
the effectiveness of RDF on two synthetic domains and two healthcare
simulators, showing that it significantly improves the robustness of DF model
learning to changes in the reward function without compromising training-time
return.
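As a rough illustration of the idea, the sketch below contrasts the maximum-likelihood model loss with decision-focused scoring in a tabular MDP whose reward is a preference-weighted sum of objectives. The function names, the tabular setup, and the simple averaging over sampled preference vectors are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mle_loss(model_probs, counts):
    """Maximum-likelihood model learning: negative log-likelihood of the
    observed transition counts (the accuracy DF learning trades away)."""
    return -np.sum(counts * np.log(model_probs + 1e-12))

def preference_reward(phi, w):
    """Multi-objective reward r(s, a) = w . phi(s, a).
    phi has shape (S, A, K); preference weights w have shape (K,)."""
    return phi @ w

def expected_return(P_model, r, policy, gamma=0.99):
    """Tabular policy evaluation in the LEARNED model:
    v = (I - gamma * P_pi)^{-1} r_pi, averaged over a uniform start state."""
    S = r.shape[0]
    P_pi = np.einsum('sa,sap->sp', policy, P_model)   # (S, S)
    r_pi = np.einsum('sa,sa->s', policy, r)           # (S,)
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return v.mean()

def rdf_score(P_model, phi, policy, w_train, w_samples):
    """Sketch of the robustness idea: among models achieving the same
    training-preference return (DF solutions are non-identifiable), prefer
    ones that also score well under sampled alternative preferences.
    Assumption: a plain average over sampled w; the paper's exact robust
    objective may differ."""
    train_ret = expected_return(P_model, preference_reward(phi, w_train), policy)
    transfer_ret = np.mean([
        expected_return(P_model, preference_reward(phi, w), policy)
        for w in w_samples
    ])
    return train_ret, transfer_ret
```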
Related papers
- Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL [7.988692259455583]
Large language models (LLMs) trained with Reinforcement Learning from Human Feedback have demonstrated remarkable capabilities, but their underlying reward functions and decision-making processes remain opaque.
This paper introduces a novel approach to interpreting LLMs by applying inverse reinforcement learning (IRL) to recover their implicit reward functions.
We conduct experiments on toxicity-aligned LLMs of varying sizes, extracting reward models that achieve up to 80.40% accuracy in predicting human preferences.
arXiv Detail & Related papers (2024-10-16T12:14:25Z)
- On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly, by first generating reward models.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z)
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
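As a rough sketch of the kind of objective such reward models optimize, the pairwise loss below pushes chosen responses above rejected ones; the margin term is an assumed contrastive variant, not necessarily the paper's exact loss.

```python
import torch.nn.functional as F

def preference_loss(r_chosen, r_rejected, margin=0.0):
    """Bradley-Terry-style pairwise loss: drive the scalar reward of the
    chosen response above the rejected one. A positive margin enforces a
    contrastive gap between the two (assumption, for illustration)."""
    return -F.logsigmoid(r_chosen - r_rejected - margin).mean()
```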
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
- On the Robustness of Decision-Focused Learning [0.0]
Decision-Focused Learning (DFL) is an emerging learning paradigm that tackles the task of training a machine learning (ML) model to predict the missing parameters of an incomplete optimization problem.
DFL trains an ML model in an end-to-end system, by integrating the prediction and optimization tasks, providing better alignment of the training and testing objectives.
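A minimal sketch of one such end-to-end update, assuming a linear objective over a finite set of candidate decisions and a softmax relaxation of the argmax (both assumptions, not the paper's formulation):

```python
import torch

def dfl_step(model, features, true_params, candidates, opt, tau=0.1):
    """One end-to-end DFL update (sketch): predict the missing optimization
    parameters, take a softmax-relaxed decision so the argmax is
    differentiable, then score that decision with the TRUE parameters.
    Gradients therefore reflect decision quality, not prediction accuracy."""
    pred = model(features)                           # predicted parameters, shape (D,)
    scores = candidates @ pred                       # value of each decision, shape (N,)
    decision = torch.softmax(scores / tau, dim=-1)   # relaxed one-hot choice
    regret_like = -(decision * (candidates @ true_params)).sum()
    opt.zero_grad()
    regret_like.backward()
    opt.step()
    return regret_like.item()
```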
arXiv Detail & Related papers (2023-11-28T04:34:04Z)
- Learning to Augment via Implicit Differentiation for Domain Generalization [107.9666735637355]
Domain generalization (DG) aims to overcome the problem by leveraging multiple source domains to learn a domain-generalizable model.
In this paper, we propose a novel augmentation-based DG approach, dubbed AugLearn.
AugLearn shows effectiveness on three standard DG benchmarks, PACS, Office-Home and Digits-DG.
arXiv Detail & Related papers (2022-10-25T18:51:51Z)
- Discriminator Augmented Model-Based Reinforcement Learning [47.094522301093775]
It is common in practice for the learned model to be inaccurate, impairing planning and leading to poor performance.
This paper aims to improve planning with an importance sampling framework that accounts for discrepancy between the true and learned dynamics.
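A minimal sketch of how a discriminator can supply such importance weights via the standard density-ratio identity (the discriminator and its training loop are assumed, not the paper's exact construction):

```python
def importance_weight(discriminator, s, a, s_next, eps=1e-8):
    """Density-ratio trick: if d = discriminator(s, a, s') estimates
    P(real | transition) against model-generated transitions, then
    p_true / p_model = d / (1 - d). Reweighting model rollouts by this
    ratio downweights transitions the learned dynamics get wrong."""
    d = discriminator(s, a, s_next)     # sigmoid output in (0, 1)
    return d / (1.0 - d + eps)
```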
arXiv Detail & Related papers (2021-03-24T06:01:55Z)
- Improved Adversarial Training via Learned Optimizer [101.38877975769198]
We propose a framework to improve the robustness of adversarially trained models.
By co-training the optimizer's parameters and the model's weights, the proposed framework consistently improves robustness and adaptively sets step sizes and update directions.
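A toy sketch of the co-training idea, where a small network replaces PGD's fixed sign-gradient inner step and is trained alongside the classifier (the architecture and projection below are illustrative assumptions):

```python
import torch
import torch.nn as nn

class LearnedAttacker(nn.Module):
    """Toy learned optimizer for the inner maximization: maps the input
    gradient to a perturbation update, replacing PGD's fixed
    alpha * sign(grad) step."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())

    def forward(self, grad):
        return self.net(grad)

def inner_step(x, x_adv, grad, attacker, eps):
    """One learned attack step, projected back into the L_inf eps-ball."""
    x_adv = x_adv + attacker(grad)
    return x + torch.clamp(x_adv - x, -eps, eps)
```

In co-training, the attacker's parameters are updated to increase the classifier's loss while the classifier's weights are updated to decrease it.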
arXiv Detail & Related papers (2020-04-25T20:15:53Z)