Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback
- URL: http://arxiv.org/abs/2401.00330v1
- Date: Sat, 30 Dec 2023 21:37:18 GMT
- Title: Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback
- Authors: Yinglun Xu, Gagandeep Singh
- Abstract summary: We find a challenge in applying two-phase learning in the offline PBRL setting.
We propose a two-phase learning approach under behavior regularization through action clipping.
Our method ignores such state-actions during the second learning phase to achieve higher learning efficiency.
- Score: 5.683832910692926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we consider the offline preference-based reinforcement learning
problem. We focus on the two-phase learning approach that is prevalent in
previous reinforcement learning from human preference works. We find a
challenge in applying two-phase learning in the offline PBRL setting: the
learned utility model can be too hard for the learning agent to optimize during
the second learning phase. To overcome this challenge, we propose a two-phase
learning approach under behavior regularization through action clipping. The
insight is that the state-actions which are poorly covered by the dataset can
only provide limited information and increase the complexity of the problem in
the second learning phase. Our method ignores such state-actions during the
second learning phase to achieve higher learning efficiency. We empirically
verify that our method has high learning efficiency on a variety of datasets in
robotic control environments.
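The abstract's key mechanism, ignoring state-actions that the dataset covers poorly during the second learning phase, can be illustrated with a small sketch. The Python snippet below is a hypothetical illustration rather than the authors' implementation: the dataset, the phase-one utility model, the coverage measure, and the threshold are all stand-in assumptions.

```python
# Minimal sketch (not the authors' code): second-phase optimization that clips
# candidate actions to those well covered by the offline dataset.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline dataset of (state, action) pairs from a robotic task.
states = rng.uniform(-1.0, 1.0, size=(5000, 3))      # state dim = 3
actions = np.tanh(states @ rng.normal(size=(3, 2)))   # action dim = 2

def utility(state, action):
    """Stand-in for the utility model learned from preferences in phase one."""
    return -np.sum((action - 0.5 * state[:2]) ** 2)

def coverage_distance(state, action, k=10):
    """Average distance from (state, action) to its k nearest dataset neighbors."""
    d = np.linalg.norm(states - state, axis=1) + np.linalg.norm(actions - action, axis=1)
    return np.sort(d)[:k].mean()

def clipped_argmax_action(state, n_candidates=256, threshold=0.3):
    """Phase two: maximize the learned utility, but only over candidate actions
    that are well covered by the dataset (behavior regularization via clipping)."""
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, 2))
    covered = [a for a in candidates if coverage_distance(state, a) < threshold]
    if not covered:  # nothing is covered well enough: fall back to a dataset action
        return actions[np.argmin(np.linalg.norm(states - state, axis=1))]
    return max(covered, key=lambda a: utility(state, a))

print(clipped_argmax_action(states[0]))
```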
Related papers
- Online inductive learning from answer sets for efficient reinforcement learning exploration [52.03682298194168]
We exploit inductive learning of answer set programs to learn a set of logical rules representing an explainable approximation of the agent policy.
We then perform answer set reasoning on the learned rules to guide the exploration of the learning agent at the next batch.
Our methodology produces a significant boost in the discounted return achieved by the agent, even in the first batches of training.
arXiv Detail & Related papers (2025-01-13T16:13:22Z)
- Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models [19.015202590038996]
We design Dynamic Unlearning Attack (DUA), a dynamic and automated framework to attack unlearned models.
We propose Latent Adversarial Unlearning (LAU), a universal framework that effectively enhances the robustness of the unlearning process.
We demonstrate that LAU improves unlearning effectiveness by over 53.5%, causes less than an 11.6% reduction in neighboring knowledge, and has almost no impact on the model's general capabilities.
arXiv Detail & Related papers (2024-08-20T09:36:04Z)
- Imitation Game: A Model-based and Imitation Learning Deep Reinforcement Learning Hybrid [39.58317527488534]
We present work in progress towards a hybrid agent architecture that combines model-based Deep Reinforcement Learning with imitation learning to overcome both problems.
arXiv Detail & Related papers (2024-04-02T09:55:30Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
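As a rough illustration of the reward construction summarized above (not the RLIF implementation), transitions could simply be relabeled so that the only reward signal is whether the expert intervened; the transition format and the -1 penalty below are assumptions.

```python
# Hypothetical sketch of "intervention signals as rewards" (not the RLIF code).
from typing import List, Tuple

# A logged transition: (state, action, next_state, intervened_by_expert)
Transition = Tuple[list, list, list, bool]

def relabel_with_intervention_rewards(trajectory: List[Transition]):
    """Assign reward -1 to any step where the human expert intervened, else 0.
    An off-policy RL algorithm can then be trained on the relabeled transitions."""
    relabeled = []
    for state, action, next_state, intervened in trajectory:
        reward = -1.0 if intervened else 0.0
        relabeled.append((state, action, reward, next_state))
    return relabeled
```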
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
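A minimal sketch of data-driven action discretization is shown below; it uses plain k-means over the dataset's actions as one plausible instantiation, not necessarily the paper's adaptive scheme.

```python
# Data-driven action discretization sketch (an illustrative stand-in, not the
# paper's method): cluster the dataset's continuous actions and map every
# action to its nearest cluster center.
import numpy as np

def fit_action_codebook(dataset_actions: np.ndarray, n_bins: int = 16,
                        n_iters: int = 50, seed: int = 0) -> np.ndarray:
    """Plain k-means over the actions observed in the offline dataset."""
    rng = np.random.default_rng(seed)
    centers = dataset_actions[rng.choice(len(dataset_actions), n_bins, replace=False)]
    for _ in range(n_iters):
        # Assign each action to its nearest center, then recompute the centers.
        dists = np.linalg.norm(dataset_actions[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_bins):
            if np.any(labels == k):
                centers[k] = dataset_actions[labels == k].mean(axis=0)
    return centers

def quantize(action: np.ndarray, centers: np.ndarray) -> int:
    """Return the discrete index a continuous action maps to."""
    return int(np.linalg.norm(centers - action, axis=1).argmin())

# Usage: a discrete-action offline RL method (e.g. with a categorical Q-head)
# can then operate on the indices returned by `quantize`.
acts = np.random.default_rng(1).uniform(-1, 1, size=(1000, 2))
codebook = fit_action_codebook(acts)
print(quantize(np.array([0.2, -0.4]), codebook))
```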
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations [1.0609815608017066]
We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations.
Existing inverse reinforcement learning methods come with strong theoretical guarantees, but are computationally expensive.
We introduce a novel bilinear saddle-point framework using Lagrangian duality to bridge the gap between theory and practice.
arXiv Detail & Related papers (2021-12-28T05:47:24Z)
- Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems [58.724629408229205]
We demonstrate how traditional supervised learning and a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods.
Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
arXiv Detail & Related papers (2020-09-21T12:04:18Z)
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
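The weighting idea can be sketched as a convex combination of the two losses. The snippet below is a toy illustration rather than the authors' code: in the paper the per-state weight is derived from an auxiliary imitation policy, whereas here it is simply passed in as a placeholder.

```python
# Illustrative blend of imitation and RL losses with a per-state weight
# (a stand-in for the ADVISOR-style weighting described above).
import numpy as np

def weighted_imitation_rl_loss(policy_probs: np.ndarray, expert_action: int,
                               rl_loss: float, weight: float) -> float:
    """weight in [0, 1]: 1 trusts imitation of the expert, 0 trusts the RL objective."""
    imitation_loss = -np.log(policy_probs[expert_action] + 1e-8)  # cross-entropy vs expert
    return weight * imitation_loss + (1.0 - weight) * rl_loss

# Example call with a dummy 4-action policy distribution.
print(weighted_imitation_rl_loss(np.array([0.1, 0.6, 0.2, 0.1]), expert_action=1,
                                 rl_loss=0.7, weight=0.8))
```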
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
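Both ingredients can be sketched in a few lines. The snippet below is a toy illustration, not the authors' implementation; the exact down-weighting function and the exploration coefficient are assumptions.

```python
# Toy numpy sketch of the two SUNRISE-style ingredients described above.
import numpy as np

def weighted_bellman_target(q_ensemble_next, reward, gamma=0.99, temperature=10.0):
    """q_ensemble_next: (n_ensemble, n_actions) target Q-values at the next state.
    Down-weight the backup when the Q-ensemble disagrees (high uncertainty)."""
    best_next = q_ensemble_next.max(axis=1)                 # per-member greedy value
    std = best_next.std()                                   # ensemble disagreement
    weight = 1.0 / (1.0 + np.exp(temperature * std)) + 0.5  # sigmoid-style down-weighting
    return weight * (reward + gamma * best_next.mean())

def ucb_action(q_ensemble, lam=1.0):
    """q_ensemble: (n_ensemble, n_actions) Q-values at the current state.
    Pick the action with the highest mean-plus-uncertainty bonus."""
    return int(np.argmax(q_ensemble.mean(axis=0) + lam * q_ensemble.std(axis=0)))

q_next = np.random.default_rng(0).normal(size=(5, 4))       # 5 ensemble members, 4 actions
print(weighted_bellman_target(q_next, reward=1.0), ucb_action(q_next))
```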
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
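A toy sketch of those two ingredients (a Gaussian Process one-step dynamics model plus random-shooting receding horizon control) is shown below; the 1-D system, the cost, and the use of scikit-learn are assumptions, and the paper's constraint handling is not modeled.

```python
# Illustrative GP-dynamics + receding-horizon-control sketch on a toy 1-D system.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def true_dynamics(x, u):                       # unknown to the learner
    return x + 0.1 * u - 0.05 * np.sin(x)

# Fit a GP model of the one-step dynamics from a small batch of transitions.
X = rng.uniform(-2, 2, size=(200, 2))          # columns: state, action
Y = true_dynamics(X[:, 0], X[:, 1])
gp = GaussianProcessRegressor().fit(X, Y)

def receding_horizon_action(x0, horizon=5, n_samples=128):
    """Random-shooting MPC: sample action sequences, roll them out through the
    GP model, and return the first action of the lowest-cost sequence."""
    seqs = rng.uniform(-1, 1, size=(n_samples, horizon))
    costs = np.zeros(n_samples)
    x = np.full(n_samples, x0)
    for t in range(horizon):
        x = gp.predict(np.column_stack([x, seqs[:, t]]))
        costs += x ** 2                        # drive the state toward 0
    return seqs[np.argmin(costs), 0]

print(receding_horizon_action(1.5))
```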
arXiv Detail & Related papers (2020-04-07T15:51:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.