Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback
- URL: http://arxiv.org/abs/2401.00330v1
- Date: Sat, 30 Dec 2023 21:37:18 GMT
- Title: Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback
- Authors: Yinglun Xu, Gagandeep Singh
- Abstract summary: We find a challenge in applying two-phase learning in the offline preference-based RL (PBRL) setting.
We propose a two-phase learning approach under behavior regularization through action clipping.
Our method ignores state-actions that are poorly covered by the dataset during the second learning phase to achieve higher learning efficiency.
- Score: 5.683832910692926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we consider the offline preference-based reinforcement
learning (PBRL) problem. We focus on the two-phase learning approach that is
prevalent in previous work on reinforcement learning from human preferences. We
identify a challenge in applying two-phase learning in the offline PBRL setting:
the learned utility model can be too hard for the learning agent to optimize
during the second learning phase. To overcome this challenge, we propose a
two-phase learning approach under behavior regularization through action
clipping. The insight is that state-actions poorly covered by the dataset
provide only limited information while increasing the complexity of the problem
in the second learning phase. Our method ignores such state-actions during the
second learning phase to achieve higher learning efficiency. We empirically
verify that our method achieves high learning efficiency on a variety of
datasets in robotic control environments.
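To make the mechanism concrete, the following is a minimal sketch (assumptions on our part, not the authors' released code) of behavior regularization via action clipping: a behavior policy cloned from the offline dataset provides a log-likelihood proxy for coverage, and candidate actions below a coverage threshold are swapped for in-distribution ones before the learned utility model scores them. The class, function, and threshold names are all hypothetical.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Diagonal-Gaussian policy; stands in for both the learner's policy and
    a behavior policy cloned from the offline dataset."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.mu = nn.Linear(state_dim, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def dist(self, states):
        return torch.distributions.Normal(self.mu(states), self.log_std.exp())

def clip_to_covered_actions(states, actions, behavior, log_prob_min=-5.0):
    """Swap out actions whose behavior log-likelihood (a coverage proxy)
    falls below a threshold (hypothetical value)."""
    with torch.no_grad():
        d = behavior.dist(states)
        coverage = d.log_prob(actions).sum(-1)  # per-sample log-likelihood
        fallback = d.sample()                   # in-distribution replacements
    poorly_covered = (coverage < log_prob_min).unsqueeze(-1)
    return torch.where(poorly_covered, fallback, actions)

def second_phase_loss(states, policy, behavior, utility_model):
    """Maximize the learned utility over dataset-covered actions only."""
    proposed = policy.dist(states).rsample()
    clipped = clip_to_covered_actions(states, proposed, behavior)
    return -utility_model(states, clipped).mean()
```

Because gradients flow only through well-covered actions, the second phase effectively ignores poorly covered state-actions, matching the efficiency argument in the abstract.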
Related papers
- Improving Knowledge Distillation in Transfer Learning with Layer-wise Learning Rates [6.783548275689542]
We propose a layer-wise learning scheme that adjusts learning parameters per layer as a function of the differences in the Jacobian/Attention/Hessian of the output activations.
We observe improved learning performance and stability across a wide range of datasets.
arXiv Detail & Related papers (2024-07-05T21:35:17Z)
- Efficient Offline Reinforcement Learning: The Critic is Critical [5.916429671763282]
Off-policy reinforcement learning provides a promising approach for improving performance beyond supervised approaches.
We propose a best-of-both approach by first learning the behavior policy and critic with supervised learning, before improving with off-policy reinforcement learning.
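As a rough illustration of this recipe (our assumptions, not the paper's code), the first stage could fit the policy by behavior cloning and the critic by supervised regression onto Monte-Carlo returns from the dataset, before switching to standard off-policy actor-critic updates; all names below are hypothetical.

```python
import torch
import torch.nn.functional as F

def pretrain_supervised(policy, critic, batch, pi_opt, q_opt):
    """Stage 1: behavior cloning for the policy, supervised regression of
    the critic onto Monte-Carlo returns from the offline dataset."""
    s, a, ret = batch["s"], batch["a"], batch["mc_return"]
    bc_loss = F.mse_loss(policy(s), a)
    pi_opt.zero_grad(); bc_loss.backward(); pi_opt.step()
    q_loss = F.mse_loss(critic(s, a).squeeze(-1), ret)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

def improve_off_policy(policy, critic, batch, pi_opt, q_opt, gamma=0.99):
    """Stage 2: standard off-policy actor-critic updates on the same data."""
    s, a, r, s2, done = (batch[k] for k in ("s", "a", "r", "s2", "done"))
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * critic(s2, policy(s2)).squeeze(-1)
    q_loss = F.mse_loss(critic(s, a).squeeze(-1), target)  # TD(0) update
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()
    pi_loss = -critic(s, policy(s)).mean()  # deterministic policy gradient
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
```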
arXiv Detail & Related papers (2024-06-19T09:16:38Z)
- A More Practical Approach to Machine Unlearning [0.0]
Machine unlearning is the ability to remove the influence of specific data points from a trained model.
The embedding layer in GPT-2 is crucial for effective unlearning.
Fuzzy matching techniques shift the model to a new optimum, while iterative unlearning provides a more complete removal of the targeted data.
arXiv Detail & Related papers (2024-06-13T17:59:06Z)
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
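A hedged reading of that mechanism (the field names below are hypothetical, not the authors' API): each expert intervention is relabeled as a negative reward, so a standard off-policy RL algorithm learns to avoid the states where the expert felt compelled to take over.

```python
def relabel_with_intervention_reward(trajectory):
    """trajectory: list of dicts with keys 's', 'a', and 'intervened' (bool).
    Each intervention becomes a -1 reward; all other steps get 0, so any
    standard off-policy RL algorithm can train on the relabeled data."""
    relabeled = []
    for step in trajectory:
        reward = -1.0 if step["intervened"] else 0.0
        relabeled.append({**step, "r": reward})
    return relabeled
```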
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels.
We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
arXiv Detail & Related papers (2022-10-21T21:59:42Z)
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- Perceiving the World: Question-guided Reinforcement Learning for Text-based Games [64.11746320061965]
This paper introduces world-perceiving modules, which automatically decompose tasks and prune actions by answering questions about the environment.
We then propose a two-phase training framework to decouple language learning from reinforcement learning, which further improves the sample efficiency.
arXiv Detail & Related papers (2022-03-20T04:23:57Z)
- Offline Preference-Based Apprenticeship Learning [11.21888613165599]
We study how an offline dataset can be used to address two challenges that autonomous systems face when they endeavor to learn from, adapt to, and collaborate with humans.
First, we use the offline dataset to efficiently infer the human's reward function via pool-based active preference learning.
Second, given this learned reward function, we perform offline reinforcement learning to optimize a policy based on the inferred human intent.
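A simplified sketch of the first step under these assumptions (hypothetical names, not the paper's implementation): from a fixed pool of trajectory-segment pairs, query the pair on which a reward-model ensemble disagrees most, then fit the reward with a Bradley-Terry preference loss.

```python
import torch
import torch.nn.functional as F

def preference_prob(reward_model, pair):
    """Bradley-Terry probability that segment 1 is preferred over segment 2,
    using summed per-step rewards as segment scores."""
    r1 = reward_model(pair["seg1"]).sum()
    r2 = reward_model(pair["seg2"]).sum()
    return torch.sigmoid(r1 - r2)

def select_query(pool, ensemble):
    """Pool-based active learning: ask the human about the segment pair on
    which an ensemble of reward models disagrees the most."""
    def disagreement(pair):
        with torch.no_grad():
            probs = torch.stack([preference_prob(m, pair) for m in ensemble])
        return probs.var().item()
    return max(pool, key=disagreement)

def preference_loss(reward_model, pair, label):
    """label = 1.0 if the human preferred seg1, else 0.0."""
    logits = reward_model(pair["seg1"]).sum() - reward_model(pair["seg2"]).sum()
    return F.binary_cross_entropy_with_logits(logits, torch.tensor(label))
```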
arXiv Detail & Related papers (2021-07-20T04:15:52Z)
- Bilevel Continual Learning [76.50127663309604]
We present a novel continual learning framework named "Bilevel Continual Learning" (BCL).
Our experiments on continual learning benchmarks demonstrate the efficacy of the proposed BCL compared to many state-of-the-art methods.
arXiv Detail & Related papers (2020-07-30T16:00:23Z)