S-EPOA: Overcoming the Indistinguishability of Segments with Skill-Driven Preference-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2408.12130v2
- Date: Thu, 23 Jan 2025 13:52:23 GMT
- Title: S-EPOA: Overcoming the Indistinguishability of Segments with Skill-Driven Preference-Based Reinforcement Learning
- Authors: Ni Mu, Yao Luan, Yiqin Yang, Qing-shan Jia
- Abstract summary: Preference-based reinforcement learning (PbRL) uses human preferences as a direct reward signal.
Traditional PbRL methods are often constrained by the indistinguishability of segments, which impedes the learning process.
We introduce the Skill-Enhanced Preference Optimization Algorithm (S-EPOA), which integrates skill mechanisms into the preference learning framework.
- Score: 7.8063180607224165
- Abstract: Preference-based reinforcement learning (PbRL) stands out by utilizing human preferences as a direct reward signal, eliminating the need for intricate reward engineering. However, despite its potential, traditional PbRL methods are often constrained by the indistinguishability of segments, which impedes the learning process. In this paper, we introduce the Skill-Enhanced Preference Optimization Algorithm (S-EPOA), which addresses the segment indistinguishability issue by integrating skill mechanisms into the preference learning framework. Specifically, we first conduct unsupervised pretraining to learn useful skills. Then, we propose a novel query selection mechanism to balance the information gain and distinguishability over the learned skill space. Experimental results on a range of tasks, including robotic manipulation and locomotion, demonstrate that S-EPOA significantly outperforms conventional PbRL methods in terms of both robustness and learning efficiency. The results highlight the effectiveness of skill-driven learning in overcoming the challenges posed by segment indistinguishability.
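The abstract describes two components: skill pretraining and a query-selection rule that trades information gain against distinguishability in the learned skill space. Below is a minimal Python sketch of what such a selection rule could look like, assuming a reward ensemble and precomputed skill embeddings; the function name, the Bradley-Terry disagreement proxy, and the weight `alpha` are illustrative assumptions, not the paper's actual criterion.

```python
import numpy as np

def select_queries(seg_returns, skill_embeddings, pairs, k, alpha=0.5):
    """Rank candidate segment pairs by a weighted combination of
    ensemble disagreement (information gain) and skill-space distance
    (distinguishability), then return the top-k pairs to query.

    seg_returns:      (n_ensemble, n_segments) predicted returns per
                      ensemble member for each segment.
    skill_embeddings: (n_segments, d) segment embeddings in the
                      learned skill space.
    pairs:            list of (i, j) candidate segment index pairs.
    """
    scores = []
    for i, j in pairs:
        # Bradley-Terry preference probability under each ensemble member.
        logits = seg_returns[:, i] - seg_returns[:, j]
        probs = 1.0 / (1.0 + np.exp(-logits))
        # Information-gain proxy: disagreement across the ensemble.
        info_gain = probs.std()
        # Distinguishability: how far apart the segments lie in skill space.
        dist = np.linalg.norm(skill_embeddings[i] - skill_embeddings[j])
        scores.append(alpha * info_gain + (1.0 - alpha) * dist)
    order = np.argsort(scores)[::-1]
    return [pairs[idx] for idx in order[:k]]
```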
Related papers
- Skill-Enhanced Reinforcement Learning Acceleration from Demonstrations [23.15178050525514]
We propose a two-stage method dubbed Skill-enhanced Reinforcement Learning Acceleration (SeRLA).
SeRLA introduces a skill-level adversarial Positive-Unlabeled (PU) learning model to extract useful skill prior knowledge.
It then deploys a skill-based soft actor-critic algorithm to leverage this acquired prior knowledge in the downstream online RL stage.
arXiv Detail & Related papers (2024-12-09T04:58:14Z)
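The SeRLA summary above mentions a skill-level positive-unlabeled (PU) learning model but gives no details. As a point of reference, here is a minimal sketch of the standard non-negative PU risk estimator (Kiryo et al., 2017) that skill-level PU learning could build on, treating demonstration data as positives and mixed data as unlabeled; the pairing with SeRLA is an assumption.

```python
import torch

def nnpu_risk(scores_pos, scores_unl, prior,
              loss_fn=torch.nn.functional.softplus):
    """Non-negative PU risk estimator.  softplus(-z) is the logistic
    loss for label +1 and softplus(z) for label -1, so positives are
    pushed toward high scores while the unlabeled mixture supplies
    the negative-risk term.
    """
    risk_pos = prior * loss_fn(-scores_pos).mean()          # positives as +1
    risk_neg = loss_fn(scores_unl).mean() - prior * loss_fn(scores_pos).mean()
    # Clamp the negative-risk term at zero to avoid overfitting.
    return risk_pos + torch.clamp(risk_neg, min=0.0)
```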
- Employing Layerwised Unsupervised Learning to Lessen Data and Loss Requirements in Forward-Forward Algorithms [1.0514231683620516]
The Forward-Forward algorithm trains deep learning models solely through the forward pass.
We propose an Unsupervised Forward-Forward algorithm to replace back-propagation.
This leads to stable learning and enables versatile use across various datasets and tasks.
arXiv Detail & Related papers (2024-04-23T01:49:12Z)
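For context on the entry above: the base Forward-Forward algorithm trains each layer with a local "goodness" objective instead of backpropagation. A minimal sketch follows; the threshold `theta` is illustrative, and the paper's unsupervised variant changes how positive and negative data are produced, which is not shown here.

```python
import torch

def ff_layer_loss(layer, x_pos, x_neg, theta=2.0):
    """Layer-local Forward-Forward objective (Hinton, 2022): push the
    goodness (sum of squared activations) above a threshold for
    positive data and below it for negative data.  Each layer
    optimizes this loss locally; no gradients cross layers.
    """
    g_pos = layer(x_pos).pow(2).sum(dim=1)   # goodness of positive samples
    g_neg = layer(x_neg).pow(2).sum(dim=1)   # goodness of negative samples
    # softplus(-z) smoothly penalizes z for not being large and positive.
    return (torch.nn.functional.softplus(-(g_pos - theta)).mean()
            + torch.nn.functional.softplus(g_neg - theta).mean())
```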
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
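The RLIF summary says intervention signals themselves serve as rewards. A minimal sketch of that relabeling, assuming the common choice of a -1 penalty at the moment of intervention and 0 otherwise (the exact reward scheme is an assumption):

```python
def relabel_with_interventions(transitions):
    """Attach RLIF-style rewards to transitions for an off-policy RL
    algorithm: the expert's decision to intervene is itself the signal,
    so the policy learns to avoid states that trigger interventions.
    Each transition is (obs, act, next_obs, intervened).
    """
    return [(o, a, -1.0 if flag else 0.0, o2)
            for (o, a, o2, flag) in transitions]
```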
- Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z)
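The Self-Reference summary above describes an add-on module for leveraging historical information without specifying its form. Below is a deliberately simple, hypothetical sketch of one such mechanism, nearest-neighbor retrieval from a memory of past observations; the retrieval rule and the concatenation are assumptions, not the paper's design.

```python
import numpy as np

def retrieve_references(query_obs, memory, k=4):
    """Hypothetical Self-Reference-style add-on: fetch the k most
    similar past observations (L2 nearest neighbors) from a memory
    buffer of shape (n, d) and let the agent condition on them
    alongside the current observation.
    """
    dists = np.linalg.norm(memory - query_obs, axis=1)
    nearest = memory[np.argsort(dists)[:k]]
    return np.concatenate([query_obs, nearest.ravel()])
```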
- Hierarchical Deep Counterfactual Regret Minimization [53.86223883060367]
In this paper, we introduce the first hierarchical version of Deep CFR (HDCFR), an innovative method that boosts learning efficiency in tasks involving extensively large state spaces and deep game trees.
A notable advantage of HDCFR over previous works is its ability to facilitate learning with predefined (human) expertise and foster the acquisition of skills that can be transferred to similar tasks.
arXiv Detail & Related papers (2023-05-27T02:05:41Z)
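For readers unfamiliar with the CFR family referenced above: CFR variants, including Deep CFR and its hierarchical extension, ultimately rely on regret matching to turn cumulative regrets into a strategy. A minimal sketch of that core update:

```python
import numpy as np

def regret_matching(cumulative_regrets):
    """Play each action with probability proportional to its positive
    cumulative regret, falling back to a uniform strategy when no
    action has positive regret.
    """
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cumulative_regrets), 1.0 / len(cumulative_regrets))
```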
- Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMControl and Meta-world.
It shows consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
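The diverse-preferences paper above works in the standard PbRL reward-learning setup, where a reward model is fit to preference labels with a Bradley-Terry cross-entropy loss. A minimal sketch of that base loss follows; the paper's contribution is how labels are crowd-sourced and aggregated, which is not shown here.

```python
import torch

def preference_loss(r_hat_seg0, r_hat_seg1, labels):
    """Bradley-Terry reward-learning loss used throughout PbRL.

    r_hat_seg0, r_hat_seg1: (batch, seg_len) predicted per-step rewards
    for the two segments of each query.
    labels: (batch,) with 1 if segment 1 was preferred, else 0.
    """
    # A segment's score is the sum of its predicted per-step rewards.
    logits = r_hat_seg1.sum(dim=1) - r_hat_seg0.sum(dim=1)
    return torch.nn.functional.binary_cross_entropy_with_logits(
        logits, labels.float())
```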
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
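The exploration entry above derives an intrinsic bonus from uncertainty in the learned reward. One natural instantiation, sketched below, uses the standard deviation across an ensemble of reward models; the scale `beta` and the ensemble-std choice are illustrative assumptions.

```python
import torch

def exploration_bonus(reward_ensemble, state_action, beta=1.0):
    """Intrinsic reward from disagreement in the learned reward:
    evaluate every reward model in the ensemble on the same
    state-action batch and use the per-sample standard deviation
    as a novelty bonus, scaled by beta.
    """
    preds = torch.stack([r(state_action) for r in reward_ensemble])
    return beta * preds.std(dim=0)
```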
- Self-Imitation Advantage Learning [43.8107780378031]
Self-imitation learning is a reinforcement learning method that encourages actions whose returns were higher than expected.
We propose a novel generalization of self-imitation learning for off-policy RL, based on a modification of the Bellman optimality operator.
arXiv Detail & Related papers (2020-12-22T13:21:50Z)
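The self-imitation entry above builds on the idea of reinforcing only better-than-expected past actions. A minimal sketch of the underlying self-imitation losses follows; the paper's own method folds this idea into a modified Bellman optimality operator, which is not reproduced here.

```python
import torch

def sil_losses(log_probs, values, returns):
    """Self-imitation losses: only transitions whose observed return
    exceeded the value estimate contribute (the clamp zeroes the
    rest), so the agent imitates its own better-than-expected actions.
    """
    advantage = torch.clamp(returns - values, min=0.0)
    policy_loss = -(log_probs * advantage.detach()).mean()
    value_loss = 0.5 * advantage.pow(2).mean()
    return policy_loss, value_loss
```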
- Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into Adam to facilitate their adoption by the NLP community.
arXiv Detail & Related papers (2020-04-27T08:59:57Z)
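The recall-and-learn mechanism above anneals between pulling weights back toward their pretrained values and optimizing the downstream task. Below is a minimal sketch of such an annealed objective with a sigmoid schedule in the spirit of RecAdam; the hyperparameter values are illustrative.

```python
import math

def recall_and_learn_loss(task_loss, params, pretrained_params,
                          step, k=0.1, t0=1000, gamma=1.0):
    """Anneal from 'recalling' the pretrained weights (a quadratic pull
    toward them) to the downstream task loss.  params and
    pretrained_params are matching lists of torch tensors; lam rises
    from ~0 to ~1 over training via a sigmoid schedule.
    """
    lam = 1.0 / (1.0 + math.exp(-k * (step - t0)))
    recall = sum((p - p0).pow(2).sum()
                 for p, p0 in zip(params, pretrained_params))
    return lam * task_loss + (1.0 - lam) * 0.5 * gamma * recall
```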
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.