Advances in Preference-based Reinforcement Learning: A Review
- URL: http://arxiv.org/abs/2408.11943v1
- Date: Wed, 21 Aug 2024 18:57:12 GMT
- Title: Advances in Preference-based Reinforcement Learning: A Review
- Authors: Youssef Abdelkareem, Shady Shehata, Fakhri Karray
- Abstract summary: Preference-based reinforcement learning (PbRL) removes the need for hand-engineered numeric rewards by using human preferences from experts as feedback.
We present a unified PbRL framework that covers the newly emerging approaches improving the scalability and efficiency of PbRL.
- Score: 1.474723404975345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning (RL) algorithms suffer from a dependence on accurately engineered reward functions to properly guide learning agents toward the required tasks. Preference-based reinforcement learning (PbRL) addresses this by utilizing human preferences from experts as feedback instead of numeric rewards. Due to its promising advantage over traditional RL, PbRL has gained increasing attention in recent years, with many significant advances. In this survey, we present a unified PbRL framework that encompasses the newly emerging approaches improving the scalability and efficiency of PbRL. In addition, we give a detailed overview of the theoretical guarantees and benchmarking work done in the field, while presenting its recent applications in complex real-world tasks. Lastly, we review the limitations of the current approaches and the proposed future research directions.
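To make the setting concrete, the sketch below shows the standard PbRL formulation the survey builds on: a reward model trained on pairwise trajectory preferences with a Bradley-Terry likelihood. It is a minimal, self-contained illustration; class and function names are ours, not taken from the paper.

```python
# Minimal sketch: learning a reward model from pairwise trajectory
# preferences with a Bradley-Terry likelihood (standard PbRL setup).
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # Per-step reward estimate r_psi(s, a)
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(reward_model, seg_a, seg_b, prefer_a):
    """Bradley-Terry loss on a pair of trajectory segments.

    seg_a / seg_b: (obs, act) tensors of shape (T, obs_dim) / (T, act_dim)
    prefer_a: 1.0 if the human preferred segment A, else 0.0
    """
    ret_a = reward_model(*seg_a).sum()   # predicted return of segment A
    ret_b = reward_model(*seg_b).sum()   # predicted return of segment B
    p_a = torch.sigmoid(ret_a - ret_b)   # P(A preferred) under Bradley-Terry
    return -(prefer_a * torch.log(p_a + 1e-8)
             + (1 - prefer_a) * torch.log(1 - p_a + 1e-8))

# Example usage on random data (stands in for real preference queries).
obs_dim, act_dim, T = 4, 2, 25
model = RewardModel(obs_dim, act_dim)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
seg_a = (torch.randn(T, obs_dim), torch.randn(T, act_dim))
seg_b = (torch.randn(T, obs_dim), torch.randn(T, act_dim))
loss = preference_loss(model, seg_a, seg_b, prefer_a=1.0)
opt.zero_grad(); loss.backward(); opt.step()
```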
Related papers
- Preference-Guided Reinforcement Learning for Efficient Exploration [7.83845308102632]
We introduce LOPE: Learning Online with trajectory Preference guidancE, an end-to-end preference-guided RL framework.
Our intuition is that LOPE directly adjusts the focus of online exploration by considering human feedback as guidance.
LOPE outperforms several state-of-the-art methods in both convergence rate and overall performance.
arXiv Detail & Related papers (2024-07-09T02:11:12Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
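As a rough illustration of the general idea (not the paper's LINVIT algorithm), the sketch below runs value iteration with a backup that is KL-regularized toward an LLM-suggested reference policy, so the LLM biases learning without dictating it; all names and the exact regularization form are assumptions.

```python
# Schematic sketch: tabular value iteration whose backup is KL-regularized
# toward an LLM-suggested reference policy pi_llm. Placeholder names only;
# this is not the paper's exact LINVIT algorithm.
import numpy as np

def kl_regularized_value_iteration(P, R, pi_llm, gamma=0.95, lam=1.0, iters=200):
    """P: (S, A, S) transition probabilities, R: (S, A) rewards,
    pi_llm: (S, A) reference policy from an LLM, lam: regularization strength."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P @ V  # (S, A) state-action values
        # Soft backup: V(s) = lam * log sum_a pi_llm(a|s) * exp(Q(s,a)/lam),
        # the closed form of max_pi E_pi[Q] - lam * KL(pi || pi_llm).
        V = lam * np.log((pi_llm * np.exp(Q / lam)).sum(axis=1) + 1e-12)
    # Regularized policy implied by the final values
    Q = R + gamma * P @ V
    pi = pi_llm * np.exp(Q / lam)
    pi /= pi.sum(axis=1, keepdims=True)
    return V, pi
```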
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- Exploiting Estimation Bias in Clipped Double Q-Learning for Continuous Control Reinforcement Learning Tasks [5.968716050740402]
This paper focuses on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks.
We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent.
Most state-of-the-art deep RL algorithms can be equipped with the BE mechanism without degrading performance or increasing computational complexity.
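Clipped double Q-learning normally takes the minimum of two critics to suppress overestimation. The sketch below illustrates the general idea of switching between the pessimistic (min) and optimistic (max) target during training; the selection heuristic shown is a placeholder, not the paper's exact BE rule.

```python
# Sketch: dynamically choosing the target estimator in clipped double
# Q-learning, switching between pessimistic min(Q1, Q2) and optimistic
# max(Q1, Q2). The switching heuristic below is illustrative only.
import torch

def td_target(q1_next, q2_next, reward, done, gamma=0.99, use_max=False):
    """q1_next, q2_next: target-critic values for the next state-action."""
    agg = torch.max if use_max else torch.min
    q_next = agg(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * q_next

def choose_optimistic(recent_returns, window=10):
    """Placeholder heuristic: switch to the optimistic (max) target when
    recent evaluation returns have stagnated, otherwise stay pessimistic."""
    if len(recent_returns) < 2 * window:
        return False
    old = sum(recent_returns[-2 * window:-window]) / window
    new = sum(recent_returns[-window:]) / window
    return new <= old
```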
arXiv Detail & Related papers (2024-02-14T10:44:03Z)
- Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning integrated as a component of evolutionary algorithms has demonstrated superior performance in recent years.
We discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature.
In the section on RL-EA applications, we also demonstrate the excellent performance of RL-EA on several benchmarks and a range of public datasets.
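As a generic example of such integration (not a specific method from the survey), the sketch below uses a bandit-style RL controller to decide which variation operator the evolutionary algorithm applies, rewarded by the fitness improvement it produces.

```python
# Generic illustration: an epsilon-greedy controller that selects which
# variation operator an evolutionary algorithm applies next, updated with
# the fitness gain each operator achieves.
import random

class OperatorSelector:
    def __init__(self, operators, eps=0.1, lr=0.2):
        self.operators = operators          # callables: individual -> individual
        self.q = [0.0] * len(operators)     # value estimate per operator
        self.eps, self.lr = eps, lr

    def select(self):
        if random.random() < self.eps:
            return random.randrange(len(self.operators))
        return max(range(len(self.operators)), key=lambda i: self.q[i])

    def update(self, idx, fitness_gain):
        # Reward the chosen operator by the fitness improvement it produced.
        self.q[idx] += self.lr * (fitness_gain - self.q[idx])
```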
arXiv Detail & Related papers (2023-08-25T15:06:05Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Reinforcement Learning Applied to Trading Systems: A Survey [5.118560450410779]
The recent achievements and growing prominence of Reinforcement Learning have increased its adoption in trading tasks.
This review aims to promote the development of the field by encouraging researchers' commitment to common reporting standards.
arXiv Detail & Related papers (2022-11-01T21:26:12Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback-efficiency and sample-efficiency of preference-based RL algorithms.
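A minimal sketch of this kind of bonus, assuming an ensemble of reward models trained on the preference data: the ensemble's disagreement (standard deviation) on a state-action pair serves as an intrinsic reward added to the learned reward. Names and the combination rule are illustrative.

```python
# Sketch: exploration bonus from disagreement of an ensemble of learned
# reward models; high disagreement marks under-explored behavior.
import torch

def intrinsic_bonus(reward_ensemble, obs, act):
    """reward_ensemble: list of reward models mapping (obs, act) -> reward."""
    preds = torch.stack([m(obs, act) for m in reward_ensemble], dim=0)
    return preds.std(dim=0)  # per-step uncertainty of the learned reward

def total_reward(reward_ensemble, obs, act, beta=0.05):
    """Combine the mean learned reward with a weighted uncertainty bonus."""
    preds = torch.stack([m(obs, act) for m in reward_ensemble], dim=0)
    return preds.mean(dim=0) + beta * preds.std(dim=0)
```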
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Federated Reinforcement Learning: Techniques, Applications, and Open Challenges [4.749929332500373]
Federated Reinforcement Learning (FRL) is an emerging and promising field within Reinforcement Learning (RL).
FRL algorithms can be divided into two categories: Horizontal Federated Reinforcement Learning (HFRL) and Vertical Federated Reinforcement Learning (VFRL).
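For intuition about the horizontal setting, the sketch below shows a generic FedAvg-style aggregation step in which agents training on the same task in different environments periodically average their network parameters; this is a common pattern, not a specific algorithm from the survey.

```python
# Illustrative HFRL-style aggregation: a server averages the parameters of
# locally trained value/policy networks and broadcasts the result back.
import torch

@torch.no_grad()
def federated_average(local_models, global_model, weights=None):
    """Average local model parameters into the shared global model."""
    n = len(local_models)
    weights = weights or [1.0 / n] * n
    for name, param in global_model.state_dict().items():
        avg = sum(w * m.state_dict()[name].float()
                  for w, m in zip(weights, local_models))
        param.copy_(avg.to(param.dtype))
    # Each agent continues local RL training from the averaged weights.
    for m in local_models:
        m.load_state_dict(global_model.state_dict())
```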
arXiv Detail & Related papers (2021-08-26T16:22:49Z)
- Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries.
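A simplified active-query rule in the same spirit (not IDRL's exact information-directed criterion): among candidate segment pairs, query the expert on the pair where an ensemble of reward models disagrees most about which segment is preferred, so each answer is maximally informative about the reward.

```python
# Sketch: pick the preference query on which an ensemble of reward models
# disagrees most. A simplified acquisition rule, not IDRL's criterion.
import torch

def pick_query(reward_ensemble, candidate_pairs):
    """candidate_pairs: list of ((obs_a, act_a), (obs_b, act_b)) segment pairs."""
    scores = []
    for seg_a, seg_b in candidate_pairs:
        # Predicted P(A preferred) under each ensemble member
        p = torch.stack([
            torch.sigmoid(m(*seg_a).sum() - m(*seg_b).sum())
            for m in reward_ensemble
        ])
        scores.append(float(p.var()))  # disagreement about the preference
    return max(range(len(candidate_pairs)), key=lambda i: scores[i])
```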
arXiv Detail & Related papers (2021-02-24T18:46:42Z)
- What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study [50.79125250286453]
On-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks.
However, state-of-the-art implementations make numerous low- and high-level design decisions that strongly affect the performance of the resulting agents.
These choices are usually not extensively discussed in the literature, leading to discrepancies between published descriptions of algorithms and their implementations.
We implement more than 50 such choices in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study.
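To illustrate what such choices look like in practice, the configuration sketch below exposes a handful of typical low- and high-level on-policy options; these fields are common PPO-style examples, not the paper's actual list of 50+ choices.

```python
# Illustrative configuration of on-policy design choices a unified
# framework might expose; field names are typical examples only.
from dataclasses import dataclass

@dataclass
class OnPolicyConfig:
    normalize_advantages: bool = True    # per-batch advantage normalization
    clip_value_loss: bool = False        # PPO-style value-function clipping
    gae_lambda: float = 0.95             # GAE bias/variance trade-off
    entropy_coef: float = 0.01           # exploration bonus weight
    orthogonal_init: bool = True         # network initialization scheme
    learning_rate_decay: str = "linear"  # "none", "linear", ...
```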
arXiv Detail & Related papers (2020-06-10T17:59:03Z)