Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning
- URL: http://arxiv.org/abs/2408.03029v3
- Date: Mon, 14 Oct 2024 02:25:19 GMT
- Title: Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning
- Authors: Haozhe Ma, Zhengding Luo, Thanh Vinh Vo, Kuankuan Sima, Tze-Yun Leong
- Abstract summary: Reward shaping is a technique in reinforcement learning that addresses the sparse-reward problem by providing more frequent and informative rewards.
We introduce a self-adaptive and highly efficient reward shaping mechanism that incorporates success rates derived from historical experiences as shaped rewards.
Our method is validated on various tasks with extremely sparse rewards, demonstrating notable improvements in sample efficiency and convergence stability over relevant baselines.
- Score: 5.242869847419834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward shaping is a technique in reinforcement learning that addresses the sparse-reward problem by providing more frequent and informative rewards. We introduce a self-adaptive and highly efficient reward shaping mechanism that incorporates success rates derived from historical experiences as shaped rewards. The success rates are sampled from Beta distributions, which dynamically evolve from uncertain to reliable values as data accumulates. Initially, the shaped rewards exhibit more randomness to encourage exploration, while over time, the increasing certainty enhances exploitation, naturally balancing exploration and exploitation. Our approach employs Kernel Density Estimation (KDE) combined with Random Fourier Features (RFF) to derive the Beta distributions, providing a computationally efficient, non-parametric, and learning-free solution for high-dimensional continuous state spaces. Our method is validated on various tasks with extremely sparse rewards, demonstrating notable improvements in sample efficiency and convergence stability over relevant baselines.
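As a rough illustration of the mechanism the abstract describes, the sketch below keeps running RFF feature sums for success and failure states, uses them as KDE-style pseudo-counts, and samples the shaped reward from the resulting Beta distribution. The hyperparameters (feature count, bandwidth, the +1 priors) and the exact pseudo-count definitions are assumptions, not the authors' implementation.

```python
import numpy as np

class BetaSuccessShaper:
    """Minimal sketch of Beta-distributed success-rate shaping with KDE
    approximated via Random Fourier Features (RFF). Hyperparameters and
    pseudo-count definitions are assumptions, not the paper's code."""

    def __init__(self, state_dim, n_features=256, bandwidth=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # RFF for an RBF kernel: k(s, s') ~= z(s) @ z(s')
        self.W = rng.normal(0.0, 1.0 / bandwidth, (n_features, state_dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, n_features)
        self.scale = np.sqrt(2.0 / n_features)
        self.succ_sum = np.zeros(n_features)  # running feature sums act as
        self.fail_sum = np.zeros(n_features)  # sufficient statistics for KDE
        self.rng = rng

    def _z(self, s):
        return self.scale * np.cos(self.W @ np.asarray(s, float) + self.b)

    def update(self, state, success):
        # add the visited state to the success or failure density estimate
        if success:
            self.succ_sum += self._z(state)
        else:
            self.fail_sum += self._z(state)

    def shaped_reward(self, state, coef=1.0):
        z = self._z(state)
        # unnormalized KDE pseudo-counts of nearby successes / failures
        alpha = 1.0 + max(float(z @ self.succ_sum), 0.0)
        beta = 1.0 + max(float(z @ self.fail_sum), 0.0)
        # few data -> wide Beta -> noisy reward (exploration);
        # more data -> concentrated Beta (exploitation)
        return coef * self.rng.beta(alpha, beta)
```

At each step the sampled value would be added to the sparse environment reward; with little data both pseudo-counts stay near 1, so the sample is close to uniform on [0, 1], which matches the abstract's exploration-to-exploitation story.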
Related papers
- Improving the Effectiveness of Potential-Based Reward Shaping in Reinforcement Learning [0.5524804393257919]
We show how a simple linear shift of the potential function can be used to improve the effectiveness of reward shaping.
We show the theoretical limitations of continuous potential functions for correctly assigning positive and negative reward shaping values.
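For context, potential-based shaping adds F(s, s') = gamma * phi(s') - phi(s) to the environment reward. The toy sketch below (a hypothetical potential, not from the paper) shows how a constant shift of the potential offsets every shaping value by (gamma - 1) * c while preserving the optimal policy; this is one way to read the linear-shift idea, and the paper's exact construction may differ.

```python
def shaping_reward(phi, s, s_next, gamma=0.99):
    # potential-based shaping (Ng et al., 1999): F(s, s') = g*phi(s') - phi(s)
    return gamma * phi(s_next) - phi(s)

phi = lambda s: -abs(s - 10.0)          # toy potential: negative distance to goal
shifted = lambda s: phi(s) + 50.0       # constant shift of the potential

f_orig = shaping_reward(phi, 3.0, 4.0)       # 1.06: moving toward the goal
f_shift = shaping_reward(shifted, 3.0, 4.0)  # 0.56: same policy ordering
# every shaping value moves by (gamma - 1) * c = -0.5, so signs can flip
assert abs((f_shift - f_orig) - (0.99 - 1.0) * 50.0) < 1e-9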
arXiv Detail & Related papers (2025-02-03T12:32:50Z)
- Deep Reinforcement Learning with Hybrid Intrinsic Reward Model [50.53705050673944]
Intrinsic reward shaping has emerged as a prevalent approach to solving hard-exploration and sparse-reward environments.
We introduce HIRE (Hybrid Intrinsic REward), a framework for creating hybrid intrinsic rewards through deliberate fusion strategies.
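The summary names deliberate fusion strategies without detail; the sketch below shows generic ways two intrinsic signals might be fused. The strategy names are illustrative assumptions, not necessarily HIRE's actual ones.

```python
import numpy as np

def fuse_intrinsic(r_a, r_b, strategy="weighted_sum", w=0.5):
    """Sketch of hybrid intrinsic-reward fusion; the strategies here are
    generic illustrations, not HIRE's specific design."""
    r_a, r_b = np.asarray(r_a, float), np.asarray(r_b, float)
    if strategy == "weighted_sum":   # blend two exploration signals
        return w * r_a + (1 - w) * r_b
    if strategy == "product":        # reward only when both signals agree
        return r_a * r_b
    if strategy == "max":            # follow the stronger signal
        return np.maximum(r_a, r_b)
    raise ValueError(strategy)
```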
arXiv Detail & Related papers (2025-01-22T04:22:13Z)
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring parameter overhead.
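One common way to obtain an intrinsic reward with no extra parameters is to reuse the predictive model the off-policy learner already trains and score novelty by its one-step prediction error; the sketch below shows that generic construction, which may differ from the paper's actual derivation.

```python
import numpy as np

def model_error_bonus(model_predict, s, a, s_next, scale=1.0):
    """Intrinsic reward from the prediction error of a dynamics model the
    agent already learns, so no additional parameters are introduced.
    `model_predict(s, a) -> predicted next state` is a hypothetical
    interface, assumed for illustration."""
    err = np.asarray(s_next, float) - np.asarray(model_predict(s, a), float)
    return scale * float(err @ err)  # squared error as a novelty signal
```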
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
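Of KBASS's components, the kernel-regression function estimate is the easiest to sketch; the spike-and-slab prior and EP-EM inference are omitted. Below is plain kernel ridge regression with an RBF kernel as a stand-in for that component (bandwidth and regularizer are assumed values).

```python
import numpy as np

def kernel_ridge_fit(X, y, bandwidth=1.0, lam=1e-3):
    """Sketch of the kernel-regression component only: estimate a target
    function from sparse, noisy samples with an RBF kernel. KBASS's
    spike-and-slab prior and EP-EM inference are not shown."""
    X = np.asarray(X, float)
    diff = X[:, None, :] - X[None, :, :]
    K = np.exp(-np.sum(diff**2, axis=-1) / (2 * bandwidth**2))
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), np.asarray(y, float))

    def predict(Xq):
        dq = np.asarray(Xq, float)[:, None, :] - X[None, :, :]
        Kq = np.exp(-np.sum(dq**2, axis=-1) / (2 * bandwidth**2))
        return Kq @ alpha

    return predict
```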
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning [19.788336796981685]
We propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL).
Our main idea is to design the multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training.
The superiority of DRE-MARL is demonstrated on benchmark multi-agent scenarios, comparing against SOTA baselines in terms of both effectiveness and robustness.
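A plain-expectation reading of policy-weighted reward aggregation: keep a reward estimate per action branch and average them under the current policy. The exact aggregation scheme in DRE-MARL may differ in detail.

```python
import numpy as np

def policy_weighted_reward(reward_branches, policy_probs):
    """Sketch: per-action-branch reward estimates aggregated under the
    policy, r_hat(s) = sum_a pi(a|s) * R_hat(s, a). A plain expectation
    is one reading of the aggregation, not DRE-MARL's exact scheme."""
    r = np.asarray(reward_branches, float)   # R_hat(s, a) for each action a
    p = np.asarray(policy_probs, float)      # pi(a|s), sums to 1
    return float(p @ r)

# e.g. three action branches under a mostly-greedy policy
print(policy_weighted_reward([0.0, 1.0, -0.5], [0.1, 0.8, 0.1]))  # 0.75
```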
arXiv Detail & Related papers (2022-10-14T08:31:45Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observations of its behavior.
Past experience is leveraged to make this inference more efficient.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
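The confidence-based pseudo-labeling step might look like the sketch below: an unlabeled segment pair receives the predictor's own preference as its label only when the prediction is confident enough. The threshold value and the predictor interface are assumptions.

```python
def pseudo_label(pref_predictor, seg_a, seg_b, threshold=0.9):
    """Sketch of SURF-style pseudo-labeling. `pref_predictor` is a
    hypothetical interface returning P(seg_a preferred over seg_b);
    the 0.9 threshold is an assumed value."""
    p = pref_predictor(seg_a, seg_b)
    confidence = max(p, 1.0 - p)
    if confidence < threshold:
        return None                 # too uncertain: discard this pair
    return 1 if p >= 0.5 else 0     # pseudo preference label
```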
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization [43.51553742077343]
Inverse reinforcement learning (IRL) is relevant to a variety of tasks, including value alignment and robot learning from demonstration.
This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions consistent with the expert demonstrations.
arXiv Detail & Related papers (2020-11-17T10:17:45Z)
- f-IRL: Inverse Reinforcement Learning via State Marginal Matching [13.100127636586317]
We propose a method for learning the reward function (and the corresponding policy) to match the expert state density.
We present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent.
Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories.
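In symbols, the matching objective is an f-divergence between state marginals; this is a standard way to write it, with notation assumed rather than taken from the paper.

```latex
% rho_E: expert state density; rho_theta: state marginal of the MaxEnt
% policy trained under the learned reward r_theta. f-IRL descends the
% gradient of this objective with respect to theta.
\min_{\theta} \; D_f\!\left( \rho_E(s) \,\Vert\, \rho_{\theta}(s) \right)
```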
arXiv Detail & Related papers (2020-11-09T19:37:48Z)
- Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping [71.214923471669]
Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL).
In this paper, we consider the problem of adaptively utilizing a given shaping reward function.
Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards.
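One reading of "adaptively utilizing" a given shaping reward is a learned weight that scales the shaping term, so misleading shaping can be suppressed and beneficial shaping exploited. The scalar weight below is an assumption for brevity; the paper's parameterization and update rule may differ.

```python
def utilized_reward(r_env, r_shape, z):
    """Sketch: r = r_env + z * r_shape, where z is a learned utilization
    weight. A state-independent scalar z is an assumption; optimizing z
    against true task return would drive z toward 0 when the given
    shaping reward is harmful."""
    return r_env + z * r_shape
```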
arXiv Detail & Related papers (2020-11-05T05:34:14Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, strong PAC guarantees can be obtained for learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)