Sample-Efficient Curriculum Reinforcement Learning for Complex Reward Functions
- URL: http://arxiv.org/abs/2410.16790v1
- Date: Tue, 22 Oct 2024 08:07:44 GMT
- Title: Sample-Efficient Curriculum Reinforcement Learning for Complex Reward Functions
- Authors: Kilian Freitag, Kristian Ceder, Rita Laezza, Knut Åkesson, Morteza Haghir Chehreghani
- Abstract summary: Reinforcement learning (RL) shows promise in control problems, but its practical application is often hindered by the complexity arising from intricate reward functions with constraints.
We propose a novel two-stage reward curriculum combined with a flexible replay buffer that adaptively samples experiences.
Our approach first learns on a subset of rewards before transitioning to the full reward, allowing the agent to learn trade-offs between objectives and constraints.
- Score: 5.78463306498655
- Abstract: Reinforcement learning (RL) shows promise in control problems, but its practical application is often hindered by the complexity arising from intricate reward functions with constraints. While the reward hypothesis suggests these competing demands can be encapsulated in a single scalar reward function, designing such functions remains challenging. Building on existing work, we start by formulating preferences over trajectories to derive a realistic reward function that balances goal achievement with constraint satisfaction in the application of mobile robotics with dynamic obstacles. To mitigate reward exploitation in such complex settings, we propose a novel two-stage reward curriculum combined with a flexible replay buffer that adaptively samples experiences. Our approach first learns on a subset of rewards before transitioning to the full reward, allowing the agent to learn trade-offs between objectives and constraints. After transitioning to a new stage, our method continues to make use of past experiences by updating their rewards for sample-efficient learning. We investigate the efficacy of our approach in robot navigation tasks and demonstrate superior performance compared to baselines in terms of true reward achievement and task completion, underlining its effectiveness.
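The reward-updating idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each stored transition keeps its individual reward terms, so the scalar reward can be recomputed under the weights of whichever curriculum stage is active when a batch is sampled. The names `Transition`, `CurriculumReplayBuffer`, `stage_weights`, and the reward-term keys `"goal"` and `"collision"` are hypothetical, and uniform sampling stands in for the paper's adaptive sampling, which the abstract does not specify.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    state: tuple
    action: int
    reward_terms: dict  # hypothetical per-term rewards, e.g. {"goal": 1.0, "collision": -1.0}
    next_state: tuple
    done: bool


class CurriculumReplayBuffer:
    """Stores reward terms separately so old experience can be re-scored per stage."""

    def __init__(self, stage_weights, capacity=10_000, seed=0):
        # stage_weights: one dict of term weights per curriculum stage (assumed design).
        self.stage_weights = stage_weights
        self.stage = 0
        self.capacity = capacity
        self.storage = []
        self.rng = random.Random(seed)

    def add(self, transition):
        # Drop the oldest transition once the buffer is full.
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(transition)

    def reward(self, transition):
        # Scalar reward under the current stage; terms without a weight count as 0.
        weights = self.stage_weights[self.stage]
        return sum(weights.get(name, 0.0) * value
                   for name, value in transition.reward_terms.items())

    def advance_stage(self):
        # Move to the next stage; stored transitions are re-scored lazily on sampling.
        self.stage = min(self.stage + 1, len(self.stage_weights) - 1)

    def sample(self, batch_size):
        # Uniform sampling (an assumption; the paper describes adaptive sampling).
        batch = self.rng.sample(self.storage, min(batch_size, len(self.storage)))
        return [(t.state, t.action, self.reward(t), t.next_state, t.done) for t in batch]


# Stage 0 uses only the goal term; stage 1 adds the constraint term.
buffer = CurriculumReplayBuffer([{"goal": 1.0}, {"goal": 1.0, "collision": 1.0}])
example = Transition((0,), 1, {"goal": 2.0, "collision": -5.0}, (1,), False)
buffer.add(example)
reward_stage_0 = buffer.reward(example)
buffer.advance_stage()
reward_stage_1 = buffer.reward(example)
```

Because rewards are recomputed at sampling time, experience collected in the first stage remains usable after the switch to the full reward, which is the sample-efficiency argument the abstract makes.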
Related papers
- Adaptive Reward Design for Reinforcement Learning in Complex Robotic Tasks [2.3031174164121127]
We propose a suite of reward functions that incentivize an RL agent to make measurable progress on tasks specified by formulas.
We develop an adaptive reward shaping approach that dynamically updates these reward functions during the learning process.
Experimental results on a range of RL-based robotic tasks demonstrate that the proposed approach is compatible with various RL algorithms.
arXiv Detail & Related papers (2024-12-14T18:04:18Z) - Stage-Wise Reward Shaping for Acrobatic Robots: A Constrained Multi-Objective Reinforcement Learning Approach [12.132416927711036]
We introduce an RL method aimed at simplifying the reward-shaping process through intuitive strategies.
We define multiple reward and cost functions within a constrained multi-objective RL (CMORL) framework.
For tasks involving sequential complex movements, we segment the task into distinct stages and define multiple rewards and costs for each stage.
arXiv Detail & Related papers (2024-09-24T05:25:24Z) - Constrained Reinforcement Learning with Smoothed Log Barrier Function [27.216122901635018]
We propose a new constrained RL method called CSAC-LB (Constrained Soft Actor-Critic with Log Barrier Function).
It achieves competitive performance without any pre-training by applying a linear smoothed log barrier function to an additional safety critic.
We show that with CSAC-LB, we achieve state-of-the-art performance on several constrained control tasks with different levels of difficulty.
arXiv Detail & Related papers (2024-03-21T16:02:52Z) - Dense Reward for Free in Reinforcement Learning from Human Feedback [64.92448888346125]
We leverage the fact that the reward model contains more information than just its scalar output.
We use these attention weights to redistribute the reward along the whole completion.
Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.
arXiv Detail & Related papers (2024-02-01T17:10:35Z) - REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - Actively Learning Costly Reward Functions for Reinforcement Learning [56.34005280792013]
We show that it is possible to train agents in complex real-world environments orders of magnitude faster.
By enabling the application of reinforcement learning methods to new domains, we show that we can find interesting and non-trivial solutions.
arXiv Detail & Related papers (2022-11-23T19:17:20Z) - Skill-Based Reinforcement Learning with Intrinsic Reward Matching [77.34726150561087]
We present Intrinsic Reward Matching (IRM), which unifies task-agnostic skill pretraining and task-aware finetuning.
IRM enables us to utilize pretrained skills far more effectively than previous skill selection methods.
arXiv Detail & Related papers (2022-10-14T00:04:49Z) - Hindsight Reward Tweaking via Conditional Deep Reinforcement Learning [37.61951923445689]
We propose a novel paradigm for deep reinforcement learning to model the influences of reward functions within a near-optimal space.
We demonstrate the feasibility of this approach and study one of its potential applications in policy performance boosting with multiple MuJoCo tasks.
arXiv Detail & Related papers (2021-09-06T10:06:48Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z) - Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards [1.2691047660244335]
We propose a practical self-imitation learning method named Self-Imitation Learning with Constant Reward (SILCR).
Our method assigns the immediate rewards at each timestep with constant values according to their final episodic rewards.
We demonstrate the effectiveness of our method in some challenging continuous robotics control tasks in MuJoCo simulation.
arXiv Detail & Related papers (2020-10-14T11:12:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.