Stage-Wise Reward Shaping for Acrobatic Robots: A Constrained Multi-Objective Reinforcement Learning Approach
- URL: http://arxiv.org/abs/2409.15755v1
- Date: Tue, 24 Sep 2024 05:25:24 GMT
- Title: Stage-Wise Reward Shaping for Acrobatic Robots: A Constrained Multi-Objective Reinforcement Learning Approach
- Authors: Dohyeong Kim, Hyeokjin Kwon, Junseok Kim, Gunmin Lee, Songhwai Oh
- Abstract summary: We introduce an RL method aimed at simplifying the reward-shaping process through intuitive strategies.
We define multiple reward and cost functions within a constrained multi-objective RL (CMORL) framework.
For tasks involving sequential complex movements, we segment the task into distinct stages and define multiple rewards and costs for each stage.
- Score: 12.132416927711036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the complexity of tasks addressed through reinforcement learning (RL) increases, the definition of reward functions has also become highly complicated. We introduce an RL method aimed at simplifying the reward-shaping process through intuitive strategies. First, instead of a single reward function composed of various terms, we define multiple reward and cost functions within a constrained multi-objective RL (CMORL) framework. Second, for tasks involving sequential complex movements, we segment the task into distinct stages and define multiple rewards and costs for each stage. Finally, we introduce a practical CMORL algorithm that maximizes objectives based on these rewards while satisfying constraints defined by the costs. The proposed method has been successfully demonstrated across a variety of acrobatic tasks in both simulation and real-world environments. Additionally, in comparisons with existing RL and constrained RL algorithms, it has been shown to perform the tasks successfully. Our code is available at https://github.com/rllab-snu/Stage-Wise-CMORL.
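The stage-wise idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see their repository for that). It only shows one plausible way to keep separate reward and cost functions per stage and to evaluate the active stage's signals. The `Stage` class, the state keys (`height`, `roll`, `pitch`, `impact`), and the stage names are all hypothetical. The scalarization at the end is a plain weighted sum with a Lagrangian-style penalty, used here as a common stand-in and not as the CMORL algorithm proposed in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical flat state representation, e.g. {"height": 0.5, "roll": 0.1}.
State = Dict[str, float]
Signal = Callable[[State], float]


@dataclass
class Stage:
    """One segment of a sequential task with its own rewards and costs."""
    name: str
    rewards: List[Signal]  # objectives to maximize during this stage
    costs: List[Signal]    # quantities to keep below a threshold


def stage_signals(stages: Sequence[Stage], stage_idx: int,
                  state: State) -> Tuple[List[float], List[float]]:
    """Evaluate only the active stage's reward and cost functions."""
    stage = stages[stage_idx]
    rewards = [fn(state) for fn in stage.rewards]
    costs = [fn(state) for fn in stage.costs]
    return rewards, costs


def penalized_objective(rewards: Sequence[float], costs: Sequence[float],
                        weights: Sequence[float],
                        multipliers: Sequence[float],
                        thresholds: Sequence[float]) -> float:
    """Weighted reward sum minus a penalty on constraint violations.

    Assumption: a Lagrangian-style stand-in, not the paper's algorithm.
    """
    gain = sum(w * r for w, r in zip(weights, rewards))
    penalty = sum(lam * max(0.0, c - d)
                  for lam, c, d in zip(multipliers, costs, thresholds))
    return gain - penalty


# Hypothetical two-stage task: jump, then land.
stages = [
    Stage("jump",
          rewards=[lambda s: s["height"]],
          costs=[lambda s: abs(s["roll"])]),
    Stage("land",
          rewards=[lambda s: -abs(s["pitch"])],
          costs=[lambda s: s["impact"]]),
]

state = {"height": 0.5, "roll": 0.1, "pitch": 0.2, "impact": 3.0}
jump_rewards, jump_costs = stage_signals(stages, 0, state)
land_rewards, land_costs = stage_signals(stages, 1, state)
jump_value = penalized_objective(jump_rewards, jump_costs,
                                 weights=[1.0], multipliers=[2.0],
                                 thresholds=[0.05])
```

Keeping rewards and costs as separate lists per stage means each term can be written and tuned on its own, which is the simplification the abstract argues for. How the terms are combined during policy optimization is left to the learning algorithm.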
Related papers
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
- Adaptive Reward Design for Reinforcement Learning in Complex Robotic Tasks [2.3031174164121127]
We propose a suite of reward functions that incentivize an RL agent to make measurable progress on tasks specified by formulas.
We develop an adaptive reward shaping approach that dynamically updates these reward functions during the learning process.
Experimental results on a range of RL-based robotic tasks demonstrate that the proposed approach is compatible with various RL algorithms.
arXiv Detail & Related papers (2024-12-14T18:04:18Z)
- Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping [16.5526277899717]
This study aims to design a multi-agent cooperative algorithm with logic reward shaping.
Experiments have been conducted on various types of tasks in the Minecraft-like environment.
arXiv Detail & Related papers (2024-11-02T09:03:23Z)
- Curriculum Reinforcement Learning for Complex Reward Functions [5.78463306498655]
We propose a two-stage reward curriculum that first maximizes a simple reward function and then transitions to the full, complex reward.
We evaluate our method on the DeepMind control suite, modified to include an additional constraint term in the reward definitions.
Our results demonstrate the potential of two-stage reward curricula for efficient and stable RL in environments with complex rewards.
arXiv Detail & Related papers (2024-10-22T08:07:44Z)
- Automated Rewards via LLM-Generated Progress Functions [47.50772243693897]
Large Language Models (LLMs) have the potential to automate reward engineering by leveraging their broad domain knowledge across various tasks.
This paper introduces an LLM-driven reward generation framework that is able to produce state-of-the-art policies on the challenging Bi-DexHands benchmark.
arXiv Detail & Related papers (2024-10-11T18:41:15Z)
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms.
Our method not only learns high-quality policies for multiple tasks simultaneously but can also quickly adapt to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z)
- Reinforcement Learning Agent Training with Goals for Real World Tasks [3.747737951407512]
Reinforcement Learning (RL) is a promising approach for solving various control, optimization, and sequential decision making tasks.
We propose a specification language (Inkling Goal Specification) for complex control and optimization tasks.
We include a set of experiments showing that the proposed method provides great ease of use to specify a wide range of real world tasks.
arXiv Detail & Related papers (2021-07-21T23:21:16Z)
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelop value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
- Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples [31.31937554018045]
Deep reinforcement learning (RL) methods require intensive data from the exploration of the environment to achieve satisfactory performance.
We propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations.
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.
arXiv Detail & Related papers (2020-06-28T21:13:08Z)
- On Reward-Free Reinforcement Learning with Linear Function Approximation [144.4210285338698]
Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest.
In this work, we give both positive and negative results for reward-free RL with linear function approximation.
arXiv Detail & Related papers (2020-06-19T17:59:36Z)