Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2301.10886v5
- Date: Thu, 12 Oct 2023 02:30:33 GMT
- Title: Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning
- Authors: Mingqi Yuan, Bo Li, Xin Jin, Wenjun Zeng
- Abstract summary: We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL).
We develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches.
- Score: 55.2080971216584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and
adaptively provides high-quality intrinsic rewards to enhance exploration in
reinforcement learning (RL). More specifically, AIRS selects a shaping function
from a predefined set in real time, based on the estimated task return,
providing reliable exploration incentives and alleviating the biased-objective
problem. Moreover, we develop an intrinsic reward toolkit that provides efficient
and reliable implementations of diverse intrinsic reward approaches. We test
AIRS on various tasks from MiniGrid, Procgen, and the DeepMind Control Suite.
Extensive simulations demonstrate that AIRS outperforms the benchmarking
schemes and achieves superior performance with a simple architecture.
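The selection mechanism described in the abstract (picking a shaping function from a predefined set based on the estimated task return) can be viewed as a bandit problem over candidate intrinsic rewards. The sketch below illustrates that idea with a UCB-style selector; the class, the UCB rule, and all names are illustrative assumptions, not the authors' exact method:

```python
import math

class IntrinsicRewardSelector:
    """Bandit-style selector over a predefined set of intrinsic reward functions.

    Each 'arm' is one shaping function; its value estimate is the running mean
    of the task return observed while that function was in use.
    """

    def __init__(self, reward_fns, c=2.0):
        self.reward_fns = reward_fns            # callables: state -> intrinsic reward
        self.counts = [0] * len(reward_fns)     # times each function was selected
        self.values = [0.0] * len(reward_fns)   # mean estimated task return per function
        self.c = c                              # exploration coefficient

    def select(self):
        # Try every shaping function at least once before applying UCB.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)
        ucb = [
            self.values[i] + self.c * math.sqrt(math.log(total) / self.counts[i])
            for i in range(len(self.reward_fns))
        ]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, i, task_return):
        # Incrementally update the mean task-return estimate for arm i.
        self.counts[i] += 1
        self.values[i] += (task_return - self.values[i]) / self.counts[i]

# Usage: per update, pick a shaping function, add its bonus to the extrinsic
# reward, then feed the observed task return back into the selector.
selector = IntrinsicRewardSelector([lambda s: 0.0, lambda s: 0.1])
arm = selector.select()
shaped_reward = 1.0 + selector.reward_fns[arm](None)  # extrinsic + intrinsic bonus
selector.update(arm, task_return=1.0)
```

This mirrors the stated goal of alleviating the biased-objective problem: a shaping function is kept only while it improves the estimated task return, rather than being applied unconditionally.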
Related papers
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
- Adaptive Reward Design for Reinforcement Learning in Complex Robotic Tasks [2.3031174164121127]
We propose a suite of reward functions that incentivize an RL agent to make measurable progress on tasks specified by formulas.
We develop an adaptive reward shaping approach that dynamically updates these reward functions during the learning process.
Experimental results on a range of RL-based robotic tasks demonstrate that the proposed approach is compatible with various RL algorithms.
arXiv Detail & Related papers (2024-12-14T18:04:18Z)
- Sample-Efficient Curriculum Reinforcement Learning for Complex Reward Functions [5.78463306498655]
Reinforcement learning (RL) shows promise in control problems, but its practical application is often hindered by the complexity arising from intricate reward functions with constraints.
We propose a novel two-stage reward curriculum combined with a flexible replay buffer that adaptively samples experiences.
Our approach first learns on a subset of rewards before transitioning to the full reward, allowing the agent to learn trade-offs between objectives and constraints.
arXiv Detail & Related papers (2024-10-22T08:07:44Z)
- ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization [41.074747242532695]
Online Reward Selection and Policy Optimization (ORSO) is a novel approach that frames shaping reward selection as an online model selection problem.
ORSO employs principled exploration strategies to automatically identify promising shaping reward functions without human intervention.
We demonstrate ORSO's effectiveness across various continuous control tasks using the Isaac Gym simulator.
arXiv Detail & Related papers (2024-10-17T17:55:05Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe (Reinforced Imitation Learning) is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently.
Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning [50.55776190278426]
Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks.
We introduce RLeXplore, a unified, highly modularized, and plug-and-play framework offering reliable implementations of eight state-of-the-art intrinsic reward algorithms.
arXiv Detail & Related papers (2024-05-29T22:23:20Z)
- Regularity as Intrinsic Reward for Free Play [24.29379265146469]
We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning.
Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning.
arXiv Detail & Related papers (2023-12-03T18:18:44Z)
- Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics [14.773498542408264]
Large Language Models (LLMs) have been extensively adopted to address tasks demanding in-depth common-sense knowledge.
We propose in this work a novel LLM framework with a self-refinement mechanism for automated reward function design.
arXiv Detail & Related papers (2023-09-13T02:56:56Z)
- Self-supervised network distillation: an effective approach to exploration in sparse reward environments [0.0]
Reinforcement learning can train an agent to behave in an environment according to a predesigned reward function.
The solution to such a problem may be to equip the agent with an intrinsic motivation that will provide informed exploration.
We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator.
arXiv Detail & Related papers (2023-02-22T18:58:09Z)
- Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z)
- Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies [64.2210390071609]
We present a novel Heavy-Tailed Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems.
We show consistent performance improvement across all tasks in terms of high average cumulative reward.
arXiv Detail & Related papers (2022-06-12T04:09:39Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning [8.810296389358134]
Existing intrinsic reward shaping (IRS) modules rely on attendant models or additional memory to record and analyze learning procedures.
We introduce a novel metric entitled Jain's fairness index (JFI) to replace the entropy regularizer.
arXiv Detail & Related papers (2021-07-19T14:04:32Z)
- Soft Expert Reward Learning for Vision-and-Language Navigation [94.86954695912125]
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions.
We introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task.
arXiv Detail & Related papers (2020-07-21T14:17:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.