Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2301.10886v5
- Date: Thu, 12 Oct 2023 02:30:33 GMT
- Title: Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning
- Authors: Mingqi Yuan, Bo Li, Xin Jin, Wenjun Zeng
- Abstract summary: We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL).
We develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches.
- Score: 55.2080971216584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set based on the estimated task return in real time, providing reliable exploration incentives and alleviating the biased objective problem. Moreover, we develop an intrinsic reward toolkit that provides efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks from MiniGrid, Procgen, and the DeepMind Control Suite. Extensive simulations demonstrate that AIRS outperforms the benchmark schemes and achieves superior performance with a simple architecture.
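To make the selection step concrete, the following is a minimal sketch of how choosing a shaping function from a predefined set based on estimated task return can be pictured as a bandit problem. The candidate bonus functions, the UCB selection rule, and all names below are illustrative assumptions, not the paper's actual implementation or toolkit API.

```python
import math

# Illustrative sketch only (not AIRS's actual code): treat each candidate shaping
# function as a bandit arm and prefer the one whose recent policies achieved the
# highest estimated task (extrinsic) return.

_visit_counts = {}

def zero_bonus(obs, next_obs):
    """'No shaping' baseline: pure extrinsic reward."""
    return 0.0

def count_bonus(obs, next_obs):
    """Toy count-based novelty bonus: 1 / sqrt(visits to next_obs)."""
    key = tuple(next_obs)  # assumes a hashable, discrete-ish observation for the sketch
    _visit_counts[key] = _visit_counts.get(key, 0) + 1
    return 1.0 / math.sqrt(_visit_counts[key])

CANDIDATES = [zero_bonus, count_bonus]  # hypothetical predefined set of shaping functions

class ShapingSelector:
    """UCB over candidate shaping functions, scored by estimated task return."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        for i, c in enumerate(self.counts):
            if c == 0:                      # try every candidate at least once
                return i
        ucb = [v + math.sqrt(2.0 * math.log(self.t) / c)
               for v, c in zip(self.values, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm, task_return):
        """Feed back the extrinsic return measured while using the chosen shaping."""
        self.counts[arm] += 1
        self.values[arm] += (task_return - self.values[arm]) / self.counts[arm]

def shaped_reward(r_ext, obs, next_obs, arm, beta=0.01):
    """Training reward: extrinsic reward plus the currently selected intrinsic bonus."""
    return r_ext + beta * CANDIDATES[arm](obs, next_obs)
```

In this toy picture the selector periodically re-scores the candidates using the extrinsic return only, which is one way to keep the shaped objective from drifting away from the true task objective.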
Related papers
- Deep Reinforcement Learning with Hybrid Intrinsic Reward Model [50.53705050673944]
Intrinsic reward shaping has emerged as a prevalent approach to solving hard-exploration and sparse-rewards environments.
We introduce HIRE (Hybrid Intrinsic REward), a framework for creating hybrid intrinsic rewards through deliberate fusion strategies.
arXiv Detail & Related papers (2025-01-22T04:22:13Z)
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
- Adaptive Reward Design for Reinforcement Learning in Complex Robotic Tasks [2.3031174164121127]
We propose a suite of reward functions that incentivize an RL agent to make measurable progress on tasks specified by formulas.
We develop an adaptive reward shaping approach that dynamically updates these reward functions during the learning process.
Experimental results on a range of RL-based robotic tasks demonstrate that the proposed approach is compatible with various RL algorithms.
arXiv Detail & Related papers (2024-12-14T18:04:18Z)
- ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization [41.074747242532695]
Online Reward Selection and Policy Optimization (ORSO) is a novel approach that frames shaping reward selection as an online model selection problem.
ORSO employs principled exploration strategies to automatically identify promising shaping reward functions without human intervention.
We demonstrate ORSO's effectiveness across various continuous control tasks using the Isaac Gym simulator.
arXiv Detail & Related papers (2024-10-17T17:55:05Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently.
Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Regularity as Intrinsic Reward for Free Play [24.29379265146469]
We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning.
Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning.
arXiv Detail & Related papers (2023-12-03T18:18:44Z)
- Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics [14.773498542408264]
Large Language Models (LLMs) have been extensively adopted to address tasks demanding in-depth common-sense knowledge.
We propose in this work a novel LLM framework with a self-refinement mechanism for automated reward function design.
arXiv Detail & Related papers (2023-09-13T02:56:56Z)
- Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice, the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z)
- Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning [8.810296389358134]
Intrinsic reward shaping (IRS) modules rely on attendant models or additional memory to record and analyze learning procedures.
We introduce a novel metric entitled Jain's fairness index (JFI) to replace the entropy regularizer (see the sketch after this list).
arXiv Detail & Related papers (2021-07-19T14:04:32Z)
- Soft Expert Reward Learning for Vision-and-Language Navigation [94.86954695912125]
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions.
We introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task.
arXiv Detail & Related papers (2020-07-21T14:17:36Z)
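As referenced in the Multimodal Reward Shaping entry above, Jain's fairness index is a standard uniformity measure, J(x) = (sum_i x_i)^2 / (n * sum_i x_i^2), ranging from 1/n (all mass on one element) to 1 (perfectly even). The short sketch below only illustrates the index itself; applying it to state-visit counts is an assumption for illustration, not that paper's exact formulation.

```python
import numpy as np

def jains_fairness_index(x):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); 1.0 means perfectly even."""
    x = np.asarray(x, dtype=float)
    return float(x.sum() ** 2 / (len(x) * np.square(x).sum()))

# Hypothetical use: score how evenly an agent has visited a set of states.
visit_counts = [12, 11, 13, 10, 2]
print(jains_fairness_index(visit_counts))  # closer to 1.0 => more uniform exploration
```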
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.