Multimodal Reward Shaping for Efficient Exploration in Reinforcement
Learning
- URL: http://arxiv.org/abs/2107.08888v1
- Date: Mon, 19 Jul 2021 14:04:32 GMT
- Title: Multimodal Reward Shaping for Efficient Exploration in Reinforcement
Learning
- Authors: Mingqi Yuan, Mon-on Pun, Yi Chen, Dong Wang, Haojun Li
- Abstract summary: IRS modules rely on attendant models or additional memory to record and analyze learning procedures.
We introduce a novel metric entitled Jain's fairness index (JFI) to replace the entropy regularizer.
- Score: 8.810296389358134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Maintaining long-term exploration ability remains one of the challenges of
deep reinforcement learning (DRL). In practice, reward shaping-based
approaches are leveraged to provide intrinsic rewards that incentivize the
agent to explore. However, most existing intrinsic reward shaping (IRS)
modules rely on auxiliary models or additional memory to record and analyze
the learning process, which leads to high computational complexity and low
robustness. Moreover, they overemphasize the influence of individual states
on exploration and therefore cannot evaluate exploration performance from a
global perspective. To tackle this problem, state entropy-based methods have
been proposed to encourage the agent to visit the state space more equitably.
However, their estimation error and sample complexity become prohibitive in
environments with high-dimensional observations. In this paper, we introduce a novel metric entitled Jain's
fairness index (JFI) to replace the entropy regularizer, which requires no
additional models or memory. In particular, JFI overcomes the vanishing
intrinsic rewards problem and can be generalized to arbitrary tasks.
Furthermore, we use a variational auto-encoder (VAE) model to capture the
life-long novelty of states. The global JFI score and local state novelty are
then combined to form a multimodal intrinsic reward that controls the extent
of exploration more precisely. Finally, extensive simulation results
demonstrate that our multimodal reward shaping (MMRS) method achieves higher
performance than other benchmark schemes.
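As a rough illustration of the two signals described in the abstract, the sketch below computes Jain's fairness index over per-state visitation counts as the global score and uses a small VAE's reconstruction error as the local life-long novelty. The discrete visit counts, the network sizes, and the gating used to combine the two signals are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
import torch
import torch.nn as nn

def jains_fairness_index(visit_counts: np.ndarray) -> float:
    """Global exploration score JFI(x) = (sum x)^2 / (n * sum x^2) in (0, 1];
    it equals 1 exactly when every state has been visited equally often."""
    x = visit_counts.astype(np.float64)
    denom = len(x) * np.sum(x ** 2)
    return float(np.sum(x) ** 2 / denom) if denom > 0 else 0.0

class StateVAE(nn.Module):
    """Tiny VAE whose reconstruction error is used as a life-long novelty signal."""
    def __init__(self, obs_dim: int, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, obs_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.encoder(obs)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z)

def multimodal_intrinsic_reward(vae: StateVAE, obs: torch.Tensor,
                                visit_counts: np.ndarray, beta: float = 1.0) -> float:
    """Combine the global JFI score with the local VAE novelty of one state."""
    with torch.no_grad():
        recon = vae(obs)
    novelty = torch.mean((recon - obs) ** 2).item()   # local, life-long novelty
    jfi = jains_fairness_index(visit_counts)          # global visitation equitability
    # Illustrative gating: the bonus fades as visitation becomes equitable (JFI -> 1).
    return beta * (1.0 - jfi) * novelty
```

Here `visit_counts` would hold per-state (or per-cluster) visit counts maintained by the training loop, and `beta` simply scales the bonus.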
Related papers
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring additional parameter overhead.
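The summary does not spell out the reward's exact form; one construction consistent with it is to reuse the dynamics model that the model-based, off-policy learner already trains and take its one-step prediction error as the intrinsic reward, so no extra parameters are introduced. The forward model below is a hypothetical stand-in for whatever predictive model the paper uses.

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """One-step dynamics model assumed to be trained anyway by the model-based
    learner; reusing it for the bonus introduces no additional parameters."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
                                 nn.Linear(128, obs_dim))

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))

def prediction_error_bonus(model: ForwardModel, obs: torch.Tensor,
                           act: torch.Tensor, next_obs: torch.Tensor) -> torch.Tensor:
    """Intrinsic reward per transition: squared error of the dynamics prediction."""
    with torch.no_grad():
        pred = model(obs, act)
    return ((pred - next_obs) ** 2).mean(dim=-1)
```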
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
- Self-supervised network distillation: an effective approach to exploration in sparse reward environments [0.0]
Reinforcement learning can train an agent to behave in an environment according to a predesigned reward function.
However, when rewards are sparse such feedback is rarely available; the solution to this problem may be to equip the agent with an intrinsic motivation that provides informed exploration.
We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator.
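A minimal sketch of a distillation-error novelty signal in the spirit of SND: a learner network is regressed toward a target encoder, and the remaining error is the per-state bonus. In SND the target itself is trained with self-supervised objectives; here it is simply frozen, which is an assumption made for brevity.

```python
import torch
import torch.nn as nn

def make_encoder(obs_dim: int, feat_dim: int = 64) -> nn.Module:
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))

class DistillationNovelty:
    """Novelty = error of a learner network distilled toward a target encoder;
    states the learner has not yet fitted receive larger intrinsic rewards."""
    def __init__(self, obs_dim: int, lr: float = 1e-4):
        self.target = make_encoder(obs_dim)          # frozen here (assumption)
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.learner = make_encoder(obs_dim)
        self.opt = torch.optim.Adam(self.learner.parameters(), lr=lr)

    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            return ((self.learner(obs) - self.target(obs)) ** 2).mean(dim=-1)

    def update(self, obs: torch.Tensor) -> None:
        """Distillation step on a batch of visited states."""
        loss = ((self.learner(obs) - self.target(obs)) ** 2).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```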
arXiv Detail & Related papers (2023-02-22T18:58:09Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning [55.2080971216584]
We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL).
We develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches.
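AIRS frames the choice among candidate intrinsic rewards as an adaptive selection problem; the UCB-style selector below is only an illustrative sketch of such a mechanism, and the candidate names and the episode-return utility are assumptions rather than the paper's exact criterion.

```python
import math

class IntrinsicRewardSelector:
    """UCB-style bandit over candidate intrinsic-reward modules (illustrative)."""
    def __init__(self, candidates, c: float = 1.0):
        self.candidates = candidates               # e.g. ["RE3", "RND", "ICM"]
        self.c = c
        self.counts = [0] * len(candidates)
        self.mean_return = [0.0] * len(candidates)
        self.t = 0

    def select(self) -> int:
        """Pick the reward module to use for the next episode or training stage."""
        self.t += 1
        for i, n in enumerate(self.counts):        # play every arm once first
            if n == 0:
                return i
        ucb = [m + self.c * math.sqrt(math.log(self.t) / n)
               for m, n in zip(self.mean_return, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm: int, episode_return: float) -> None:
        """Feed back the task return obtained while using that reward module."""
        self.counts[arm] += 1
        self.mean_return[arm] += (episode_return - self.mean_return[arm]) / self.counts[arm]
```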
arXiv Detail & Related papers (2023-01-26T01:06:46Z)
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
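A simplified stand-in for the episodic discrepancy idea: for each state of the current episode, the bonus grows with its distance to the k-th nearest embedding from the previous episode. The full method estimates a Rényi divergence; this k-NN distance is only a hedged approximation, and the fixed `k` and Euclidean metric are assumptions.

```python
import numpy as np

def episodic_discrepancy_bonus(curr_emb: np.ndarray, prev_emb: np.ndarray,
                               k: int = 3) -> np.ndarray:
    """Per-state bonus for the current episode: distance from each current-episode
    embedding to its k-th nearest neighbor among the previous episode's embeddings.

    curr_emb: (T_curr, d) embeddings of states visited in the current episode.
    prev_emb: (T_prev, d) embeddings of states visited in the previous episode.
    """
    # Pairwise Euclidean distances between current- and previous-episode states.
    dists = np.linalg.norm(curr_emb[:, None, :] - prev_emb[None, :, :], axis=-1)
    kth = np.sort(dists, axis=1)[:, min(k, prev_emb.shape[0]) - 1]
    return kth  # large where the current episode strays from the previous one
```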
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- Rényi State Entropy for Exploration Acceleration in Reinforcement Learning [6.72733760405596]
In this work, a novel intrinsic reward module based on the Rényi entropy is proposed to provide high-quality intrinsic rewards.
In particular, a $k$-nearest neighbor estimator is introduced for entropy estimation, while a $k$-value search method is designed to guarantee the estimation accuracy.
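A sketch of the k-nearest-neighbor (particle) entropy bonus the summary alludes to: each state's reward grows with the distance to its k-th nearest neighbor in an embedding space, so sparsely visited regions are rewarded more. The adaptive k-value search is omitted, and the fixed `k` and log scaling are assumptions.

```python
import numpy as np

def knn_entropy_bonus(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Particle-based entropy bonus: each state's reward grows with the distance
    to its k-th nearest neighbor, favouring rarely visited regions.

    embeddings: (N, d) array of state embeddings from the current batch/buffer.
    """
    n = embeddings.shape[0]
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                 # ignore self-distances
    kth = np.sort(dists, axis=1)[:, min(k, n - 1) - 1]
    return np.log(kth + 1.0)                        # log keeps the bonus well scaled
```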
arXiv Detail & Related papers (2022-03-08T07:38:35Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
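In tabular form the idea can be illustrated with a count-based proxy: the bonus shrinks as the estimated visitation density of a state-action pair grows, pushing the next policy's occupancy away from already-explored regions. The inverse-square-root form below is an illustrative choice, not the paper's exact bonus.

```python
import math
from collections import defaultdict

class DeviationBonus:
    """Tabular illustration: bonus ~ 1 / sqrt(estimated visitation density), so
    state-action pairs the explored occupancy misses are favoured."""
    def __init__(self):
        self.counts = defaultdict(int)
        self.total = 0

    def update(self, state, action) -> None:
        self.counts[(state, action)] += 1
        self.total += 1

    def bonus(self, state, action) -> float:
        density = self.counts[(state, action)] / max(self.total, 1)
        return 1.0 / math.sqrt(density + 1e-8)
```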
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent that minimizes a novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
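A compact sketch of an impact-driven bonus in the spirit of RIDE: the reward is the change in a learned state embedding between consecutive steps, discounted by an episodic visitation count of the new state. The encoder `phi` and the hashable `state_key` used for counting are assumptions; in the paper the embedding is trained with forward and inverse dynamics losses.

```python
from collections import Counter
import torch
import torch.nn as nn

def impact_driven_reward(phi: nn.Module, obs: torch.Tensor, next_obs: torch.Tensor,
                         episodic_counts: Counter, state_key) -> float:
    """Bonus = change in the learned state embedding between consecutive steps,
    discounted by how often the new state was already seen in this episode.
    `phi` is any state encoder; `state_key` is any hashable key for next_obs."""
    with torch.no_grad():
        impact = torch.norm(phi(next_obs) - phi(obs), p=2).item()
    episodic_counts[state_key] += 1
    return impact / (episodic_counts[state_key] ** 0.5)
```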
arXiv Detail & Related papers (2020-02-27T18:03:16Z)