Nuclear Norm Maximization Based Curiosity-Driven Learning
- URL: http://arxiv.org/abs/2205.10484v1
- Date: Sat, 21 May 2022 01:52:47 GMT
- Title: Nuclear Norm Maximization Based Curiosity-Driven Learning
- Authors: Chao Chen, Zijian Gao, Kele Xu, Sen Yang, Yiying Li, Bo Ding, Dawei Feng, Huaimin Wang
- Abstract summary: We propose a novel curiosity method leveraging nuclear norm maximization (NNM).
On a subset of 26 Atari games, NNM achieves a human-normalized score of 1.09, doubling that of competitive intrinsic-reward-based approaches.
- Score: 22.346209746751818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To handle the sparsity of extrinsic rewards in reinforcement learning,
researchers have proposed intrinsic rewards that enable the agent to learn skills
that might come in handy for pursuing rewards in the future, such as encouraging
the agent to visit novel states. However, the intrinsic reward can be noisy due to
undesirable environmental stochasticity, and directly applying the noisy value
predictions to supervise the policy is detrimental to learning performance and
efficiency. Moreover, many previous studies employ the $\ell^2$ norm or variance
to measure exploration novelty, which amplifies the noise due to the squaring
operation. In this paper, we address the aforementioned challenges by proposing a
novel curiosity method leveraging nuclear norm maximization (NNM), which can
quantify the novelty of exploring the environment more accurately while providing
high tolerance to noise and outliers. We conduct extensive experiments across a
variety of benchmark environments, and the results suggest that NNM provides
state-of-the-art performance compared with previous curiosity methods. On a
subset of 26 Atari games, NNM achieves a human-normalized score of 1.09, which
doubles that of competitive intrinsic-reward-based approaches. Our code will be
released publicly to enhance reproducibility.
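The abstract does not spell out the exact matrix construction, so the following is only a minimal sketch of the general idea: stack several predicted next-state embeddings (for example, from an ensemble of forward models) into a matrix and use its nuclear norm, normalized by the Frobenius norm, as the intrinsic reward. The ensemble size, embedding dimension, and normalization below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def nuclear_norm_bonus(pred_embeddings: np.ndarray, eps: float = 1e-8) -> float:
    """Toy nuclear-norm-based intrinsic reward.

    pred_embeddings: (n_models, d) matrix whose rows are next-state embeddings
    predicted by an ensemble of forward models for the same transition.
    The nuclear norm (sum of singular values) grows when the predictions
    disagree in many directions, which is used here as a novelty signal.
    Dividing by the Frobenius norm keeps the bonus on a bounded scale.
    """
    nuc = np.linalg.norm(pred_embeddings, ord="nuc")  # sum of singular values
    fro = np.linalg.norm(pred_embeddings, ord="fro")  # sqrt of sum of squared entries
    return float(nuc / (fro + eps))

# Example with hypothetical sizes: 5 ensemble members, 16-dimensional embeddings.
rng = np.random.default_rng(0)
preds = rng.normal(size=(5, 16))
print(nuclear_norm_bonus(preds))
```

Because the nuclear norm sums singular values rather than their squares, a single noisy direction inflates the bonus far less than it would under an $\ell^2$- or variance-based measure, which is the robustness argument made in the abstract.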
Related papers
- The Dark Side of Rich Rewards: Understanding and Mitigating Noise in VLM Rewards [34.636688162807836]
Vision-Language Models (VLMs) are increasingly used to generate reward signals for training embodied agents.
Our research reveals that agents guided by VLM rewards often underperform compared to those employing only intrinsic rewards.
We introduce BiMI, a novel reward function designed to mitigate noise.
arXiv Detail & Related papers (2024-09-24T09:45:20Z)
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE)
RLE combines the strengths of bonus-based and noise-based exploration, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Never Explore Repeatedly in Multi-Agent Reinforcement Learning [40.35950679063337]
We propose a dynamic reward scaling approach to combat "revisitation".
We show enhanced performance in demanding environments like Google Research Football and StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2023-08-19T05:27:48Z)
- DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards [2.09711130126031]
Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness is a deciding factor in the performance of RL algorithms.
Recent studies have shown the effectiveness of encouraging exploration with intrinsic rewards estimated from novelties in observations.
We propose DEIR, a novel method in which we theoretically derive an intrinsic reward with a conditional mutual information term.
arXiv Detail & Related papers (2023-04-21T06:39:38Z)
- Self-supervised network distillation: an effective approach to exploration in sparse reward environments [0.0]
Reinforcement learning can train an agent to behave in an environment according to a predesigned reward function.
The solution to such a problem may be to equip the agent with an intrinsic motivation that will provide informed exploration.
We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator.
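As a rough illustration of the distillation-error idea (not SND's exact algorithm, which trains the target network with a self-supervised objective), a predictor network can be regressed onto a target network's features, with the per-state regression error serving as the novelty bonus. The network sizes and the frozen target below are placeholders.

```python
import torch
import torch.nn as nn

obs_dim, feat_dim = 8, 32  # placeholder sizes, not from the paper

# Target network: SND would train it with a self-supervised objective; it is
# simply frozen here (RND-style) to keep the sketch short.
target = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
for p in target.parameters():
    p.requires_grad_(False)

# Predictor network that is distilled towards the target on visited states.
predictor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs: torch.Tensor) -> torch.Tensor:
    """Distillation error per state: large for rarely visited (novel) states."""
    with torch.no_grad():
        return (predictor(obs) - target(obs)).pow(2).mean(dim=-1)

def distill_step(obs: torch.Tensor) -> None:
    """Fit the predictor on visited states so familiar states stop paying a bonus."""
    loss = (predictor(obs) - target(obs)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```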
arXiv Detail & Related papers (2023-02-22T18:58:09Z)
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning [17.360622968442982]
We present a novel intrinsic reward inspired by human learning, as humans evaluate curiosity by comparing current observations with historical knowledge.
Our method involves training a self-supervised prediction model, saving snapshots of the model parameters, and using nuclear norm to evaluate the temporal inconsistency between the predictions of different snapshots as intrinsic rewards.
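A minimal sketch of this snapshot idea follows; the model architecture, number of snapshots, and reward scaling are illustrative assumptions rather than the paper's exact setup.

```python
import copy
import torch
import torch.nn as nn

# Hypothetical self-supervised prediction model; snapshots of its parameters
# would normally be saved at different points during training.
model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 32))
snapshots = [copy.deepcopy(model) for _ in range(4)]

def temporal_inconsistency_bonus(obs: torch.Tensor) -> torch.Tensor:
    """Per-observation intrinsic reward: nuclear norm of the matrix whose rows
    are the predictions of the saved snapshots for that observation.
    Disagreement across snapshots (temporal inconsistency) yields a larger bonus."""
    with torch.no_grad():
        preds = torch.stack([m(obs) for m in snapshots], dim=1)  # (batch, n_snapshots, d)
    return torch.linalg.svdvals(preds).sum(dim=-1)               # sum of singular values
```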
arXiv Detail & Related papers (2022-08-24T08:19:41Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
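A hedged sketch of the confidence-based pseudo-labeling step is shown below; the threshold and predictor interface are assumptions, and SURF's full pipeline additionally applies data augmentation to the segment pairs.

```python
import numpy as np

def pseudo_label_pairs(pref_probs: np.ndarray, threshold: float = 0.9):
    """pref_probs: (n_pairs,) predicted probability that the first segment of each
    unlabeled pair is preferred. Confident predictions become pseudo-labels for
    reward learning; the remaining pairs are discarded."""
    labels = (pref_probs > 0.5).astype(np.int64)           # 1: first preferred, 0: second
    confidence = np.maximum(pref_probs, 1.0 - pref_probs)  # predictor confidence
    keep = confidence >= threshold                         # hypothetical cutoff
    return labels[keep], keep
```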
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
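For context, the elliptical bonus being approximated has, in its exact but unscalable form, roughly the shape sketched below; the feature map, regularizer, and per-step linear solve are placeholders, and the paper's contribution is an anti-concentrated approximation that avoids this exact computation.

```python
import numpy as np

class EllipticalBonus:
    """Exact elliptical bonus b_t = sqrt(phi_t^T A_t^{-1} phi_t), where
    A_t = reg * I + sum of outer products of previously visited feature vectors.
    Kept exact (one linear solve per step) purely for clarity."""

    def __init__(self, dim: int, reg: float = 1.0):
        self.A = reg * np.eye(dim)

    def bonus(self, phi: np.ndarray) -> float:
        return float(np.sqrt(phi @ np.linalg.solve(self.A, phi)))

    def update(self, phi: np.ndarray) -> None:
        self.A += np.outer(phi, phi)
```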
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Maximizing Information Gain in Partially Observable Environments via Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
arXiv Detail & Related papers (2020-05-11T08:13:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.