Self-Supervised Exploration via Temporal Inconsistency in Reinforcement
Learning
- URL: http://arxiv.org/abs/2208.11361v2
- Date: Tue, 27 Jun 2023 01:23:31 GMT
- Title: Self-Supervised Exploration via Temporal Inconsistency in Reinforcement
Learning
- Authors: Zijian Gao, Kele Xu, Yuanzhao Zhai, Dawei Feng, Bo Ding, XinJun Mao,
Huaimin Wang
- Abstract summary: We present a novel intrinsic reward inspired by human learning, as humans evaluate curiosity by comparing current observations with historical knowledge.
Our method involves training a self-supervised prediction model, saving snapshots of the model parameters, and using nuclear norm to evaluate the temporal inconsistency between the predictions of different snapshots as intrinsic rewards.
- Score: 17.360622968442982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Under sparse extrinsic reward settings, reinforcement learning has remained
challenging, despite surging interest in this field. Previous attempts suggest
that intrinsic reward can alleviate the issue caused by sparsity. In this
article, we present a novel intrinsic reward that is inspired by human
learning, as humans evaluate curiosity by comparing current observations with
historical knowledge. Our method involves training a self-supervised prediction
model, saving snapshots of the model parameters, and using nuclear norm to
evaluate the temporal inconsistency between the predictions of different
snapshots as intrinsic rewards. We also propose a variational weighting
mechanism to assign weight to different snapshots in an adaptive manner. Our
experimental results on various benchmark environments demonstrate the efficacy
of our method, which outperforms other intrinsic reward-based methods without
additional training costs and with higher noise tolerance. This work has been
submitted to the IEEE for possible publication. Copyright may be transferred
without notice, after which this version may no longer be accessible.
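The core reward computation described in the abstract can be sketched as follows. This is a minimal numpy illustration, assuming the snapshot predictions are already available as fixed-length feature vectors; the number of snapshots, the feature dimension, and the optional fixed weights stand in for the paper's adaptive variational weighting and are not the authors' exact implementation.

```python
import numpy as np

def temporal_inconsistency_reward(snapshot_predictions, weights=None):
    """Intrinsic reward from the nuclear norm of stacked snapshot predictions.

    snapshot_predictions: K feature vectors (each of shape (d,)), the outputs
        of K saved snapshots of the self-supervised prediction model for the
        same transition.
    weights: optional per-snapshot weights; the paper learns these adaptively
        with a variational mechanism, here they are fixed placeholders.
    """
    P = np.stack(snapshot_predictions, axis=0)        # (K, d) prediction matrix
    if weights is not None:
        P = P * np.asarray(weights)[:, None]           # weight each snapshot's row
    # Nuclear norm (sum of singular values) of the prediction matrix, read here
    # as a measure of temporal inconsistency across snapshots.
    return float(np.linalg.norm(P, ord='nuc'))

# Toy usage: 4 snapshots with 8-dimensional predictions.
rng = np.random.default_rng(0)
predictions = [rng.normal(size=8) for _ in range(4)]
print(temporal_inconsistency_reward(predictions))
```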
Related papers
- Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling [18.93897922183304]
We focus on the task of conditional image generation, where an image is synthesized according to user instructions.
We propose an uncertainty-aware reward modeling, called Ctrl-U, designed to reduce the adverse effects of imprecise feedback from the reward model.
arXiv Detail & Related papers (2024-10-15T03:43:51Z)
- Self-supervised network distillation: an effective approach to exploration in sparse reward environments [0.0]
Reinforcement learning can train an agent to behave in an environment according to a predesigned reward function.
The solution to such a problem may be to equip the agent with an intrinsic motivation that will provide informed exploration.
We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator.
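A rough sketch of the distillation-error novelty signal that SND builds on is given below. It assumes an RND-style setup with a frozen, randomly initialised target network and a predictor regressed onto it; the architectures, dimensions, and learning rate are placeholders, not the authors' models.

```python
import torch
import torch.nn as nn

obs_dim, feat_dim = 16, 32

# Frozen, randomly initialised target network and a predictor distilled onto it.
target = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
predictor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
for p in target.parameters():
    p.requires_grad_(False)                      # the target is never trained
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs):
    """Per-observation distillation error used as the novelty signal."""
    with torch.no_grad():
        return (predictor(obs) - target(obs)).pow(2).mean(dim=-1)

def distill_step(obs_batch):
    """Regress the predictor toward the frozen target on visited observations."""
    loss = (predictor(obs_batch) - target(obs_batch)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```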
arXiv Detail & Related papers (2023-02-22T18:58:09Z)
- The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training [72.39526433794707]
Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples.
We propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its "inverse adversarial" counterpart.
Our training method achieves state-of-the-art robustness as well as natural accuracy.
arXiv Detail & Related papers (2022-11-01T15:24:26Z)
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
- Nuclear Norm Maximization Based Curiosity-Driven Learning [22.346209746751818]
We propose a novel curiosity method that leverages nuclear norm maximization (NNM).
On 26 Atari games, NNM achieves a human-normalized score of 1.09, which doubles that of competitive intrinsic rewards-based approaches.
arXiv Detail & Related papers (2022-05-21T01:52:47Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
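The confidence-based pseudo-labelling step can be sketched roughly as below. The Bradley-Terry preference probability, the threshold tau, and the reward_model callable are standard preference-based RL ingredients used here as placeholders, not SURF's exact formulation or its augmentation step.

```python
import torch

def preference_prob(reward_model, seg_a, seg_b):
    """P(seg_a preferred over seg_b) from summed predicted rewards (Bradley-Terry)."""
    return torch.sigmoid(reward_model(seg_a).sum() - reward_model(seg_b).sum())

def pseudo_label(reward_model, unlabeled_pairs, tau=0.95):
    """Keep pairs on which the preference predictor is confident and use its
    choice as the pseudo-label (1.0 = seg_a preferred, 0.0 = seg_b preferred)."""
    labelled = []
    with torch.no_grad():
        for seg_a, seg_b in unlabeled_pairs:
            p = preference_prob(reward_model, seg_a, seg_b)
            if p > tau:
                labelled.append((seg_a, seg_b, 1.0))
            elif p < 1.0 - tau:
                labelled.append((seg_a, seg_b, 0.0))
    return labelled
```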
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- Semi-supervised Sequential Generative Models [16.23492955875404]
We introduce a novel objective for training deep generative time-series models with discrete latent variables for which supervision is only sparsely available.
We first overcome this problem by extending the standard semi-supervised generative modeling objective with reweighted wake-sleep.
Finally, we introduce a unified objective inspired by teacher-forcing and show that this approach is robust to variable length supervision.
arXiv Detail & Related papers (2020-06-30T23:53:12Z)
- Effects of sparse rewards of different magnitudes in the speed of learning of model-based actor critic methods [0.4640835690336653]
We show that we can influence an agent to learn faster by applying an external environmental pressure during training.
Results have been shown to be valid for Deep Deterministic Policy Gradients using Hindsight Experience Replay in a well-known MuJoCo environment.
arXiv Detail & Related papers (2020-01-18T20:52:05Z)