Fixed $\beta$-VAE Encoding for Curious Exploration in Complex 3D
Environments
- URL: http://arxiv.org/abs/2105.08568v1
- Date: Tue, 18 May 2021 14:52:36 GMT
- Title: Fixed $\beta$-VAE Encoding for Curious Exploration in Complex 3D
Environments
- Authors: Auguste Lehuger, Matthew Crosby
- Abstract summary: We show how a fixed $beta$-VAE encoding can be used effectively with curiosity.
We combine this with curriculum learning to solve the previously unsolved exploration intensive detour tasks.
We also corroborate the results on Atari Breakout, with our custom encoding outperforming random features and inverse-dynamics features.
- Score: 1.0152838128195467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Curiosity is a general method for augmenting an environment reward with an
intrinsic reward, which encourages exploration and is especially useful in
sparse reward settings. As curiosity is calculated using next state prediction
error, the type of state encoding used has a large impact on performance.
Random features and inverse-dynamics features are generally preferred over VAEs
based on previous results from Atari and other mostly 2D environments. However,
unlike VAEs, they may not encode sufficient information for optimal behaviour,
which becomes increasingly important as environments become more complex. In
this paper, we use the sparse reward 3D physics environment Animal-AI, to
demonstrate how a fixed $\beta$-VAE encoding can be used effectively with
curiosity. We combine this with curriculum learning to solve the previously
unsolved, exploration-intensive detour tasks while achieving a 22\% gain in sample
efficiency on the training curriculum against the next best encoding. We also
corroborate the results on Atari Breakout, with our custom encoding
outperforming random features and inverse-dynamics features.
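To make the mechanism above concrete, here is a minimal sketch (not the authors' released implementation) of a curiosity bonus computed as forward-model prediction error in the latent space of a pre-trained, frozen $\beta$-VAE encoder. Layer sizes, the scaling factor `eta`, and the assumption that actions arrive as one-hot or continuous vectors are illustrative choices, not details from the paper.

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next latent state from the current latent state and action."""
    def __init__(self, latent_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, a], dim=-1))

def curiosity_bonus(encoder: nn.Module, forward_model: ForwardModel,
                    obs: torch.Tensor, action: torch.Tensor,
                    next_obs: torch.Tensor, eta: float = 0.01) -> torch.Tensor:
    """Intrinsic reward = scaled next-state prediction error in a fixed latent space.

    `encoder` stands in for the mean head of a beta-VAE trained beforehand on
    environment observations and kept frozen during RL training.
    """
    with torch.no_grad():
        z = encoder(obs)            # phi(s_t): fixed encoding, never updated by RL
        z_next = encoder(next_obs)  # phi(s_{t+1})
    z_pred = forward_model(z, action)
    error = (z_pred - z_next).pow(2).sum(dim=-1)  # per-sample squared error
    # The forward model is trained on `error`; the policy only sees the detached bonus.
    return eta * error.detach()
```

Only the forward model is optimized on this error; keeping the encoder fixed is what distinguishes this setup from inverse-dynamics features, which are trained online.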
Related papers
- SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning [13.779858242220724]
Deep features extracted from certain layers of a pre-trained deep model show superior performance over the conventional hand-crafted features.
We propose a novel semantic adversarial augmentation (SeA) in the feature space for optimization.
Our method is 2\% better than the deep features without SeA on average.
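The summary only names the mechanism; as a loose illustration (an assumption, not the paper's exact construction), feature-space adversarial augmentation can be sketched as an FGSM-style step applied to frozen deep features rather than to pixels:

```python
import torch

def adversarial_feature_augmentation(features: torch.Tensor,
                                     labels: torch.Tensor,
                                     classifier: torch.nn.Module,
                                     loss_fn,
                                     epsilon: float = 0.1) -> torch.Tensor:
    """Perturb fixed deep features in the direction that increases the loss.

    FGSM-style sketch in feature space; `epsilon` and the sign-gradient step are
    illustrative assumptions.
    """
    features = features.clone().detach().requires_grad_(True)
    loss = loss_fn(classifier(features), labels)
    grad, = torch.autograd.grad(loss, features)
    return (features + epsilon * grad.sign()).detach()
```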
arXiv Detail & Related papers (2024-08-23T19:55:13Z)
- Value Explicit Pretraining for Learning Transferable Representations [11.069853883599102]
We propose a method that learns generalizable representations for transfer reinforcement learning.
We learn new tasks that share similar objectives with previously learned tasks by training an encoder for objective-conditioned representations.
Experiments using a realistic navigation simulator and Atari benchmark show that the pretrained encoder produced by our method outperforms current SoTA pretraining methods.
arXiv Detail & Related papers (2023-12-19T17:12:35Z)
- $t^3$-Variational Autoencoder: Learning Heavy-tailed Data with Student's t and Power Divergence [7.0479532872043755]
$t^3$VAE is a modified VAE framework that incorporates Student's t-distributions for the prior, encoder, and decoder.
We show that $t^3$VAE significantly outperforms other models on CelebA and imbalanced CIFAR-100 datasets.
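For intuition (not the paper's derivation): a Student's t prior decays polynomially rather than exponentially, so latent codes far from the origin are penalized much less than under a Gaussian prior, which is what makes the framework suited to heavy-tailed and imbalanced data. A tiny numerical comparison, with an assumed degrees-of-freedom value:

```python
import numpy as np
from scipy.stats import norm, t

# Log-density of latent codes at increasing distance from the origin.
# nu = 5 is an assumed degrees-of-freedom value for illustration.
z = np.array([0.0, 2.0, 5.0, 10.0])
nu = 5.0
print("Gaussian log-prior:  ", norm.logpdf(z))
print("Student-t log-prior: ", t.logpdf(z, df=nu))
# The gap widens rapidly with |z|: the Gaussian prior all but forbids distant
# codes, while the heavy-tailed prior keeps them plausible.
```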
arXiv Detail & Related papers (2023-12-02T13:14:28Z)
- Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z)
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
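Rényi-divergence discrepancies of this kind are usually estimated with particle (k-nearest-neighbour) methods. The sketch below is a generic estimator of that flavour, not REVD's exact formulation; the names, `k`, `alpha`, and the ratio form are assumptions.

```python
import numpy as np

def knn_distance(x: np.ndarray, ys: np.ndarray, k: int) -> np.ndarray:
    """Distance from each row of x to its (k+1)-th closest row of ys (0-indexed k)."""
    d = np.linalg.norm(x[:, None, :] - ys[None, :, :], axis=-1)  # (n, m) pairwise
    return np.sort(d, axis=1)[:, k]

def episodic_visitation_bonus(cur: np.ndarray, prev: np.ndarray,
                              k: int = 3, alpha: float = 0.5,
                              eps: float = 1e-8) -> np.ndarray:
    """Per-state bonus from a k-NN ratio between the current episode's encoded
    states `cur` and the previous episode's encoded states `prev` (both (n, dim)).
    States far from the previous episode but typical within the current one
    receive the largest reward."""
    d_prev = knn_distance(cur, prev, k - 1)  # k-th neighbour taken from prev episode
    d_cur = knn_distance(cur, cur, k)        # index k skips the zero self-distance
    return (d_prev / (d_cur + eps)) ** alpha
```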
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
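For reference, the elliptical bonus mentioned here is the standard UCB-style quantity $b(\phi) = \beta \sqrt{\phi^\top \Lambda^{-1} \phi}$, which becomes expensive at scale because of the matrix inverse; the paper's contribution is an approximation of it. The exact (non-approximate) version is sketched below for orientation only.

```python
import numpy as np

class EllipticalBonus:
    """Exact elliptical exploration bonus b(phi) = beta * sqrt(phi^T Lambda^{-1} phi),
    where Lambda accumulates feature outer products. This is the quantity the
    anti-concentrated confidence bounds are designed to approximate cheaply."""
    def __init__(self, dim: int, lam: float = 1.0, beta: float = 1.0):
        self.Lambda = lam * np.eye(dim)  # regularized feature covariance
        self.beta = beta

    def update(self, phi: np.ndarray) -> None:
        self.Lambda += np.outer(phi, phi)

    def bonus(self, phi: np.ndarray) -> float:
        return self.beta * float(np.sqrt(phi @ np.linalg.solve(self.Lambda, phi)))
```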
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Self-Supervised Exploration via Latent Bayesian Surprise [4.088019409160893]
In this work, we propose a curiosity-based bonus as intrinsic reward for Reinforcement Learning.
We extensively evaluate our model by measuring the agent's performance in terms of environment exploration.
Our model is cheap and empirically shows state-of-the-art performance on several problems.
arXiv Detail & Related papers (2021-04-15T14:40:16Z)
- State Entropy Maximization with Random Encoders for Efficient Exploration [162.39202927681484]
Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL).
This paper presents Random Encoders for Efficient Exploration (RE3), an exploration method that utilizes state entropy as an intrinsic reward.
In particular, we find that the state entropy can be estimated in a stable and compute-efficient manner by utilizing a randomly initialized encoder.
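A minimal sketch of that estimator, assuming a fixed random linear projection in place of the randomly initialized convolutional encoder used for images; the $\log(\cdot + 1)$ k-nearest-neighbour form follows the usual particle entropy proxy, and `k` is an assumed hyperparameter.

```python
import numpy as np

def make_random_encoder(obs_dim: int, latent_dim: int, seed: int = 0):
    """A fixed random linear encoder: initialized once and never trained."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / np.sqrt(obs_dim), size=(obs_dim, latent_dim))
    return lambda obs: obs @ W

def state_entropy_bonus(latents: np.ndarray, k: int = 3) -> np.ndarray:
    """Reward each state by the log-distance to its k-th nearest neighbour in the
    random representation space, a particle proxy for state entropy."""
    d = np.linalg.norm(latents[:, None, :] - latents[None, :, :], axis=-1)
    kth = np.sort(d, axis=1)[:, k]  # index k skips the zero self-distance
    return np.log(kth + 1.0)
```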
arXiv Detail & Related papers (2021-02-18T15:45:17Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error (typically employed in least-squares value-iteration-style algorithms), strong PAC guarantees can be obtained for learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)