Self-supervised Sequential Information Bottleneck for Robust Exploration
in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2209.05333v1
- Date: Mon, 12 Sep 2022 15:41:10 GMT
- Title: Self-supervised Sequential Information Bottleneck for Robust Exploration
in Deep Reinforcement Learning
- Authors: Bang You, Jingming Xie, Youping Chen, Jan Peters, Oleg Arenz
- Abstract summary: In this work, we introduce the sequential information bottleneck objective for learning compressed and temporally coherent representations.
For efficient exploration in noisy environments, we further construct intrinsic rewards that capture task-relevant state novelty.
- Score: 28.75574762244266
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Effective exploration is critical for reinforcement learning agents in
environments with sparse rewards or high-dimensional state-action spaces.
Recent works based on state-visitation counts, curiosity and
entropy-maximization generate intrinsic reward signals to motivate the agent to
visit novel states for exploration. However, the agent can get distracted by
perturbations to sensor inputs that contain novel but task-irrelevant
information, e.g. due to sensor noise or changing background. In this work, we
introduce the sequential information bottleneck objective for learning
compressed and temporally coherent representations by modelling and compressing
sequential predictive information in time-series observations. For efficient
exploration in noisy environments, we further construct intrinsic rewards that
capture task-relevant state novelty based on the learned representations. We
derive a variational upper bound of our sequential information bottleneck
objective for practical optimization and provide an information-theoretic
interpretation of the derived upper bound. Our experiments on a set of
challenging image-based simulated control tasks show that our method achieves
better sample efficiency and robustness to both white noise and natural video
backgrounds compared to state-of-the-art methods based on curiosity, entropy
maximization, and information gain.
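For orientation, the classic information bottleneck objective that this work extends to the sequential setting can be sketched as follows (notation mine, not quoted from the abstract): learn a compressed representation Z of the input X that stays predictive of a target Y.

```latex
% Classic information bottleneck (notation mine):
% compress X into Z while keeping Z predictive of Y;
% \beta trades off compression against prediction.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

In the sequential variant described in the abstract, the target role is played by the future of the observation sequence: past observations are compressed while their predictive information about future latent states is retained, and the variational upper bound makes this objective tractable to optimize.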
Related papers
- Multimodal Information Bottleneck for Deep Reinforcement Learning with Multiple Sensors [10.454194186065195]
Reinforcement learning has achieved promising results on robotic control tasks but struggles to leverage information from multiple sensory inputs effectively.
Recent works construct auxiliary losses based on reconstruction or mutual information to extract joint representations from multiple sensory inputs.
We argue that compressing information in the learned joint representations about raw multimodal observations is helpful.
arXiv Detail & Related papers (2024-10-23T04:32:37Z)
- Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z)
- Sequential Action-Induced Invariant Representation for Reinforcement Learning [1.2046159151610263]
How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a challenging problem in visual reinforcement learning.
We propose a Sequential Action-induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to only preserve the components that follow the control signals of sequential actions.
arXiv Detail & Related papers (2023-09-22T05:31:55Z)
- Active Sensing with Predictive Coding and Uncertainty Minimization [0.0]
We present an end-to-end procedure for embodied exploration inspired by two biological computations.
We first demonstrate our approach in a maze navigation task and show that it can discover the underlying transition distributions and spatial features of the environment.
We show that our model builds unsupervised representations through exploration that allow it to efficiently categorize visual scenes.
arXiv Detail & Related papers (2023-07-02T21:14:49Z)
- Palm up: Playing in the Latent Manifold for Unsupervised Pretraining [31.92145741769497]
We propose an algorithm that exhibits an exploratory behavior whilst it utilizes large diverse datasets.
Our key idea is to leverage deep generative models that are pretrained on static datasets and introduce a dynamic model in the latent space.
We then employ an unsupervised reinforcement learning algorithm to explore in this environment and perform unsupervised representation learning on the collected data.
arXiv Detail & Related papers (2022-10-19T22:26:12Z)
- Entropy-driven Unsupervised Keypoint Representation Learning in Videos [7.940371647421243]
We present a novel approach for unsupervised learning of meaningful representations from videos.
We argue that local entropy of pixel neighborhoods and their temporal evolution create valuable intrinsic supervisory signals for learning prominent features.
Our empirical results show superior performance for our information-driven keypoints that resolve challenges like attendance to static and dynamic objects or objects abruptly entering and leaving the scene.
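To make the local-entropy idea concrete, here is a minimal sketch (my own illustration, not the authors' code) of Shannon entropy over a pixel neighborhood's intensity histogram: a flat patch scores zero, while a textured or noisy patch scores high, giving a cheap signal for which regions are informative.

```python
import numpy as np

def local_entropy(patch, bins=16):
    """Shannon entropy (bits) of the intensity histogram of one pixel neighborhood."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                 # drop empty bins; 0 * log 0 := 0
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
flat = np.full((8, 8), 0.5)      # textureless patch: one occupied bin
noisy = rng.random((8, 8))       # high-variation patch: many occupied bins
print(local_entropy(flat))       # → 0.0
print(local_entropy(noisy) > local_entropy(flat))  # → True
```

Tracking how such per-patch scores evolve over consecutive frames yields the temporal supervisory signal the summary refers to.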
arXiv Detail & Related papers (2022-09-30T12:03:52Z)
- Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
- Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z)
- Information is Power: Intrinsic Control via Information Capture [110.3143711650806]
We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model.
This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states.
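In rough notation (mine, not the paper's), the stated objective minimizes the entropy of the agent's state-visitation distribution, using the learned latent state-space model as a density estimate; the cross-entropy on the right is a standard upper bound on the entropy.

```latex
% Minimize entropy of states visited under policy \pi;
% q_\theta is the latent state-space model used as a density estimate.
\min_{\pi} \; H\!\left(d^{\pi}(s)\right)
  \;\le\; \min_{\pi} \; -\,\mathbb{E}_{s \sim d^{\pi}}\!\left[\log q_\theta(s)\right]
```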
arXiv Detail & Related papers (2021-12-07T18:50:42Z)
- Representation Learning for Sequence Data with Deep Autoencoding Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
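The predictive information referred to here can be written, in standard notation (mine, not quoted from the abstract), as the mutual information between a past and a future window of the latent feature sequence at each time step:

```latex
% Predictive information: mutual information between past and future
% windows (length T) of the latent sequence z at step t.
I_{\mathrm{pred}}(t) \;=\; I\!\left(z_{t-T+1:t}\,;\; z_{t+1:t+T}\right)
```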
arXiv Detail & Related papers (2020-06-18T17:59:35Z)
- Learning Invariant Representations for Reinforcement Learning without Reconstruction [98.33235415273562]
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs.
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
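For context, the bisimulation metric mentioned here is commonly defined (standard form from the bisimulation literature, not quoted from this abstract) as the fixed point of a recursion over immediate-reward differences plus a Wasserstein distance between transition distributions:

```latex
% On-policy bisimulation metric: two states are close when they yield
% similar rewards and transition to distributions close under d itself.
d(s_i, s_j) \;=\; \bigl|\, r^{\pi}_{s_i} - r^{\pi}_{s_j} \,\bigr|
  \;+\; \gamma \, W_1\!\bigl(P^{\pi}(\cdot \mid s_i),\, P^{\pi}(\cdot \mid s_j);\, d\bigr)
```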
arXiv Detail & Related papers (2020-06-18T17:59:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.