Integrating Contrastive Learning with Dynamic Models for Reinforcement
Learning from Images
- URL: http://arxiv.org/abs/2203.01810v1
- Date: Wed, 2 Mar 2022 14:39:17 GMT
- Title: Integrating Contrastive Learning with Dynamic Models for Reinforcement
Learning from Images
- Authors: Bang You, Oleg Arenz, Youping Chen, Jan Peters
- Abstract summary: We argue that explicitly improving Markovianity of the learned embedding is desirable.
We propose a self-supervised representation learning method which integrates contrastive learning with dynamic models.
- Score: 31.413588478694496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent methods for reinforcement learning from images use auxiliary tasks to
learn image features that are used by the agent's policy or Q-function. In
particular, methods based on contrastive learning that induce linearity of the
latent dynamics or invariance to data augmentation have been shown to greatly
improve the sample efficiency of the reinforcement learning algorithm and the
generalizability of the learned embedding. We further argue that explicitly
improving Markovianity of the learned embedding is desirable and propose a
self-supervised representation learning method which integrates contrastive
learning with dynamic models to synergistically combine these three objectives:
(1) We maximize the InfoNCE bound on the mutual information between the state-
and action-embedding and the embedding of the next state to induce a linearly
predictive embedding without explicitly learning a linear transition model, (2)
we further improve Markovianity of the learned embedding by explicitly learning
a non-linear transition model using regression, and (3) we maximize the mutual
information between the two nonlinear predictions of the next embeddings based
on the current action and two independent augmentations of the current state,
which naturally induces transformation invariance not only for the state
embedding, but also for the nonlinear transition model. Experimental evaluation
on the DeepMind Control Suite shows that our proposed method achieves higher
sample efficiency and better generalization than state-of-the-art methods based on
contrastive learning or reconstruction.
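Read concretely, the three objectives correspond to three loss terms on a shared image encoder. The sketch below is a minimal reading of the abstract rather than the authors' implementation; the encoder architecture, projector, transition network, action dimension, and the uniform weighting of the three terms are all placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Placeholder image encoder: maps an observation to a latent state embedding."""
    def __init__(self, obs_dim=3 * 84 * 84, z_dim=50):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(obs_dim, z_dim))

    def forward(self, obs):
        return self.net(obs)

def info_nce(queries, keys, temperature=0.1):
    """InfoNCE lower bound on mutual information; matching rows are positives."""
    logits = queries @ keys.t() / temperature
    labels = torch.arange(queries.size(0), device=queries.device)
    return F.cross_entropy(logits, labels)

z_dim, action_dim = 50, 4                                              # assumed dimensions
encoder = Encoder(z_dim=z_dim)
projector = nn.Linear(z_dim + action_dim, z_dim)                       # state-action embedding, objective (1)
transition = nn.Sequential(nn.Linear(z_dim + action_dim, 128),
                           nn.ReLU(), nn.Linear(128, z_dim))           # non-linear dynamics, objective (2)

def representation_loss(obs_aug1, obs_aug2, action, next_obs):
    z1, z2 = encoder(obs_aug1), encoder(obs_aug2)   # two augmentations of the current state
    z_next = encoder(next_obs)

    # (1) InfoNCE between the state-action embedding and the next-state embedding
    sa = projector(torch.cat([z1, action], dim=-1))
    loss_nce = info_nce(sa, z_next)

    # (2) regression onto the next-state embedding with an explicit non-linear
    #     transition model, to improve Markovianity of the embedding
    pred1 = transition(torch.cat([z1, action], dim=-1))
    loss_model = F.mse_loss(pred1, z_next.detach())

    # (3) agreement between the two non-linear predictions obtained from the two
    #     augmentations, inducing transformation invariance of encoder and model
    pred2 = transition(torch.cat([z2, action], dim=-1))
    loss_aug = info_nce(pred1, pred2)

    return loss_nce + loss_model + loss_aug
```

Detaching the regression target in objective (2) is one common way to discourage representation collapse; whether the paper uses stop-gradients, target networks, or non-uniform loss weights is not stated in the abstract.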
Related papers
- Robustness Reprogramming for Representation Learning [18.466637575445024]
Given a well-trained deep learning model, can it be reprogrammed to enhance its robustness against adversarial or noisy input perturbations without altering its parameters?
We propose a novel non-linear robust pattern matching technique as a robust alternative.
arXiv Detail & Related papers (2024-10-06T18:19:02Z)
- Novel Saliency Analysis for the Forward Forward Algorithm [0.0]
We incorporate the Forward Forward algorithm into neural network training.
This method executes two forward passes: the first with actual data to promote positive reinforcement, and the second with synthetically generated negative data to enable discriminative learning.
To overcome the limitations inherent in traditional saliency techniques, we developed a bespoke saliency algorithm specifically tailored for the Forward Forward framework.
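As a rough sketch of the two-pass idea (following Hinton's original Forward Forward formulation, not the bespoke saliency method this paper proposes), each layer can be trained locally so that a "goodness" score is high for real data and low for negative data; the layer sizes and threshold below are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

layer = torch.nn.Linear(784, 256)                  # one locally trained layer
opt = torch.optim.SGD(layer.parameters(), lr=0.01)
theta = 2.0                                        # goodness threshold (assumed)

def goodness(x):
    # "goodness" = sum of squared activations of the layer
    return F.relu(layer(x)).pow(2).sum(dim=1)

def forward_forward_step(x_pos, x_neg):
    g_pos, g_neg = goodness(x_pos), goodness(x_neg)
    # first pass (real data): push goodness above theta;
    # second pass (synthetic negative data): push goodness below theta
    loss = F.softplus(theta - g_pos).mean() + F.softplus(g_neg - theta).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```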
arXiv Detail & Related papers (2024-09-18T17:21:59Z)
- MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning [8.61492882526007]
In visual Reinforcement Learning (RL), learning from pixel-based observations poses significant challenges for sample efficiency.
We introduce MOOSS, a novel framework that leverages a temporal contrastive objective with the help of graph-based spatial-temporal masking.
Our evaluation on multiple continuous and discrete control benchmarks shows that MOOSS outperforms previous state-of-the-art visual RL methods in terms of sample efficiency.
arXiv Detail & Related papers (2024-09-02T18:57:53Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear, both theoretically and empirically, how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Imposing Consistency for Optical Flow Estimation [73.53204596544472]
Imposing consistency through proxy tasks has been shown to enhance data-driven learning.
This paper introduces novel and effective consistency strategies for optical flow estimation.
arXiv Detail & Related papers (2022-04-14T22:58:30Z)
- Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism [65.46524775457928]
Offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
arXiv Detail & Related papers (2022-03-11T09:00:12Z)
- Refining Self-Supervised Learning in Imaging: Beyond Linear Metric [25.96406219707398]
We introduce in this paper a new statistical perspective that exploits the Jaccard similarity metric as a measure-based metric.
Specifically, our proposed metric may be interpreted as a dependence measure between two adapted projections learned from the so-called latent representations.
To the best of our knowledge, this non-linearly fused information embedded in the Jaccard similarity is novel to self-supervised learning and shows promising results.
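The summary does not spell out the exact construction; as one plausible, non-authoritative reading, a weighted (Ruzicka) Jaccard similarity between two non-negative projections of the latent representations could serve as such a dependence measure:

```python
import torch

def soft_jaccard(p, q, eps=1e-8):
    """Weighted Jaccard similarity: sum of element-wise minima over sum of maxima."""
    inter = torch.minimum(p, q).sum(dim=-1)
    union = torch.maximum(p, q).sum(dim=-1)
    return inter / (union + eps)

# Hypothetical projections of two augmented views; a softmax keeps them
# non-negative so that the Jaccard ratio is well defined.
p1 = torch.softmax(torch.randn(32, 128), dim=-1)
p2 = torch.softmax(torch.randn(32, 128), dim=-1)
loss = 1.0 - soft_jaccard(p1, p2).mean()   # maximize similarity between the views
```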
arXiv Detail & Related papers (2022-02-25T19:25:05Z)
- Consistency and Monotonicity Regularization for Neural Knowledge Tracing [50.92661409499299]
Knowledge Tracing (KT), which tracks a human's knowledge acquisition, is a central component of online learning and AI in Education.
We propose three types of novel data augmentation, coined replacement, insertion, and deletion, along with corresponding regularization losses.
Extensive experiments on various KT benchmarks show that our regularization scheme consistently improves the model performances.
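A hedged sketch of what the three augmentations might look like on an interaction sequence of (question_id, is_correct) pairs; the probabilities and sampling choices are illustrative assumptions, not the paper's settings.

```python
import random

def augment(seq, mode, vocab, p=0.1):
    """Apply toy replacement / insertion / deletion augmentations to a
    knowledge-tracing sequence of (question_id, is_correct) pairs."""
    out = []
    for item in seq:
        if mode == "deletion" and random.random() < p:
            continue                                   # drop this interaction
        if mode == "replacement" and random.random() < p:
            item = (random.choice(vocab), item[1])     # swap in another question
        out.append(item)
        if mode == "insertion" and random.random() < p:
            out.append((random.choice(vocab), random.randint(0, 1)))  # add a synthetic interaction
    return out

seq = [(3, 1), (7, 0), (2, 1), (9, 1)]
print(augment(seq, "replacement", vocab=list(range(20))))
```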
arXiv Detail & Related papers (2021-05-03T02:36:29Z)
- Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning [0.0]
We propose a gradient matching algorithm to improve sample efficiency by utilizing target slope information from the dynamics to aid the model-free learner.
We demonstrate this by presenting a technique for matching the gradient information from the model-based learner with the model-free component in an abstract low-dimensional space.
arXiv Detail & Related papers (2020-05-28T05:02:47Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
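A generic illustration of this idea is to backpropagate the predicted return of a short imagined rollout through the learned model (a pathwise gradient); the network shapes, horizon, and reward model below are assumptions, and this is not the paper's exact algorithm.

```python
import torch
import torch.nn as nn

state_dim, action_dim, horizon = 8, 2, 5    # assumed sizes
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                       nn.Linear(64, action_dim), nn.Tanh())
dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
                         nn.Linear(64, state_dim))            # learned, differentiable model
reward_model = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
                             nn.Linear(64, 1))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def policy_update(s0):
    s, ret = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        sa = torch.cat([s, a], dim=-1)
        ret = ret + reward_model(sa).mean()   # accumulate predicted reward
        s = dynamics(sa)                      # gradients flow through the model path
    loss = -ret                               # ascend the predicted return
    opt.zero_grad()
    loss.backward()
    opt.step()

policy_update(torch.randn(16, state_dim))
```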
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
- Guided Variational Autoencoder for Disentanglement Learning [79.02010588207416]
We propose an algorithm, guided variational autoencoder (Guided-VAE), that is able to learn a controllable generative model by performing latent representation disentanglement learning.
We design an unsupervised strategy and a supervised strategy in Guided-VAE and observe enhanced modeling and controlling capability over the vanilla VAE.
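As a toy illustration of guiding a latent code, a plain VAE can additionally regress one latent dimension onto a known attribute so that this dimension becomes controllable; this is only a generic sketch of the idea, not the Guided-VAE architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Linear(784, 2 * 16)      # outputs mean and log-variance of a 16-d latent (assumed sizes)
dec = nn.Linear(16, 784)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

def train_step(x, attribute):
    mu, logvar = enc(x).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()          # reparameterization
    recon = dec(z)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
    guide = F.mse_loss(z[:, 0], attribute)                         # supervise one latent dimension
    loss = F.mse_loss(recon, x) + kl + guide
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

train_step(torch.rand(32, 784), attribute=torch.rand(32))
```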
arXiv Detail & Related papers (2020-04-02T20:49:15Z)