SCVRL: Shuffled Contrastive Video Representation Learning
- URL: http://arxiv.org/abs/2205.11710v1
- Date: Tue, 24 May 2022 01:24:47 GMT
- Title: SCVRL: Shuffled Contrastive Video Representation Learning
- Authors: Michael Dorkenwald, Fanyi Xiao, Biagio Brattoli, Joseph Tighe, Davide Modolo
- Abstract summary: SCVRL is a contrastive-based framework for self-supervised learning for videos.
We reformulate the popular shuffling pretext task within a modern contrastive learning paradigm.
Our network has a natural capacity to learn motion in self-supervised settings and achieves strong performance, outperforming CVRL on four benchmarks.
- Score: 28.06521069427918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose SCVRL, a novel contrastive-based framework for self-supervised
learning for videos. Unlike previous contrastive learning-based methods,
which mostly focus on learning visual semantics (e.g., CVRL), SCVRL is capable
of learning both semantic and motion patterns. For that, we reformulate the
popular shuffling pretext task within a modern contrastive learning paradigm.
We show that our transformer-based network has a natural capacity to learn
motion in self-supervised settings and achieves strong performance,
outperforming CVRL on four benchmarks.
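The abstract's core idea, recasting the frame-shuffling pretext task as a contrastive objective, can be illustrated with a toy sketch: the embedding of a temporally ordered clip is pulled toward its positive view and pushed away from embeddings of shuffled clips via an InfoNCE-style loss. This is a hypothetical, pure-Python illustration under assumed names (`shuffle_frames`, `shuffled_contrastive_loss`), not the authors' implementation.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def shuffle_frames(clip, rng=random):
    """Permute frame order: destroys motion cues while keeping appearance."""
    frames = list(clip)
    rng.shuffle(frames)
    return frames

def shuffled_contrastive_loss(anchor, positive, shuffled_negatives, temperature=0.1):
    """InfoNCE-style loss: the ordered clip's view is the positive;
    embeddings of shuffled clips act as (hard) negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in shuffled_negatives]
    m = max(logits)  # subtract max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]  # -log softmax probability of the positive
```

When the positive is aligned with the anchor and the shuffled negatives are far away, the loss approaches zero; a model that minimizes it must therefore encode temporal order, not just appearance.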
Related papers
- Pretrained Visual Representations in Reinforcement Learning [0.0]
This paper compares the performance of visual reinforcement learning algorithms that train a convolutional neural network (CNN) from scratch with those that utilize pre-trained visual representations (PVRs).
We evaluate the Dormant Ratio Minimization (DRM) algorithm, a state-of-the-art visual RL method, against three PVRs: ResNet18, DINOv2, and Visual Cortex (VC).
arXiv Detail & Related papers (2024-07-24T12:53:26Z) - Vision-Language Models Provide Promptable Representations for Reinforcement Learning [67.40524195671479]
We propose a novel approach that uses the vast amounts of general and indexable world knowledge encoded in vision-language models (VLMs) pre-trained on Internet-scale data for embodied reinforcement learning (RL).
We show that our approach can use chain-of-thought prompting to produce representations of common-sense semantic reasoning, improving policy performance in novel scenes by 1.5 times.
arXiv Detail & Related papers (2024-02-05T00:48:56Z) - RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability [25.943330238941602]
We propose a visual model-based RL method that learns a latent representation resilient to spurious variations.
Our training objective encourages the representation to be maximally predictive of dynamics and reward.
Our effort is a step towards making model-based RL a practical and useful tool for dynamic, diverse domains.
arXiv Detail & Related papers (2023-08-31T18:43:04Z) - Uncovering the Hidden Dynamics of Video Self-supervised Learning under Distribution Shifts [39.080610060557476]
We study the behavior of six popular self-supervised methods (v-SimCLR, v-MoCo, v-BYOL, v-SimSiam, v-DINO, v-MAE) in response to various forms of natural distribution shift.
Our study uncovers a series of intriguing findings and interesting behaviors of VSSL methods.
arXiv Detail & Related papers (2023-06-03T06:10:20Z) - Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer [1.876462046907555]
We propose a novel PSO-ConvNet model for learning actions in videos.
Our experimental results on the UCF-101 dataset demonstrate substantial improvements of up to 9% in accuracy.
Overall, our dynamic PSO-ConvNet model provides a promising direction for improving Human Action Recognition.
arXiv Detail & Related papers (2023-02-17T23:39:34Z) - Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
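For context, UCB-type algorithms build on an exploration bonus that scales inversely with visit counts. The sketch below shows the standard textbook form of such a bonus; it is a hypothetical illustration only, not the paper's contrastive variant.

```python
import math

def ucb_bonus(total_steps, visit_count, c=1.0):
    """Optimism bonus: larger for rarely visited state-action pairs,
    shrinking as visit_count grows."""
    return c * math.sqrt(math.log(max(total_steps, 2)) / max(visit_count, 1))
```

Adding this bonus to estimated values makes the agent prefer under-explored actions; in the paper's setting, the representation the bonus is computed over is learned contrastively.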
arXiv Detail & Related papers (2022-07-29T17:29:08Z) - Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z) - Vision Transformer for Contrastive Clustering [48.476602271481674]
Vision Transformer (ViT) has shown its advantages over the convolutional neural network (CNN).
This paper presents an end-to-end deep image clustering approach termed Vision Transformer for Contrastive Clustering (VTCC).
arXiv Detail & Related papers (2022-06-26T17:00:35Z) - RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward network is competitive.
These results also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z) - CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z) - Contrastive Variational Reinforcement Learning for Complex Observations [39.98639686743489]
This paper presents Contrastive Variational Reinforcement Learning (CVRL), a model-based method that tackles complex visual observations in DRL.
CVRL learns a contrastive variational model by maximizing the mutual information between latent states and observations discriminatively.
It achieves comparable performance with state-of-the-art model-based DRL methods on standard MuJoCo tasks.
arXiv Detail & Related papers (2020-08-06T02:25:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.