Cycle-Contrast for Self-Supervised Video Representation Learning
- URL: http://arxiv.org/abs/2010.14810v1
- Date: Wed, 28 Oct 2020 08:27:58 GMT
- Title: Cycle-Contrast for Self-Supervised Video Representation Learning
- Authors: Quan Kong, Wenpeng Wei, Ziwei Deng, Tomoaki Yoshinaga, Tomokazu
Murakami
- Abstract summary: We present Cycle-Contrastive Learning (CCL), a novel self-supervised method for learning video representation.
In our method, the frame and video representations are learned from a single network based on an R3D architecture.
We demonstrate that the video representation learned by CCL can be transferred well to downstream tasks of video understanding.
- Score: 10.395615031496064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Cycle-Contrastive Learning (CCL), a novel self-supervised method
for learning video representations. Exploiting the natural belonging and
inclusion relation between a video and its frames, CCL is designed to find
correspondences across frames and videos while learning contrastive
representations in each domain. This differs from recent
approaches that merely learn correspondences across frames or clips. In our
method, the frame and video representations are learned from a single network
based on an R3D architecture, with a shared non-linear transformation for
embedding both frame and video features before the cycle-contrastive loss. We
demonstrate that the video representation learned by CCL can be transferred
well to downstream tasks of video understanding, outperforming previous methods
in nearest neighbour retrieval and action recognition tasks on UCF101, HMDB51
and MMAct.
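As a rough illustration only (not the authors' code; the soft nearest-neighbour cycle and the equal loss weighting below are assumptions), a minimal PyTorch sketch of a cycle-contrastive objective might look like this: frame and video features pass through the same shared projection head, each video softly retrieves its nearest frame, and the retrieved frame must point back to its source video, with InfoNCE-style contrastive terms in both domains.

```python
# Minimal sketch of a cycle-contrastive objective (illustrative assumptions,
# not the paper's implementation).
import torch
import torch.nn.functional as F

def info_nce(query, keys, temperature=0.07):
    """InfoNCE: query[i] should match keys[i] against all other keys."""
    logits = query @ keys.t() / temperature            # (B, B) similarities
    targets = torch.arange(query.size(0), device=query.device)
    return F.cross_entropy(logits, targets)

def cycle_contrastive_loss(video_emb, frame_emb, temperature=0.07):
    """
    video_emb: (B, D) per-video features from the R3D backbone, after the
               shared non-linear transformation mentioned in the abstract.
    frame_emb: (B, D) per-frame features passed through the same shared head.
    """
    v = F.normalize(video_emb, dim=1)
    f = F.normalize(frame_emb, dim=1)

    # Forward hop of the cycle: each video softly retrieves a frame.
    attn = F.softmax(v @ f.t() / temperature, dim=1)   # (B, B)
    soft_frame = F.normalize(attn @ f, dim=1)          # (B, D)

    # Backward hop: the retrieved frame must point back to its own video.
    cycle = info_nce(soft_frame, v, temperature)

    # Contrastive terms keep frame and video embeddings discriminative.
    return cycle + info_nce(v, f, temperature) + info_nce(f, v, temperature)
```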
Related papers
- Collaboratively Self-supervised Video Representation Learning for Action
Recognition [58.195372471117615]
We design a Collaboratively Self-supervised Video Representation learning framework specific to action recognition.
Our method achieves state-of-the-art performance on the UCF101 and HMDB51 datasets.
arXiv Detail & Related papers (2024-01-15T10:42:04Z)
- Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval [24.691270610091554]
In this paper, we aim to learn semantically-enhanced representations purely from the video, so that the video representations can be computed offline and reused for different texts.
We obtain state-of-the-art performances on three benchmark datasets, i.e., MSR-VTT, MSVD, and LSMDC.
arXiv Detail & Related papers (2023-08-15T08:54:25Z)
- Frozen CLIP Models are Efficient Video Learners [86.73871814176795]
Video recognition has been dominated by the end-to-end learning paradigm.
Recent advances in Contrastive Vision-Language Pre-training pave the way for a new route for visual recognition tasks.
We present Efficient Video Learning -- an efficient framework for directly training high-quality video recognition models.
arXiv Detail & Related papers (2022-08-06T17:38:25Z)
- Cross-Architecture Self-supervised Video Representation Learning [42.267775859095664]
We present a new cross-architecture contrastive learning framework for self-supervised video representation learning.
We introduce a temporal self-supervised learning module that explicitly predicts the edit distance between two video sequences (sketched below).
We evaluate our method on the tasks of video retrieval and action recognition on UCF101 and HMDB51 datasets.
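As a toy illustration of using edit distance as a self-supervision target (the sampling and perturbation scheme here is an assumption, not the paper's module), one could compute the classic Levenshtein distance between an original and a perturbed frame-index sequence and regress against it:

```python
# Toy construction of an edit-distance regression target between two
# frame-index sequences (illustrative only; how the paper samples and
# perturbs sequences is an assumption here).
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

# Two frame-index sequences differing by a swap and a dropped frame; the
# distance becomes the prediction target for the temporal head.
original  = [0, 1, 2, 3, 4, 5, 6, 7]
perturbed = [0, 2, 1, 3, 5, 6, 7]
target = edit_distance(original, perturbed)  # -> 3
```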
arXiv Detail & Related papers (2022-05-26T12:41:19Z)
- Self-Supervised Video Representation Learning with Motion-Contrastive Perception [13.860736711747284]
This paper proposes the Motion-Contrastive Perception Network (MCPNet), which consists of two branches: Motion Information Perception (MIP) and Contrastive Instance Perception (CIP).
Our method outperforms current state-of-the-art visual-only self-supervised approaches.
arXiv Detail & Related papers (2022-04-10T05:34:46Z)
- Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency [60.756222188023635]
We propose to learn representations by leveraging the more abundant information in untrimmed videos.
HiCo (Hierarchical Consistency) generates stronger representations from untrimmed videos, and it also improves representation quality when applied to trimmed videos.
arXiv Detail & Related papers (2022-04-06T18:04:54Z)
- Self-Supervised Video Representation Learning by Video Incoherence Detection [28.540645395066434]
This paper introduces a novel self-supervised method that leverages incoherence detection for video representation learning.
It stems from the observation that the human visual system can easily identify video incoherence based on a comprehensive understanding of videos (see the sketch below).
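A minimal sketch of how incoherent training clips might be constructed (the actual sampling scheme and the detection head are assumptions, not the paper's):

```python
# Toy construction of coherent vs. incoherent clips (illustrative assumptions).
import random

def make_example(frames, clip_len=16, incoherent=False):
    """Return (clip, label): label is 1 if a temporal gap was spliced in."""
    start = random.randint(0, len(frames) - 2 * clip_len)
    if not incoherent:
        return frames[start:start + clip_len], 0
    # Splice two non-contiguous segments to break temporal coherence.
    half = clip_len // 2
    gap = random.randint(half, clip_len)   # skipped frames cause the incoherence
    first = frames[start:start + half]
    second = frames[start + half + gap:start + half + gap + half]
    return first + second, 1

# Usage: a binary classifier is then trained to predict the label.
frames = list(range(300))               # stand-in for decoded video frames
clip, label = make_example(frames, incoherent=True)   # label == 1
```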
arXiv Detail & Related papers (2021-09-26T04:58:13Z)
- Video Corpus Moment Retrieval with Contrastive Learning [56.249924768243375]
Video corpus moment retrieval (VCMR) aims to retrieve a temporal moment that semantically corresponds to a given text query.
We propose a Retrieval and Localization Network with Contrastive Learning (ReLoCLNet) for VCMR.
ReLoCLNet encodes text and video separately for efficiency; experimental results show that its retrieval accuracy is comparable with baselines that adopt cross-modal interaction learning (a dual-encoder sketch follows below).
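To illustrate why separate encoding is efficient at corpus scale (the encoders below are hypothetical placeholders, not ReLoCLNet itself): video embeddings depend only on the corpus, so they can be precomputed offline once, and each incoming query then costs a single matrix-vector product.

```python
# Dual-encoder retrieval sketch (encoder internals are placeholders).
import torch
import torch.nn.functional as F

def retrieve(text_encoder, video_encoder, query, corpus, top_k=5):
    # Video embeddings can be precomputed offline and cached across queries.
    video_emb = F.normalize(torch.stack([video_encoder(v) for v in corpus]), dim=1)
    query_emb = F.normalize(text_encoder(query), dim=0)
    scores = video_emb @ query_emb          # one matrix-vector product per query
    return scores.topk(min(top_k, len(corpus)))

# Example with a random placeholder encoder (shape-checking only):
# enc = lambda x: torch.randn(256)
# scores, indices = retrieve(enc, enc, "a man rides a bike", ["vid_a", "vid_b"], top_k=2)
```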
arXiv Detail & Related papers (2021-05-13T12:54:39Z)
- Contrastive Transformation for Self-supervised Correspondence Learning [120.62547360463923]
We study the self-supervised learning of visual correspondence using unlabeled videos in the wild.
Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation.
Our framework outperforms the recent self-supervised correspondence methods on a range of visual tasks.
arXiv Detail & Related papers (2020-12-09T14:05:06Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top-performing objectives in this class: instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)