TCLR: Temporal Contrastive Learning for Video Representation
- URL: http://arxiv.org/abs/2101.07974v2
- Date: Thu, 4 Feb 2021 13:28:27 GMT
- Title: TCLR: Temporal Contrastive Learning for Video Representation
- Authors: Ishan Dave, Rohit Gupta, Mamshad Nayeem Rizve and Mubarak Shah
- Abstract summary: We develop a new temporal contrastive learning framework consisting of two novel losses to improve upon existing contrastive self-supervised video representation learning methods.
With the commonly used 3D-ResNet-18 architecture, we achieve 82.4% (+5.1% increase over the previous best) top-1 accuracy on UCF101 and 52.9% (+5.4% increase) on HMDB51 action classification.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning has nearly closed the gap between supervised and
self-supervised learning of image representations. Existing extensions of
contrastive learning to the domain of video data, however, do not explicitly
attempt to represent the internal distinctiveness across the temporal dimension
of video clips. We develop a new temporal contrastive learning framework
consisting of two novel losses to improve upon existing contrastive
self-supervised video representation learning methods. The first loss adds the
task of discriminating between non-overlapping clips from the same video,
whereas the second loss aims to discriminate between timesteps of the feature
map of an input clip in order to increase the temporal diversity of the
features. Temporal contrastive learning achieves significant improvement over
the state-of-the-art results in downstream video understanding tasks such as
action recognition, limited-label action classification, and nearest-neighbor
video retrieval on video datasets across multiple 3D CNN architectures. With
the commonly used 3D-ResNet-18 architecture, we achieve 82.4% (+5.1% increase
over the previous best) top-1 accuracy on UCF101 and 52.9% (+5.4% increase) on
HMDB51 action classification, and 56.2% (+11.7% increase) Top-1 Recall on
UCF101 nearest neighbor video retrieval.
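Since the two losses are the paper's core contribution, a minimal sketch may help make them concrete. The PyTorch-style code below is reconstructed from the abstract alone, not the authors' implementation; the batch layout, spatial pooling, temperature value, and function names are all illustrative assumptions.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(z_a, z_b, temperature=0.1):
    # First loss (sketch): z_a, z_b are (B * C, D) embeddings of two
    # augmented views of each clip, where B videos each contribute C
    # non-overlapping clips. Every clip is its own instance, so clips from
    # the same video serve as negatives for one another -- the task of
    # discriminating between non-overlapping clips of the same video.
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature      # pairwise cosine similarities
    targets = torch.arange(z_a.size(0))       # positive = the matching clip
    return F.cross_entropy(logits, targets)

def timestep_contrastive_loss(f_a, f_b, temperature=0.1):
    # Second loss (sketch): f_a, f_b are (B, T, D) feature maps of two views
    # of the same clip, pooled over space but not over time. Timestep t of
    # one view is pulled toward timestep t of the other and pushed away from
    # the remaining timesteps, encouraging temporally diverse features.
    B, T, _ = f_a.shape
    f_a = F.normalize(f_a, dim=2)
    f_b = F.normalize(f_b, dim=2)
    logits = torch.einsum('btd,bsd->bts', f_a, f_b) / temperature  # (B, T, T)
    targets = torch.arange(T).repeat(B)       # positive = the same timestep
    return F.cross_entropy(logits.reshape(B * T, T), targets)

In training, these two sketched losses would be added to a standard instance-discrimination objective; how they are weighted is not stated in the abstract.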
Related papers
- Just a Glimpse: Rethinking Temporal Information for Video Continual Learning [58.7097258722291]
We propose a novel replay mechanism for effective video continual learning based on individual frames.
Under extreme memory constraints, video diversity plays a more significant role than temporal information.
Our method achieves state-of-the-art performance, outperforming the previous state-of-the-art by up to 21.49%.
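To illustrate the frame-level replay idea, here is a minimal sketch under my own assumptions; simple random replacement stands in for whatever diversity-aware selection the paper actually uses.

import random

class FrameReplayBuffer:
    # Storing single frames instead of whole clips lets a fixed memory
    # budget cover far more distinct videos, which is the diversity effect
    # described above.
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = []

    def add(self, frame, label):
        if len(self.frames) < self.capacity:
            self.frames.append((frame, label))
        else:
            # Random replacement under a fixed budget; the paper's actual
            # selection strategy may differ.
            self.frames[random.randrange(self.capacity)] = (frame, label)

    def sample(self, n):
        return random.sample(self.frames, min(n, len(self.frames)))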
arXiv Detail & Related papers (2023-05-28T19:14:25Z)
- Self-Supervised Video Representation Learning via Latent Time Navigation [12.721647696921865]
Self-supervised video representation learning aims at maximizing similarity between different temporal segments of one video.
We propose Latent Time Navigation (LTN) to capture fine-grained motions.
Our experimental analysis suggests that learning video representations by LTN consistently improves performance of action classification.
arXiv Detail & Related papers (2023-05-10T20:06:17Z)
- ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency [62.38914747727636]
We study self-supervised video representation learning, which is a challenging task due to 1) a lack of labels for explicit supervision and 2) unstructured and noisy visual information.
Existing methods mainly use contrastive loss with video clips as the instances and learn visual representation by discriminating instances from each other.
In this paper, we observe that the consistency between positive samples is the key to learning robust video representations.
arXiv Detail & Related papers (2021-06-04T08:44:50Z)
- Semi-Supervised Action Recognition with Temporal Contrastive Learning [50.08957096801457]
We learn a two-pathway temporal contrastive model using unlabeled videos at two different speeds.
We considerably outperform video extensions of sophisticated state-of-the-art semi-supervised image recognition methods.
arXiv Detail & Related papers (2021-02-04T17:28:35Z)
- RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning [100.76672109782815]
We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only.
It is difficult to construct a suitable self-supervised task that models both motion and appearance features well.
We propose a new way to perceive the playback speed and exploit the relative speed between two video clips as labels.
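To make the relative-speed pretext task concrete, here is a minimal sketch reconstructed from this summary rather than from RSPNet's code; the stride set, clip length, and three-way label scheme are assumptions.

import random
import torch

def sample_relative_speed_pair(video, clip_len=16, strides=(1, 2, 4)):
    # video: tensor (T, C, H, W), assumed to have at least
    # clip_len * max(strides) frames. Two clips are drawn with random
    # temporal strides; the label encodes their RELATIVE playback speed
    # (0: first clip slower, 1: same speed, 2: first clip faster).
    def clip(stride):
        start = random.randint(0, video.size(0) - clip_len * stride)
        return video[start : start + clip_len * stride : stride]
    s1, s2 = random.choice(strides), random.choice(strides)
    label = 0 if s1 < s2 else (1 if s1 == s2 else 2)
    return clip(s1), clip(s2), torch.tensor(label)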
arXiv Detail & Related papers (2020-10-27T16:42:50Z)
- Spatiotemporal Contrastive Video Representation Learning [87.56145031149869]
We present a self-supervised Contrastive Video Representation Learning (CVRL) method to learn visual representations from unlabeled videos.
Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space.
We study what makes for good data augmentations for video self-supervised learning and find that both spatial and temporal information are crucial.
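One augmentation detail worth illustrating (drawn from the CVRL paper itself rather than from this summary, so treat it as an assumption) is keeping spatial augmentation temporally consistent: one crop window is sampled per clip and applied to every frame, so the two views differ in time but not in per-frame jitter.

import random
import torchvision.transforms.functional as TF

def consistent_random_crop(clip, out_size=112):
    # clip: tensor (T, C, H, W). Sample ONE crop window and apply it to all
    # frames, so the spatial augmentation is consistent across time.
    _, _, h, w = clip.shape
    top = random.randint(0, h - out_size)
    left = random.randint(0, w - out_size)
    return TF.crop(clip, top, left, out_size, out_size)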
arXiv Detail & Related papers (2020-08-09T19:58:45Z)
- Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework [43.002621928500425]
We propose a self-supervised method to learn feature representations from videos.
Going beyond the standard inter-video negatives, we extend the set of negative samples by introducing intra-negative samples generated from the input video itself.
We conduct experiments on video retrieval and video recognition tasks using the learned video representation.
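A minimal sketch of an intra-negative sample (the paper generates them by breaking temporal order, e.g. frame shuffling; the tensor layout here is an assumption):

import torch

def intra_negative(clip):
    # clip: tensor (T, C, H, W). Shuffling the frame order preserves the
    # clip's appearance but destroys its temporal structure, yielding a
    # hard negative drawn from the same video rather than a different one.
    return clip[torch.randperm(clip.size(0))]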
arXiv Detail & Related papers (2020-08-06T09:08:14Z)
- Self-supervised Temporal Discriminative Learning for Video Representation Learning [39.43942923911425]
Temporal-discriminative features are difficult to extract without training on a large-scale annotated video action dataset.
This paper proposes a novel Video-based Temporal-Discriminative Learning framework that operates in a self-supervised manner.
arXiv Detail & Related papers (2020-08-05T13:36:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.