Self-Supervised Video Representation Learning with Meta-Contrastive Network
- URL: http://arxiv.org/abs/2108.08426v2
- Date: Mon, 23 Aug 2021 16:41:06 GMT
- Title: Self-Supervised Video Representation Learning with Meta-Contrastive Network
- Authors: Yuanze Lin, Xun Guo, Yan Lu
- Abstract summary: We propose a Meta-Contrastive Network (MCN) to enhance the learning ability of self-supervised approaches.
For two downstream tasks, i.e., video action recognition and video retrieval, MCN outperforms state-of-the-art approaches.
- Score: 10.768575680990415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning has been successfully applied to pre-train video representations, with the aim of efficient adaptation from the pre-training domain to downstream tasks. Existing approaches merely leverage contrastive loss to learn instance-level discrimination. However, the lack of category information leads to a hard-positive problem that constrains the generalization ability of such methods. We find that the multi-task process of meta learning can provide a solution to this problem. In this paper, we propose a Meta-Contrastive Network (MCN), which combines contrastive learning and meta learning, to enhance the learning ability of existing self-supervised approaches. Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch. Extensive evaluations demonstrate the effectiveness of our method. On two downstream tasks, video action recognition and video retrieval, MCN outperforms state-of-the-art approaches on the UCF101 and HMDB51 datasets. More specifically, with an R(2+1)D backbone, MCN achieves Top-1 accuracies of 84.8% (UCF101) and 54.5% (HMDB51) for video action recognition, and 52.5% (UCF101) and 23.7% (HMDB51) for video retrieval.
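The abstract gives enough to sketch the overall training pattern: a MAML-style inner/outer loop in which both levels optimize a contrastive objective. The sketch below is a loose illustration under heavy assumptions (a toy linear encoder, random tensors in place of video clips, plain InfoNCE as the contrastive loss); none of the module names, shapes, or hyper-parameters come from the paper.

```python
# Loose illustration of a MAML-style inner/outer loop around an InfoNCE
# contrastive loss. Encoder, shapes, and hyper-parameters are placeholders.
import torch
import torch.nn.functional as F
from torch.func import functional_call

def info_nce(z1, z2, temperature=0.1):
    """Matching rows of z1/z2 are positives; all other rows are negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))       # positives on the diagonal
    return F.cross_entropy(logits, labels)

encoder = torch.nn.Sequential(              # stand-in for a video backbone
    torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 32 * 32, 128))
meta_opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
inner_lr = 1e-2

for step in range(5):                       # outer (meta) loop
    # Two "augmented views" per clip; random tensors stand in for real video.
    s1 = torch.randn(4, 3, 8, 32, 32); s2 = s1 + 0.1 * torch.randn_like(s1)
    q1 = torch.randn(4, 3, 8, 32, 32); q2 = q1 + 0.1 * torch.randn_like(q1)

    # Inner step (contrastive branch): adapt fast weights on support clips.
    params = dict(encoder.named_parameters())
    support_loss = info_nce(functional_call(encoder, params, (s1,)),
                            functional_call(encoder, params, (s2,)))
    grads = torch.autograd.grad(support_loss, list(params.values()),
                                create_graph=True)
    fast = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}

    # Outer step (meta branch): evaluate the adapted weights on query clips
    # and back-propagate through the inner update into the base weights.
    query_loss = info_nce(functional_call(encoder, fast, (q1,)),
                          functional_call(encoder, fast, (q2,)))
    meta_opt.zero_grad(); query_loss.backward(); meta_opt.step()
```

The essential point is `create_graph=True`: the outer (meta) update differentiates through the inner adaptation step, which is what distinguishes MAML-style training from ordinary two-stage fine-tuning.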
Related papers
- VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning [6.379202839994046]
Multimodal transfer learning aims to transform pretrained representations of diverse modalities into a common domain space for effective multimodal fusion.
We propose VideoAdviser, a video knowledge distillation method that transfers multimodal knowledge of video-enhanced prompts from a multimodal foundation model to a modality-specific foundation model.
We evaluate our method in two challenging multimodal tasks: video-level sentiment analysis and audio-visual retrieval.
arXiv Detail & Related papers (2023-09-27T08:44:04Z)
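For readers unfamiliar with the distillation mechanism the VideoAdviser entry relies on, here is a generic temperature-scaled knowledge-distillation loss in the style of Hinton et al.; it illustrates teacher-to-student transfer only and is not VideoAdviser's actual objective.

```python
# Generic temperature-scaled knowledge distillation (Hinton et al. style);
# placeholder logits, not VideoAdviser's models or losses.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened class distributions."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # The T**2 factor keeps gradient scale comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T ** 2

student_logits = torch.randn(8, 10, requires_grad=True)  # modality-specific model
teacher_logits = torch.randn(8, 10)                      # multimodal model
distillation_loss(student_logits, teacher_logits).backward()
```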
- Learning to Learn with Indispensable Connections [6.040904021861969]
We propose a novel meta-learning method called Meta-LTH that includes indispensable (necessary) connections.
Our method improves classification accuracy by approximately 2% (20-way 1-shot task setting) on the Omniglot dataset.
arXiv Detail & Related papers (2023-04-06T04:53:13Z)
- Multi-Level Contrastive Learning for Dense Prediction Task [59.591755258395594]
We present Multi-Level Contrastive Learning for Dense Prediction Task (MCL), an efficient self-supervised method for learning region-level feature representation for dense prediction tasks.
Our method is motivated by the three key factors in detection: localization, scale consistency and recognition.
Our method consistently outperforms the recent state-of-the-art methods on various datasets with significant margins.
arXiv Detail & Related papers (2023-04-04T17:59:04Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network dubbed DOAD to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Meta-Learning with Self-Improving Momentum Target [72.98879709228981]
We propose Self-improving Momentum Target (SiMT) to improve the performance of a meta-learner.
SiMT generates the target model by adapting from the temporal ensemble of the meta-learner.
We show that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods.
arXiv Detail & Related papers (2022-10-11T06:45:15Z)
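SiMT's "temporal ensemble of the meta-learner" is, in spirit, a momentum (EMA) target: the target's weights trail the learner's weights. A minimal sketch of that mechanism follows, with placeholder models and data; SiMT's full method adds more on top.

```python
# Momentum (EMA) target sketch: the target's weights are an exponential
# moving average of the learner's weights. Models and data are placeholders.
import copy
import torch

learner = torch.nn.Linear(16, 4)
target = copy.deepcopy(learner)              # slowly moving target model
for p in target.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def update_target(momentum=0.99):
    for p_t, p_l in zip(target.parameters(), learner.parameters()):
        p_t.mul_(momentum).add_(p_l, alpha=1 - momentum)

opt = torch.optim.SGD(learner.parameters(), lr=0.1)
for step in range(3):
    x = torch.randn(8, 16)
    loss = ((learner(x) - target(x)) ** 2).mean()  # chase the target's output
    opt.zero_grad(); loss.backward(); opt.step()
    update_target()                          # target trails the learner
```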
- Motion Sensitive Contrastive Learning for Self-supervised Video Representation [34.854431881562576]
Motion Sensitive Contrastive Learning (MSCL) injects the motion information captured by optical flows into RGB frames to strengthen feature learning.
It introduces Local Motion Contrastive Learning (LMCL) with frame-level contrastive objectives across the two modalities, Flow Rotation Augmentation (FRA) to generate extra motion-shuffled negative samples, and Motion Differential Sampling (MDS) to accurately screen training samples.
arXiv Detail & Related papers (2022-08-12T04:06:56Z)
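The MSCL entry above pairs RGB and optical-flow views of the same clip. A common way to realize such a cross-modal contrastive objective is a symmetric InfoNCE over the two modalities; the sketch below assumes placeholder encoders and pre-pooled features rather than MSCL's actual architecture.

```python
# Symmetric cross-modal InfoNCE sketch: RGB and flow features of clip i are
# positives; every other pairing is a negative. Encoders/data are placeholders.
import torch
import torch.nn.functional as F

rgb_enc = torch.nn.Linear(512, 128)          # stand-in RGB projection head
flow_enc = torch.nn.Linear(512, 128)         # stand-in flow projection head

rgb_feat = torch.randn(16, 512)              # pooled RGB features, 16 clips
flow_feat = torch.randn(16, 512)             # optical-flow features, same clips

z_rgb = F.normalize(rgb_enc(rgb_feat), dim=1)
z_flow = F.normalize(flow_enc(flow_feat), dim=1)
logits = z_rgb @ z_flow.t() / 0.07           # positives on the diagonal
labels = torch.arange(16)
loss = (F.cross_entropy(logits, labels) +         # RGB -> flow direction
        F.cross_entropy(logits.t(), labels)) / 2  # flow -> RGB direction
```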
- ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency [62.38914747727636]
We study self-supervised video representation learning, which is a challenging task due to 1) a lack of labels for explicit supervision and 2) unstructured and noisy visual information.
Existing methods mainly use contrastive loss with video clips as the instances and learn visual representation by discriminating instances from each other.
In this paper, we observe that the consistency between positive samples is the key to learning robust video representations.
arXiv Detail & Related papers (2021-06-04T08:44:50Z)
- Lessons from Chasing Few-Shot Learning Benchmarks: Rethinking the Evaluation of Meta-Learning Methods [9.821362920940631]
We introduce a simple baseline for meta-learning, FIX-ML.
We explore two possible goals of meta-learning: to develop methods that generalize (i) to the same task distribution that generates the training set (in-distribution), or (ii) to new, unseen task distributions (out-of-distribution).
Our results highlight that in order to reason about progress in this space, it is necessary to provide a clearer description of the goals of meta-learning, and to develop more appropriate evaluation strategies.
arXiv Detail & Related papers (2021-02-23T05:34:30Z)
- Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
- Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning [79.25478727351604]
We explore a simple process: meta-learning over a whole-classification pre-trained model on its evaluation metric.
We observe this simple method achieves competitive performance to state-of-the-art methods on standard benchmarks.
arXiv Detail & Related papers (2020-03-09T20:06:36Z)
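Meta-Baseline's evaluation metric is, roughly, cosine similarity between query embeddings and class centroids computed from the support set. A minimal sketch with made-up shapes (5-way 1-shot, feature dimension 64):

```python
# Cosine nearest-centroid classification sketch (5-way 1-shot, made-up dims).
import torch
import torch.nn.functional as F

support = torch.randn(5, 1, 64)              # (ways, shots, feature dim)
query = torch.randn(10, 64)                  # embedded query examples

centroids = F.normalize(support.mean(dim=1), dim=1)  # one prototype per class
q = F.normalize(query, dim=1)
logits = q @ centroids.t()                   # cosine similarity to prototypes
pred = logits.argmax(dim=1)                  # predicted class per query
```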