Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised
Video Representation Learning
- URL: http://arxiv.org/abs/2112.03803v2
- Date: Wed, 8 Dec 2021 06:26:39 GMT
- Title: Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised
Video Representation Learning
- Authors: Manlin Zhang, Jinpeng Wang, Andy J. Ma
- Abstract summary: We propose a novel method to suppress static visual cues (SSVC) based on probabilistic analysis for self-supervised video representation learning.
By modelling static factors in a video as a random variable, the conditional distribution of each latent variable becomes a shifted and scaled normal.
Finally, positive pairs are constructed from motion-preserved videos for contrastive learning to alleviate the problem of representation bias toward static cues.
- Score: 7.27708818665289
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the great progress in video understanding made by deep convolutional
neural networks, the feature representations learned by existing methods may be
biased toward static visual cues. To address this issue, we propose a novel
method to suppress static visual cues (SSVC) based on probabilistic analysis for
self-supervised video representation learning. In our method, video frames are
first encoded via normalizing flows to obtain latent variables under a standard
normal distribution. By modelling the static factors in a video as a random
variable, the conditional distribution of each latent variable becomes a shifted
and scaled normal. Then, the latent variables that vary less over time are
selected as static cues and suppressed to generate motion-preserved videos.
Finally, positive pairs are constructed from motion-preserved videos for
contrastive learning, alleviating the problem of representation bias toward
static cues. The less-biased video representations generalize better to various
downstream tasks. Extensive experiments on publicly available benchmarks
demonstrate that the proposed method outperforms the state of the art when only
the RGB modality is used for pre-training.
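The suppression step can be pictured concretely. If each per-frame latent dimension is conditionally normal given the static factor s, i.e. z_i | s ~ N(mu_i(s), sigma_i(s)^2), then dimensions with low variance along time are natural candidates for static cues. Below is a minimal NumPy sketch of this selection-and-suppression idea; the helper name, the mean-flattening rule, and the 50% ratio are illustrative assumptions rather than the authors' implementation, and in practice the latents would come from a trained normalizing flow, not a random generator.

    import numpy as np

    def suppress_static_cues(z, static_ratio=0.5):
        # z: (T, D) per-frame latents from an (assumed) normalizing flow.
        # Dimensions whose variance over time is smallest are treated as
        # static cues and flattened to their temporal mean; the paper's
        # exact suppression rule may differ.
        temporal_var = z.var(axis=0)                # (D,) variance along time
        k = int(static_ratio * z.shape[1])          # number of dims to suppress
        static_dims = np.argsort(temporal_var)[:k]  # least-varying dimensions
        z_motion = z.copy()
        z_motion[:, static_dims] = z[:, static_dims].mean(axis=0)
        return z_motion  # invert the flow to obtain the motion-preserved video

    # Toy usage with random stand-in latents: 16 frames, 128 latent dims
    rng = np.random.default_rng(0)
    z_motion = suppress_static_cues(rng.standard_normal((16, 128)))

The motion-preserved clip and its source clip then form a positive pair for contrastive learning. A generic InfoNCE/NT-Xent loss over L2-normalized clip embeddings, sketched below, is one standard choice; the abstract does not specify the exact objective used.

    def info_nce(a, b, tau=0.1):
        # a, b: (N, D) L2-normalized embeddings of the two views;
        # matched pairs sit on the diagonal of the similarity matrix.
        logits = a @ b.T / tau
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))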
Related papers
- Don't Judge by the Look: Towards Motion Coherent Video Representation [56.09346222721583]
Motion Coherent Augmentation (MCA) is a data augmentation method for video understanding.
MCA introduces appearance variation in videos and implicitly encourages the model to prioritize motion patterns, rather than static appearances.
arXiv Detail & Related papers (2024-03-14T15:53:04Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation comes from the observation that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground [59.916365866505636]
Shortcut static features can interfere with the learning of motion features, resulting in poor out-of-distribution generalization.
In this paper, we empirically verify the existence of foreground static bias by creating test videos with conflicting signals from the static and moving portions of the video.
StillMix identifies bias-inducing video frames using a 2D reference network and mixes them with videos for training, serving as effective bias suppression (a generic sketch of this frame-video mixing idea follows the list below).
arXiv Detail & Related papers (2022-11-23T11:40:02Z)
- Quantifying and Learning Static vs. Dynamic Information in Deep Spatiotemporal Networks [29.47784194895489]
Action recognition, automatic video object segmentation (AVOS) and video instance segmentation (VIS) are studied.
Most examined models are biased toward static information.
Some datasets that are assumed to be biased toward dynamics are actually biased toward static information.
arXiv Detail & Related papers (2022-11-03T13:17:53Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Modelling Latent Dynamics of StyleGAN using Neural ODEs [52.03496093312985]
We learn the trajectory of independently inverted latent codes from GANs.
The learned continuous trajectory allows us to perform infinite-frame and consistent video manipulation.
Our method achieves state-of-the-art performance but with much less computation.
arXiv Detail & Related papers (2022-08-23T21:20:38Z)
- Multi-Contextual Predictions with Vision Transformer for Video Anomaly Detection [22.098399083491937]
Understanding the temporal context of a video plays a vital role in anomaly detection.
We design a transformer model with three different contextual prediction streams: masked, whole and partial.
By learning to predict the missing frames of consecutive normal frames, our model can effectively learn various normality patterns in the video.
arXiv Detail & Related papers (2022-06-17T05:54:31Z)
- Probabilistic Representations for Video Contrastive Learning [64.47354178088784]
This paper presents a self-supervised representation learning method that bridges contrastive learning with probabilistic representation.
By sampling embeddings from the whole video distribution, we can circumvent the careful sampling strategies or transformations otherwise needed to generate augmented views of the clips.
arXiv Detail & Related papers (2022-04-08T09:09:30Z)
- Stochastic Image-to-Video Synthesis using cINNs [22.5739334314885]
A conditional invertible neural network (cINN) can explain videos by modelling the static content and the remaining video characteristics independently.
Experiments on four diverse video datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2021-05-10T17:59:09Z)
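As a companion to the StillMix entry above, here is a minimal sketch of the general frame-video mixing idea it builds on: a static frame is blended into every frame of a clip so that appearance cues are perturbed while motion survives. The function name, the blending rule, and the Beta-distributed coefficient are generic mixup conventions, not StillMix's exact procedure, which additionally selects bias-inducing frames with a 2D reference network.

    import numpy as np

    def mix_static_frame(video, frame, lam=None, rng=None):
        # video: (T, H, W, C) clip; frame: (H, W, C) static image to blend in.
        # Generic frame-video mixup: the same static frame is mixed into
        # every time step, perturbing appearance while preserving motion.
        rng = rng or np.random.default_rng()
        if lam is None:
            lam = rng.beta(1.0, 1.0)                # mixup coefficient in [0, 1]
        return lam * video + (1.0 - lam) * frame[None]  # broadcast over time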