ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints
- URL: http://arxiv.org/abs/2112.03905v1
- Date: Tue, 7 Dec 2021 18:58:29 GMT
- Title: ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints
- Authors: Srijan Das and Michael S. Ryoo
- Abstract summary: We propose ViewCLR, which learns self-supervised video representations invariant to camera viewpoint changes.
We introduce a view-generator that can be considered a learnable augmentation for any self-supervised pretext task.
- Score: 47.54827916387143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised video representation learning predominantly focuses on
discriminating instances generated by simple data augmentation schemes.
However, the learned representations often fail to generalize to unseen
camera viewpoints. To this end, we propose ViewCLR, which learns self-supervised
video representations invariant to camera viewpoint changes. We introduce a
view-generator, which can be considered a learnable augmentation for any
self-supervised pretext task, to generate a latent viewpoint representation of
a video. ViewCLR maximizes the similarity between this latent viewpoint
representation and the representation from the original viewpoint, enabling
the learned video encoder to generalize to unseen camera viewpoints.
Experiments on cross-view benchmark datasets including NTU RGB+D show
that ViewCLR is a state-of-the-art viewpoint-invariant self-supervised
method.
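The objective described in the abstract (a learnable view-generator producing a latent novel-viewpoint representation whose similarity with the original-viewpoint representation is maximized) can be illustrated with a minimal sketch. All names below (ViewGenerator, the small 3D-CNN stand-in encoder, the InfoNCE-style loss, feature dimensions) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the idea summarized above: a learnable "view-generator"
# acts as a latent-space augmentation, and the video encoder is trained so
# that the generated-viewpoint representation agrees with the original one.
# Module names, sizes, and the InfoNCE-style loss are assumptions for
# illustration, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViewGenerator(nn.Module):
    """Hypothetical learnable augmentation: maps an encoded clip to a latent
    representation of the same clip under a novel viewpoint."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(inplace=True), nn.Linear(dim, dim)
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


def info_nce(z_view: torch.Tensor, z_orig: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Each generated-viewpoint feature should be most similar to the
    original-viewpoint feature of the same clip (positives on the diagonal)."""
    z_view = F.normalize(z_view, dim=-1)
    z_orig = F.normalize(z_orig, dim=-1)
    logits = z_view @ z_orig.t() / tau                       # (B, B) similarities
    targets = torch.arange(z_view.size(0), device=z_view.device)
    return F.cross_entropy(logits, targets)


# Toy stand-in for a video backbone; a 3D CNN or video transformer
# would be used in practice.
encoder = nn.Sequential(
    nn.Conv3d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(32, 512),
)
view_gen = ViewGenerator(dim=512)
opt = torch.optim.Adam(list(encoder.parameters()) + list(view_gen.parameters()), lr=1e-4)

clips = torch.randn(8, 3, 16, 112, 112)   # batch of videos (B, C, T, H, W)
opt.zero_grad()
z_orig = encoder(clips)                   # original-viewpoint representation
z_view = view_gen(z_orig)                 # latent novel-viewpoint representation
loss = info_nce(z_view, z_orig)
loss.backward()
opt.step()
```

In this sketch the view-generator plays the role of a learnable augmentation: its output is treated as the positive pair for the original-viewpoint feature of the same clip, while the features of other clips in the batch act as negatives.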
Related papers
- v-CLR: View-Consistent Learning for Open-World Instance Segmentation [24.32192108470939]
Vanilla visual networks are biased toward learning appearance information, e.g., texture, to recognize objects.
This implicit bias causes the model to fail in detecting novel objects with textures unseen in the open-world setting.
We propose view-Consistent LeaRning (v-CLR), which encourages the model to learn appearance-invariant representations for robust instance segmentation.
arXiv Detail & Related papers (2025-04-02T05:52:30Z)
- Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos [71.01549400773197]
We introduce SWITCH-A-VIEW, a model that learns to automatically select the viewpoint to display at each timepoint when creating a how-to video.
We pose a pretext task that pseudo-labels segments in the training videos for their primary viewpoint.
We then discover patterns relating the visual and spoken content of a how-to video to its view-switch moments.
arXiv Detail & Related papers (2024-12-24T12:16:43Z)
- Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Videos [66.1935609072708]
The key hypothesis is that the more accurately an individual view can predict a view-agnostic text summary, the more informative it is.
We propose a framework that uses the relative accuracy of view-dependent caption predictions as a proxy for best view pseudo-labels.
During inference, our model takes as input only a multi-view video -- no language or camera poses -- and returns the best viewpoint to watch at each timestep.
arXiv Detail & Related papers (2024-11-13T16:31:08Z)
- POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World [59.545114016224254]
Humans are good at translating third-person observations of hand-object interactions into an egocentric view.
We propose a Prompt-Oriented View-agnostic learning framework, which enables this view adaptation with few egocentric videos.
arXiv Detail & Related papers (2024-03-09T09:54:44Z)
- MV2MAE: Multi-View Video Masked Autoencoders [33.61642891911761]
We present a method for self-supervised learning from synchronized multi-view videos.
We use a cross-view reconstruction task to inject geometry information into the model.
Our approach is based on the masked autoencoder (MAE) framework.
arXiv Detail & Related papers (2024-01-29T05:58:23Z)
- Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning [28.368429312400885]
Single-view images carry less information about how to disentangle a given scene than videos or multi-view images do.
We introduce a novel OCL framework for single-view images, SLot Attention via SHepherding (SLASH), which consists of two simple-yet-effective modules on top of Slot Attention.
Our proposed method enables consistent learning of object-centric representation and achieves strong performance across four datasets.
arXiv Detail & Related papers (2023-03-31T07:07:29Z)
- Self-supervised Video-centralised Transformer for Video Face Clustering [58.12996668434134]
This paper presents a novel method for face clustering in videos using a video-centralised transformer.
We release the first large-scale egocentric video face clustering dataset named EasyCom-Clustering.
arXiv Detail & Related papers (2022-03-24T16:38:54Z)
- Multiview Pseudo-Labeling for Semi-supervised Learning from Video [102.36355560553402]
We present a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video.
Our method capitalizes on multiple views, but it nonetheless trains a model that is shared across appearance and motion input.
On multiple video recognition datasets, our method substantially outperforms its supervised counterpart, and compares favorably to previous work on standard benchmarks in self-supervised video representation learning.
arXiv Detail & Related papers (2021-04-01T17:59:48Z)
- Broaden Your Views for Self-Supervised Video Learning [97.52216510672251]
We introduce BraVe, a self-supervised learning framework for video.
In BraVe, one of the views has access to a narrow temporal window of the video while the other view has broad access to the video content.
We demonstrate that BraVe achieves state-of-the-art results in self-supervised representation learning on standard video and audio classification benchmarks.
arXiv Detail & Related papers (2021-03-30T17:58:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.