Pixel-level Correspondence for Self-Supervised Learning from Video
- URL: http://arxiv.org/abs/2207.03866v1
- Date: Fri, 8 Jul 2022 12:50:13 GMT
- Title: Pixel-level Correspondence for Self-Supervised Learning from Video
- Authors: Yash Sharma, Yi Zhu, Chris Russell, Thomas Brox
- Abstract summary: Pixel-level Correspondence (PiCo) is a method for dense contrastive learning from video.
We validate PiCo on standard benchmarks, outperforming self-supervised baselines on multiple dense prediction tasks.
- Score: 56.24439897867531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While self-supervised learning has enabled effective representation learning
in the absence of labels, for vision, video remains a relatively untapped
source of supervision. To address this, we propose Pixel-level Correspondence
(PiCo), a method for dense contrastive learning from video. By tracking points
with optical flow, we obtain a correspondence map which can be used to match
local features at different points in time. We validate PiCo on standard
benchmarks, outperforming self-supervised baselines on multiple dense
prediction tasks, without compromising performance on image classification.
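To make the idea concrete, below is a minimal PyTorch sketch of dense contrastive learning over flow-matched pixels, in the spirit of the abstract. It is an illustrative assumption, not the authors' implementation: the function name `dense_nce_loss`, the tensor shapes, and the random sampling scheme are hypothetical, and the optical flow is assumed precomputed.

```python
import torch
import torch.nn.functional as F

def dense_nce_loss(feat1, feat2, flow, num_samples=256, temperature=0.1):
    """InfoNCE over flow-matched pixel features (hypothetical sketch).

    feat1, feat2: (B, C, H, W) dense features from two frames of a clip.
    flow: (B, 2, H, W) optical flow in pixels, mapping frame-1 locations
          to frame-2 locations (x-displacement first, then y).
    """
    B, C, H, W = feat1.shape
    # Frame-1 pixel coordinates displaced by the flow give the
    # correspondence map into frame 2.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    coords = torch.stack((xs, ys)).to(flow)            # (2, H, W)
    target = coords + flow                             # (B, 2, H, W)
    # Normalize coordinates to [-1, 1] for grid_sample.
    gx = target[:, 0] / (W - 1) * 2 - 1
    gy = target[:, 1] / (H - 1) * 2 - 1
    grid = torch.stack((gx, gy), dim=-1)               # (B, H, W, 2)
    # Warp frame-2 features onto frame-1 pixels: matched features align.
    feat2w = F.grid_sample(feat2, grid, align_corners=True)

    # Sample pixels; flow-matched pairs are positives, the rest negatives.
    idx = torch.randint(H * W, (num_samples,))
    q = F.normalize(feat1.flatten(2)[:, :, idx], dim=1)   # (B, C, N)
    k = F.normalize(feat2w.flatten(2)[:, :, idx], dim=1)  # (B, C, N)
    logits = torch.einsum("bcn,bcm->bnm", q, k) / temperature
    labels = torch.arange(num_samples, device=logits.device).repeat(B)
    return F.cross_entropy(logits.reshape(-1, num_samples), labels)
```

A full implementation would also need to mask out-of-frame and occluded correspondences before sampling positives.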
Related papers
- Learning Fine-Grained Features for Pixel-wise Video Correspondences [13.456993858078514]
We address the problem of learning features for establishing pixel-wise correspondences.
Motivated by optical flow as well as self-supervised feature learning, we propose to use not only labeled synthetic videos but also unlabeled real-world videos.
Our experimental results on a series of correspondence-based tasks demonstrate that the proposed method outperforms state-of-the-art rivals in both accuracy and efficiency.
arXiv Detail & Related papers (2023-08-06T07:27:17Z)
- Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation [76.40565872257709]
We develop a unified framework which simultaneously models cross-frame dense correspondence for locally discriminative feature learning.
It is able to directly learn to perform mask-guided sequential segmentation from unlabeled videos.
Our algorithm sets a new state of the art on two standard benchmarks (DAVIS17 and YouTube-VOS).
arXiv Detail & Related papers (2023-03-17T16:23:36Z)
- Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization [27.312423653997087]
Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing.
We propose three metrics that characterize a desirable key frame: local dissimilarity, global consistency, and uniqueness.
We show that by refining the pre-trained features with a lightweight contrastively learned projection module, the frame-level importance scores can be further improved.
arXiv Detail & Related papers (2022-11-18T07:01:28Z)
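As a rough illustration of how such criteria can be scored directly on pre-extracted frame features, the toy sketch below computes simple versions of the three metrics; the paper's exact formulations and its contrastively learned projection module are not reproduced here.

```python
import torch
import torch.nn.functional as F

def keyframe_scores(feats, window=5):
    """Toy frame-importance scores from three contrastive criteria
    (hypothetical; the paper's exact definitions may differ).

    feats: (T, D) features for the T frames of one video.
    """
    f = F.normalize(feats, dim=1)            # (T, D)
    sim = f @ f.t()                          # (T, T) cosine similarities
    T = f.shape[0]

    # Local dissimilarity: a key frame differs from its temporal neighbors.
    local = f.new_zeros(T)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        nbrs = torch.cat((sim[t, lo:t], sim[t, t + 1:hi]))
        local[t] = 1 - nbrs.mean()

    # Global consistency: a key frame is close to the video's mean content.
    center = F.normalize(f.mean(0, keepdim=True), dim=1)
    global_c = (f @ center.t()).squeeze(1)

    # Uniqueness: a key frame has few near-duplicates elsewhere in the video.
    off_diag = sim - torch.eye(T, device=sim.device)
    unique = 1 - off_diag.max(dim=1).values

    return local + global_c + unique         # combined importance per frame
```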
- Self-Supervised Equivariant Learning for Oriented Keypoint Detection [35.94215211409985]
We introduce a self-supervised learning framework using rotation-equivariant CNNs to learn to detect robust oriented keypoints.
We propose a dense orientation alignment loss, computed on image pairs generated by synthetic transformations, to train a histogram-based orientation map.
Our method outperforms the previous methods on an image matching benchmark and a camera pose estimation benchmark.
arXiv Detail & Related papers (2022-04-19T02:26:07Z)
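The sketch below gives one plausible reading of a histogram-based orientation alignment loss under a known synthetic rotation; the bin-shift logic, tensor layout, and loss choice are assumptions, and the paper's rotation-equivariant architecture is not modeled.

```python
import torch
import torch.nn.functional as F

def orientation_alignment_loss(hist, hist_rot, angle_deg):
    """Toy sketch (assumptions throughout): hist and hist_rot are
    (B, K, H, W) per-pixel orientation histograms (log-probabilities over
    K angle bins) for an image and its copy rotated by angle_deg, with
    hist_rot already warped back so pixels correspond spatially.
    """
    K = hist.shape[1]
    # A known synthetic rotation shifts the orientation bins cyclically.
    shift = round(angle_deg / (360.0 / K)) % K
    target = torch.roll(hist_rot.exp(), shifts=-shift, dims=1)
    # Per-pixel KL divergence between predicted and shifted histograms.
    return F.kl_div(hist, target, reduction="batchmean")
```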
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z)
- Multiview Pseudo-Labeling for Semi-supervised Learning from Video [102.36355560553402]
We present a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video.
Our method capitalizes on multiple views, but it nonetheless trains a model that is shared across appearance and motion input.
On multiple video recognition datasets, our method substantially outperforms its supervised counterpart, and compares favorably to previous work on standard benchmarks in self-supervised video representation learning.
arXiv Detail & Related papers (2021-04-01T17:59:48Z)
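A hypothetical sketch of the cross-view pseudo-labeling idea follows: one model shared across appearance (RGB) and motion (e.g., optical flow) inputs, with confident predictions on one view supervising the other. The confidence threshold, names, and training-loop details are illustrative, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def cross_view_pseudo_label_loss(model, rgb_clip, flow_clip, threshold=0.8):
    """Sketch: a single shared model over two views of the same clips.
    rgb_clip, flow_clip: batched inputs for appearance and motion views;
    model maps a clip batch to (B, num_classes) logits (assumed interface).
    """
    with torch.no_grad():
        p_rgb = F.softmax(model(rgb_clip), dim=1)
        p_flow = F.softmax(model(flow_clip), dim=1)

    loss = rgb_clip.new_zeros(())
    # Confident motion predictions supervise the appearance view, and vice versa.
    for targets, inputs in ((p_flow, rgb_clip), (p_rgb, flow_clip)):
        conf, labels = targets.max(dim=1)
        mask = conf > threshold
        if mask.any():
            logits = model(inputs)
            loss = loss + F.cross_entropy(logits[mask], labels[mask])
    return loss
```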
- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective [13.90183404059193]
We propose to learn correspondence using Video Frame-level Similarity (VFS) learning.
Our work is inspired by the recent success in image-level contrastive learning and similarity learning for visual recognition.
Our experiments show the surprising result that VFS surpasses state-of-the-art self-supervised approaches for both OTB visual object tracking and DAVIS video object segmentation.
arXiv Detail & Related papers (2021-03-31T17:56:35Z)
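A minimal sketch of frame-level similarity learning under assumed interfaces: batches hold two frames sampled from each of B videos, and frames from the same video are treated as positives. VFS's actual recipe (augmentations, with or without negatives) is richer than this.

```python
import torch
import torch.nn.functional as F

def frame_level_similarity_loss(encoder, frames_a, frames_b, temperature=0.1):
    """Sketch: frames_a and frames_b each hold one frame from B different
    videos; frames from the same video are positives, others negatives.
    encoder maps an image batch to (B, D) embeddings (assumed interface).
    """
    za = F.normalize(encoder(frames_a), dim=1)   # (B, D)
    zb = F.normalize(encoder(frames_b), dim=1)   # (B, D)
    logits = za @ zb.t() / temperature           # (B, B)
    labels = torch.arange(za.shape[0], device=za.device)
    # Symmetric InfoNCE: each frame must pick its own video's other frame.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```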
- Contrastive Transformation for Self-supervised Correspondence Learning [120.62547360463923]
We study the self-supervised learning of visual correspondence using unlabeled videos in the wild.
Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation.
Our framework outperforms the recent self-supervised correspondence methods on a range of visual tasks.
arXiv Detail & Related papers (2020-12-09T14:05:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.