Self-Supervised Correspondence Estimation via Multiview Registration
- URL: http://arxiv.org/abs/2212.03236v1
- Date: Tue, 6 Dec 2022 18:59:02 GMT
- Title: Self-Supervised Correspondence Estimation via Multiview Registration
- Authors: Mohamed El Banani, Ignacio Rocco, David Novotny, Andrea Vedaldi,
Natalia Neverova, Justin Johnson, Benjamin Graham
- Abstract summary: Video provides us with the spatio-temporal consistency needed for visual learning.
Recent approaches have utilized this signal to learn correspondence estimation from close-by frame pairs.
We propose a self-supervised approach for correspondence estimation that learns from multiview consistency in short RGB-D video sequences.
- Score: 88.99287381176094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video provides us with the spatio-temporal consistency needed for visual
learning. Recent approaches have utilized this signal to learn correspondence
estimation from close-by frame pairs. However, by only relying on close-by
frame pairs, those approaches miss out on the richer long-range consistency
between distant overlapping frames. To address this, we propose a
self-supervised approach for correspondence estimation that learns from
multiview consistency in short RGB-D video sequences. Our approach combines
pairwise correspondence estimation and registration with a novel SE(3)
transformation synchronization algorithm. Our key insight is that
self-supervised multiview registration allows us to obtain correspondences over
longer time frames, increasing both the diversity and difficulty of sampled
pairs. We evaluate our approach on indoor scenes for correspondence estimation
and RGB-D pointcloud registration, and find that we perform on par with
supervised approaches.
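The abstract names two building blocks, pairwise registration from estimated correspondences and SE(3) transformation synchronization, without spelling out the algorithms. The sketch below is a rough illustration only, assuming one standard formulation: weighted Procrustes (Kabsch) alignment of a correspondence set, followed by spectral rotation synchronization and a linear least-squares solve for translations. The function names, the unweighted relaxation, and the convention x_i = R_ij x_j + t_ij are assumptions of this sketch; the paper's actual synchronization algorithm may differ.

```python
import numpy as np

def project_to_so3(M):
    """Closest rotation matrix to a 3x3 matrix M in the Frobenius norm (via SVD)."""
    U, _, Vt = np.linalg.svd(M)
    if np.linalg.det(U @ Vt) < 0:   # avoid reflections
        U[:, -1] *= -1
    return U @ Vt

def register_pair(P, Q, w=None):
    """Weighted Procrustes/Kabsch: R, t minimizing sum_k w_k ||R P_k + t - Q_k||^2.
    P, Q: corresponding 3D points of shape (k, 3); w: optional confidence weights."""
    w = np.ones(len(P)) if w is None else np.asarray(w, float)
    w = w / w.sum()
    p_bar, q_bar = (w[:, None] * P).sum(0), (w[:, None] * Q).sum(0)
    H = (w[:, None] * (P - p_bar)).T @ (Q - q_bar)      # 3x3 cross-covariance
    R = project_to_so3(H.T)
    return R, q_bar - R @ p_bar

def synchronize_se3(pairwise, n_views):
    """Recover absolute poses from relative ones.
    pairwise: dict {(i, j): (R_ij, t_ij)} with x_i = R_ij @ x_j + t_ij.
    Returns (R_i, t_i) per view, gauge-fixed so view 0 is the reference frame."""
    # Rotation synchronization: in the noise-free case the block matrix G with
    # G[i, j] = R_ij equals R R^T for the stacked absolute rotations (rank 3),
    # so its top-3 eigenvectors recover the rotations up to a global rotation.
    G = np.zeros((3 * n_views, 3 * n_views))
    for i in range(n_views):
        G[3*i:3*i+3, 3*i:3*i+3] = np.eye(3)
    for (i, j), (R_ij, _) in pairwise.items():
        G[3*i:3*i+3, 3*j:3*j+3] = R_ij
        G[3*j:3*j+3, 3*i:3*i+3] = R_ij.T
    _, vecs = np.linalg.eigh(G)
    V = vecs[:, -3:].copy()                  # eigenvectors of the 3 largest eigenvalues
    if np.linalg.det(V[:3, :]) < 0:          # resolve the global reflection ambiguity
        V[:, 0] = -V[:, 0]
    R_abs = [project_to_so3(V[3*i:3*i+3, :]) for i in range(n_views)]
    R_abs = [R @ R_abs[0].T for R in R_abs]  # gauge fix: R_0 = I

    # Translation synchronization: each pair gives t_i - R_ij t_j = t_ij,
    # a linear system in the unknown t_i; pin t_0 = 0 and solve least squares.
    rows, rhs = [], []
    for (i, j), (R_ij, t_ij) in pairwise.items():
        A = np.zeros((3, 3 * n_views))
        A[:, 3*i:3*i+3] = np.eye(3)
        A[:, 3*j:3*j+3] = -R_ij
        rows.append(A); rhs.append(t_ij)
    pin = np.zeros((3, 3 * n_views)); pin[:, :3] = np.eye(3)
    rows.append(pin); rhs.append(np.zeros(3))
    t = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)[0]
    return R_abs, list(t.reshape(n_views, 3))
```

In practice each pair would be weighted by the confidence of its correspondences, and one may iterate between synchronization and re-weighting; the sketch keeps everything unweighted for brevity.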
Related papers
- Weakly-supervised Representation Learning for Video Alignment and
Analysis [16.80278496414627]
This paper introduces LRProp, a novel weakly-supervised representation learning approach.
The proposed algorithm also uses a regularized SoftDTW loss to better tune the learned features (a minimal sketch of the basic soft-DTW recursion appears after this list).
Our novel representation learning paradigm consistently outperforms the state of the art on temporal alignment tasks.
arXiv Detail & Related papers (2023-02-08T14:01:01Z) - Tracking by Associating Clips [110.08925274049409]
In this paper, we investigate an alternative by treating object association as clip-wise matching.
Our new perspective views a single long video sequence as multiple short clips, and then the tracking is performed both within and between the clips.
The benefits of this new approach are twofold. First, our method is robust to tracking error accumulation or propagation, as the video chunking allows bypassing the interrupted frames.
Second, multi-frame information is aggregated during clip-wise matching, resulting in more accurate long-range track association than current frame-wise matching.
arXiv Detail & Related papers (2022-12-20T10:33:17Z) - Correspondence Matters for Video Referring Expression Comprehension [64.60046797561455]
Video Referring Expression Comprehension (REC) aims to localize the referent objects described in a sentence to visual regions in the video frames.
Existing methods suffer from two problems: 1) inconsistent localization results across video frames; 2) confusion between the referent and contextual objects.
We propose a novel Dual Correspondence Network (dubbed DCNet) which explicitly enhances dense associations in both inter-frame and cross-modal manners.
arXiv Detail & Related papers (2022-07-21T10:31:39Z) - Modelling Neighbor Relation in Joint Space-Time Graph for Video
Correspondence Learning [53.74240452117145]
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges.
Our learned representation outperforms the state-of-the-art self-supervised methods on a variety of visual tasks.
arXiv Detail & Related papers (2021-09-28T05:40:01Z) - Active Annotation of Informative Overlapping Frames in Video Mosaicking
Applications [3.5544725140884936]
We introduce an efficient framework for the active annotation of long-range pairwise correspondences in a sequence.
Our framework suggests pairs of images that are expected to be informative to an oracle agent.
In addition to the efficient construction of a mosaic, our framework provides, as a by-product, ground truth landmark correspondences.
arXiv Detail & Related papers (2020-12-30T22:19:19Z) - Contrastive Transformation for Self-supervised Correspondence Learning [120.62547360463923]
We study the self-supervised learning of visual correspondence using unlabeled videos in the wild.
Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation.
Our framework outperforms the recent self-supervised correspondence methods on a range of visual tasks.
arXiv Detail & Related papers (2020-12-09T14:05:06Z) - Learning multiview 3D point cloud registration [74.39499501822682]
We present a novel, end-to-end learnable, multiview 3D point cloud registration algorithm.
Our approach outperforms the state-of-the-art by a significant margin, while being end-to-end trainable and computationally less costly.
arXiv Detail & Related papers (2020-01-15T03:42:14Z) - Learning and Matching Multi-View Descriptors for Registration of Point
Clouds [48.25586496457587]
We first propose a multi-view local descriptor, learned from images of multiple views, for the description of 3D keypoints.
Then, we develop a robust matching approach that aims to reject outlier matches through efficient inference.
We demonstrate that our approaches boost registration performance on public scanning and multi-view stereo datasets.
arXiv Detail & Related papers (2018-07-16T01:58:27Z)
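The LRProp summary above mentions a regularized SoftDTW loss. As a point of reference only, here is a minimal NumPy sketch of the basic soft-DTW recursion (a differentiable soft-minimum relaxation of dynamic time warping); the variable names and the squared-Euclidean cost in the usage lines are illustrative assumptions, and the paper's regularized variant is not reproduced here.

```python
import numpy as np

def soft_dtw(D, gamma=1.0):
    """Soft-DTW value for a pairwise cost matrix D of shape (n, m), smoothing gamma > 0."""
    n, m = D.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # soft-minimum over the three dynamic-programming predecessors
            prev = np.array([R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]])
            softmin = -gamma * np.logaddexp.reduce(-prev / gamma)
            R[i, j] = D[i - 1, j - 1] + softmin
    return R[n, m]

# usage: costs between two feature sequences x (n x d) and y (m x d)
x, y = np.random.randn(8, 4), np.random.randn(10, 4)
D = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared-Euclidean costs
loss = soft_dtw(D, gamma=0.1)
```

As gamma approaches zero the soft minimum tends to a hard minimum, recovering the classical DTW alignment cost.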