Self-supervised Geometric Perception
- URL: http://arxiv.org/abs/2103.03114v1
- Date: Thu, 4 Mar 2021 15:34:43 GMT
- Title: Self-supervised Geometric Perception
- Authors: Heng Yang, Wei Dong, Luca Carlone, Vladlen Koltun
- Abstract summary: Self-supervised geometric perception is a framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels.
We show that SGP achieves state-of-the-art performance that is on par with or superior to supervised oracles trained using ground-truth labels.
- Score: 96.89966337518854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present self-supervised geometric perception (SGP), the first general
framework to learn a feature descriptor for correspondence matching without any
ground-truth geometric model labels (e.g., camera poses, rigid
transformations). Our first contribution is to formulate geometric perception
as an optimization problem that jointly optimizes the feature descriptor and
the geometric models given a large corpus of visual measurements (e.g., images,
point clouds). Under this optimization formulation, we show that two important
streams of research in vision, namely robust model fitting and deep feature
learning, correspond to optimizing one block of the unknown variables while
fixing the other block. This analysis naturally leads to our second
contribution -- the SGP algorithm that performs alternating minimization to
solve the joint optimization. SGP iteratively executes two meta-algorithms: a
teacher that performs robust model fitting given learned features to generate
geometric pseudo-labels, and a student that performs deep feature learning
under noisy supervision of the pseudo-labels. As a third contribution, we apply
SGP to two perception problems on large-scale real datasets, namely relative
camera pose estimation on MegaDepth and point cloud registration on 3DMatch. We
demonstrate that SGP achieves state-of-the-art performance that is on par with
or superior to the supervised oracles trained using ground-truth labels.
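The teacher–student alternation described in the abstract can be sketched in miniature. The sketch below is an illustrative toy, not the authors' code: the "geometric model" is a 1-D translation, the teacher's robust fit is a simple median (standing in for RANSAC-style fitting of camera poses or rigid transformations), and the student step merely keeps the inlier matches, mimicking the improved matching that a retrained deep descriptor would produce.

```python
import statistics

def teacher_robust_fit(correspondences, tol=1.0):
    """Teacher: robustly fit a geometric model (here a 1-D translation)
    from the current correspondences. The median tolerates outlier
    matches; real SGP would use robust model fitting of, e.g., camera
    poses or rigid transformations."""
    diffs = [y - x for x, y in correspondences]
    t = statistics.median(diffs)
    inliers = [(x, y) for (x, y), d in zip(correspondences, diffs)
               if abs(d - t) < tol]
    return t, inliers  # model + inlier set = geometric pseudo-label

def student_update(correspondences, inliers):
    """Student (toy stand-in): in real SGP this step trains a deep
    feature descriptor under the noisy supervision of the pseudo-labels;
    here we simply keep the inlier matches."""
    return inliers

def sgp(correspondences, num_iters=2):
    """Alternating minimization: each iteration fixes one block of
    variables (descriptor vs. geometric models) while optimizing the other."""
    model = None
    for _ in range(num_iters):
        model, inliers = teacher_robust_fit(correspondences)   # teacher pass
        correspondences = student_update(correspondences, inliers)  # student pass
    return model, correspondences

# Eight true matches related by the translation t = 3, plus two outliers.
pairs = [(x, x + 3.0) for x in range(8)] + [(0.0, 50.0), (1.0, -40.0)]
t, kept = sgp(pairs)
print(t, len(kept))  # the translation is recovered and the outliers are pruned
```

The key structural point the toy preserves is the role split: the teacher's output is only a pseudo-label (model plus inliers), and the student never sees ground truth.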
Related papers
- FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views [93.6881532277553]
We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images.
Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
arXiv Detail & Related papers (2025-02-17T18:54:05Z)
- Topology-Aware 3D Gaussian Splatting: Leveraging Persistent Homology for Optimized Structural Integrity [3.792470553976718]
This work introduces Topology-Aware 3D Gaussian Splatting (Topology-GS).
Topology-GS addresses the compromised pixel-level structural integrity caused by incomplete initial geometric coverage.
Experiments on three novel-view benchmarks demonstrate that Topology-GS outperforms existing methods in terms of PSNR, SSIM, and LPIPS metrics.
arXiv Detail & Related papers (2024-12-21T13:25:03Z)
- Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph [45.115555973941255]
Relative pose estimation is crucial for various computer vision applications, including robotics and autonomous driving.
We propose a Geometric Correspondence Graph neural network that integrates point features with extra structured line segments.
This integration of matched points and line segments further exploits the geometry constraints and enhances model performance across different environments.
arXiv Detail & Related papers (2024-08-28T12:33:26Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z) - Iterative Graph Filtering Network for 3D Human Pose Estimation [5.177947445379688]
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation.
In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation.
Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization.
arXiv Detail & Related papers (2023-07-29T20:46:44Z) - Geo-SIC: Learning Deformable Geometric Shapes in Deep Image Classifiers [8.781861951759948]
This paper presents Geo-SIC, the first deep learning model to learn deformable shapes in a deformation space for improved image classification performance.
We introduce a newly designed framework that simultaneously derives features from both image and latent shape spaces with large intra-class variations.
We develop a boosted classification network, equipped with unsupervised learning of geometric shape representations.
arXiv Detail & Related papers (2022-10-25T01:55:17Z) - Self-Supervised Image Representation Learning with Geometric Set
Consistency [50.12720780102395]
We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Specifically, we introduce 3D geometric consistency into a contrastive learning framework to enforce the feature consistency within image views.
arXiv Detail & Related papers (2022-03-29T08:57:33Z)
- NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go [109.88509362837475]
We present NeuroMorph, a new neural network architecture that takes two 3D shapes as input.
NeuroMorph produces smooth interpolations and point-to-point correspondences between them.
It works well for a large variety of input shapes, including non-isometric pairs from different object categories.
arXiv Detail & Related papers (2021-06-17T12:25:44Z)
- Primal-Dual Mesh Convolutional Neural Networks [62.165239866312334]
We apply a primal-dual framework drawn from the graph-neural-network literature to triangle meshes.
Our method takes features for both edges and faces of a 3D mesh as input and dynamically aggregates them.
We provide theoretical insights of our approach using tools from the mesh-simplification literature.
arXiv Detail & Related papers (2020-10-23T14:49:02Z)
- Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training [3.8073142980733]
We propose a novel framework for monocular 3D object detection using only RGB images, called KM3D-Net.
We design a fully convolutional model to predict object keypoints, dimension, and orientation, and then combine these estimations with perspective geometry constraints to compute the position attribute.
arXiv Detail & Related papers (2020-09-02T00:51:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.