Self-supervised Geometric Perception
- URL: http://arxiv.org/abs/2103.03114v1
- Date: Thu, 4 Mar 2021 15:34:43 GMT
- Title: Self-supervised Geometric Perception
- Authors: Heng Yang, Wei Dong, Luca Carlone, Vladlen Koltun
- Abstract summary: Self-supervised geometric perception is a framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels.
We show that SGP achieves state-of-the-art performance that is on par with or superior to the supervised oracles trained using ground-truth labels.
- Score: 96.89966337518854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present self-supervised geometric perception (SGP), the first general
framework to learn a feature descriptor for correspondence matching without any
ground-truth geometric model labels (e.g., camera poses, rigid
transformations). Our first contribution is to formulate geometric perception
as an optimization problem that jointly optimizes the feature descriptor and
the geometric models given a large corpus of visual measurements (e.g., images,
point clouds). Under this optimization formulation, we show that two important
streams of research in vision, namely robust model fitting and deep feature
learning, correspond to optimizing one block of the unknown variables while
fixing the other block. This analysis naturally leads to our second
contribution -- the SGP algorithm that performs alternating minimization to
solve the joint optimization. SGP iteratively executes two meta-algorithms: a
teacher that performs robust model fitting given learned features to generate
geometric pseudo-labels, and a student that performs deep feature learning
under noisy supervision of the pseudo-labels. As a third contribution, we apply
SGP to two perception problems on large-scale real datasets, namely relative
camera pose estimation on MegaDepth and point cloud registration on 3DMatch. We
demonstrate that SGP achieves state-of-the-art performance that is on par with
or superior to the supervised oracles trained using ground-truth labels.
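The teacher/student alternation described in the abstract can be illustrated with a minimal sketch. This is a hypothetical toy problem (fitting a 1-D translation from correspondences with outliers), not the paper's actual features, solvers, or datasets: the "teacher" robustly fits the geometric model given the current per-pair confidences, and the "student" stands in for feature learning by refitting those confidences under the teacher's pseudo-label.

```python
import random
import statistics

random.seed(0)
T_STAR = 3.0
# Inlier correspondences: second point is the first shifted by the true
# translation plus small noise.
pairs = [(x, x + T_STAR + random.gauss(0, 0.05)) for x in range(40)]
# Outlier correspondences: second point is unrelated.
pairs += [(x, random.uniform(-50.0, 50.0)) for x in range(10)]

def teacher(pairs, weights):
    """Robust model fitting: estimate the translation from the pairs the
    student currently trusts (median as a simple robust estimator)."""
    trusted = [b - a for (a, b), w in zip(pairs, weights) if w >= 0.5]
    return statistics.median(trusted)

def student(pairs, t_hat, sigma=0.5):
    """Stand-in for deep feature learning: refit per-pair confidences under
    the (noisy) supervision of the teacher's pseudo-label t_hat."""
    return [1.0 if abs((b - a) - t_hat) < sigma else 0.0 for (a, b) in pairs]

weights = [1.0] * len(pairs)          # uninformative initial "features"
for _ in range(5):                    # alternating minimization
    t_hat = teacher(pairs, weights)   # fix features, fit the geometric model
    weights = student(pairs, t_hat)   # fix model, update the features

print(round(t_hat, 1))  # recovers a value close to T_STAR despite outliers
```

Each pass optimizes one block of variables while the other is held fixed, which is the joint-optimization view the abstract describes; in the paper the teacher is a robust geometric solver and the student is a learned descriptor, both far richer than this toy.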
Related papers
- Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph [45.115555973941255]
Relative pose estimation is crucial for various computer vision applications, including robotics and autonomous driving.
We propose a Geometric Correspondence Graph neural network that integrates point features with extra structured line segments.
This integration of matched points and line segments further exploits the geometry constraints and enhances model performance across different environments.
arXiv Detail & Related papers (2024-08-28T12:33:26Z) - S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z) - Iterative Graph Filtering Network for 3D Human Pose Estimation [5.177947445379688]
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation.
In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation.
Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization.
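The core operation this entry refers to, iteratively solving graph filtering with Laplacian regularization, can be sketched on a toy graph. This assumes the standard objective ||x - y||^2 + lam * x^T L x, whose optimum satisfies (I + lam*L) x = y; it is not the paper's actual network, just the underlying filtering step solved by gradient iterations:

```python
# Adjacency of a 4-node path graph: 0-1-2-3 (a stand-in for a skeleton graph).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
y = [1.0, 4.0, 2.0, 5.0]   # noisy per-node signal (e.g. joint coordinates)
lam, step = 0.5, 0.2

def laplacian_apply(x):
    """Compute L @ x with L = D - A for the unweighted graph above."""
    return [len(adj[i]) * x[i] - sum(x[j] for j in adj[i]) for i in adj]

x = list(y)
for _ in range(200):  # iterative graph filtering via gradient descent
    Lx = laplacian_apply(x)
    grad = [x[i] - y[i] + lam * Lx[i] for i in range(len(x))]
    x = [x[i] - step * grad[i] for i in range(len(x))]

# Verify x (approximately) satisfies the optimality condition (I + lam*L)x = y.
Lx = laplacian_apply(x)
residual = max(abs(x[i] + lam * Lx[i] - y[i]) for i in range(len(x)))
print(residual < 1e-6)  # converged to the regularized solution
```

Larger lam smooths the signal more aggressively across edges; the iterative solve avoids inverting (I + lam*L) directly, which is the property such frameworks exploit at scale.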
arXiv Detail & Related papers (2023-07-29T20:46:44Z) - Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z) - Geo-SIC: Learning Deformable Geometric Shapes in Deep Image Classifiers [8.781861951759948]
This paper presents Geo-SIC, the first deep learning model to learn deformable shapes in a deformation space for an improved performance of image classification.
We introduce a newly designed framework that simultaneously derives features from both image and latent shape spaces with large intra-class variations.
We develop a boosted classification network, equipped with an unsupervised learning of geometric shape representations.
arXiv Detail & Related papers (2022-10-25T01:55:17Z) - Ollivier-Ricci Curvature For Head Pose Estimation From a Single Image [10.842428621768667]
This paper aims to estimate head pose from a single image by applying notions of network curvature.
In this work, we use the geometric notion of Ollivier-Ricci curvature (ORC) on weighted graphs as input to an XGBoost regression model, showing that the intrinsic geometric basis of ORC offers a natural approach to head pose estimation.
arXiv Detail & Related papers (2022-04-27T15:20:26Z) - Self-Supervised Image Representation Learning with Geometric Set Consistency [50.12720780102395]
We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Specifically, we introduce 3D geometric consistency into a contrastive learning framework to enforce the feature consistency within image views.
arXiv Detail & Related papers (2022-03-29T08:57:33Z) - NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go [109.88509362837475]
We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes.
NeuroMorph produces smooth and point-to-point correspondences between them.
It works well for a large variety of input shapes, including non-isometric pairs from different object categories.
arXiv Detail & Related papers (2021-06-17T12:25:44Z) - Primal-Dual Mesh Convolutional Neural Networks [62.165239866312334]
We extend a primal-dual framework drawn from the graph-neural-network literature to triangle meshes.
Our method takes features for both edges and faces of a 3D mesh as input and dynamically aggregates them.
We provide theoretical insights into our approach using tools from the mesh-simplification literature.
arXiv Detail & Related papers (2020-10-23T14:49:02Z) - Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training [3.8073142980733]
We propose a novel framework for monocular 3D object detection using only RGB images, called KM3D-Net.
We design a fully convolutional model to predict object keypoints, dimensions, and orientation, and then combine these estimations with perspective geometry constraints to compute the position attribute.
arXiv Detail & Related papers (2020-09-02T00:51:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.