Self-supervised Geometric Perception
- URL: http://arxiv.org/abs/2103.03114v1
- Date: Thu, 4 Mar 2021 15:34:43 GMT
- Title: Self-supervised Geometric Perception
- Authors: Heng Yang, Wei Dong, Luca Carlone, Vladlen Koltun
- Abstract summary: Self-supervised geometric perception is a framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels.
We show that SGP achieves state-of-the-art performance that is on par with or superior to supervised oracles trained using ground-truth labels.
- Score: 96.89966337518854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present self-supervised geometric perception (SGP), the first general
framework to learn a feature descriptor for correspondence matching without any
ground-truth geometric model labels (e.g., camera poses, rigid
transformations). Our first contribution is to formulate geometric perception
as an optimization problem that jointly optimizes the feature descriptor and
the geometric models given a large corpus of visual measurements (e.g., images,
point clouds). Under this optimization formulation, we show that two important
streams of research in vision, namely robust model fitting and deep feature
learning, correspond to optimizing one block of the unknown variables while
fixing the other block. This analysis naturally leads to our second
contribution -- the SGP algorithm that performs alternating minimization to
solve the joint optimization. SGP iteratively executes two meta-algorithms: a
teacher that performs robust model fitting given learned features to generate
geometric pseudo-labels, and a student that performs deep feature learning
under noisy supervision of the pseudo-labels. As a third contribution, we apply
SGP to two perception problems on large-scale real datasets, namely relative
camera pose estimation on MegaDepth and point cloud registration on 3DMatch. We
demonstrate that SGP achieves state-of-the-art performance that is on par with
or superior to the supervised oracles trained using ground-truth labels.
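The teacher–student alternation described in the abstract can be sketched in miniature. The sketch below is an illustrative toy, not the authors' code: the "geometric model" is a 1-D translation, the teacher's robust fit is a simple median (standing in for RANSAC-style fitting of camera poses or rigid transformations), and the student step merely keeps the inlier matches, mimicking the improved matching that a retrained deep descriptor would produce.

```python
import statistics

def teacher_robust_fit(correspondences, tol=1.0):
    """Teacher: robustly fit a geometric model (here a 1-D translation)
    from the current correspondences. The median tolerates outlier
    matches; real SGP would use robust model fitting of, e.g., camera
    poses or rigid transformations."""
    diffs = [y - x for x, y in correspondences]
    t = statistics.median(diffs)
    inliers = [(x, y) for (x, y), d in zip(correspondences, diffs)
               if abs(d - t) < tol]
    return t, inliers  # model + inlier set = geometric pseudo-label

def student_update(correspondences, inliers):
    """Student (toy stand-in): in real SGP this step trains a deep
    feature descriptor under the noisy supervision of the pseudo-labels;
    here we simply keep the inlier matches."""
    return inliers

def sgp(correspondences, num_iters=2):
    """Alternating minimization: each iteration fixes one block of
    variables (descriptor vs. geometric models) while optimizing the other."""
    model = None
    for _ in range(num_iters):
        model, inliers = teacher_robust_fit(correspondences)   # teacher pass
        correspondences = student_update(correspondences, inliers)  # student pass
    return model, correspondences

# Eight true matches related by the translation t = 3, plus two outliers.
pairs = [(x, x + 3.0) for x in range(8)] + [(0.0, 50.0), (1.0, -40.0)]
t, kept = sgp(pairs)
print(t, len(kept))  # the translation is recovered and the outliers are pruned
```

The key structural point the toy preserves is the role split: the teacher's output is only a pseudo-label (model plus inliers), and the student never sees ground truth.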
Related papers
- FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views [93.6881532277553]
We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images.
Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
arXiv Detail & Related papers (2025-02-17T18:54:05Z)
- Topology-Aware 3D Gaussian Splatting: Leveraging Persistent Homology for Optimized Structural Integrity [3.792470553976718]
This work introduces Topology-Aware 3D Gaussian Splatting (Topology-GS).
Topology-GS addresses the compromised pixel-level structural integrity caused by incomplete initial geometric coverage.
Experiments on three novel-view benchmarks demonstrate that Topology-GS outperforms existing methods in terms of PSNR, SSIM, and LPIPS metrics.
arXiv Detail & Related papers (2024-12-21T13:25:03Z)
- Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph [45.115555973941255]
Relative pose estimation is crucial for various computer vision applications, including robotics and autonomous driving.
We propose a Geometric Correspondence Graph neural network that integrates point features with extra structured line segments.
This integration of matched points and line segments further exploits the geometry constraints and enhances model performance across different environments.
arXiv Detail & Related papers (2024-08-28T12:33:26Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z) - Iterative Graph Filtering Network for 3D Human Pose Estimation [5.177947445379688]
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation.
In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation.
Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization.
arXiv Detail & Related papers (2023-07-29T20:46:44Z) - Geo-SIC: Learning Deformable Geometric Shapes in Deep Image Classifiers [8.781861951759948]
This paper presents Geo-SIC, the first deep learning model to learn deformable shapes in a deformation space for improved image classification performance.
We introduce a newly designed framework that simultaneously derives features from both image and latent shape spaces with large intra-class variations.
We develop a boosted classification network, equipped with unsupervised learning of geometric shape representations.
arXiv Detail & Related papers (2022-10-25T01:55:17Z) - Self-Supervised Image Representation Learning with Geometric Set
Consistency [50.12720780102395]
We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Specifically, we introduce 3D geometric consistency into a contrastive learning framework to enforce the feature consistency within image views.
arXiv Detail & Related papers (2022-03-29T08:57:33Z)
- NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go [109.88509362837475]
We present NeuroMorph, a new neural network architecture that takes two 3D shapes as input.
NeuroMorph produces smooth interpolations and point-to-point correspondences between them.
It works well for a large variety of input shapes, including non-isometric pairs from different object categories.
arXiv Detail & Related papers (2021-06-17T12:25:44Z)
- Primal-Dual Mesh Convolutional Neural Networks [62.165239866312334]
We apply a primal-dual framework drawn from the graph-neural-network literature to triangle meshes.
Our method takes features for both edges and faces of a 3D mesh as input and dynamically aggregates them.
We provide theoretical insights of our approach using tools from the mesh-simplification literature.
arXiv Detail & Related papers (2020-10-23T14:49:02Z)
- Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training [3.8073142980733]
We propose a novel framework for monocular 3D object detection using only RGB images, called KM3D-Net.
We design a fully convolutional model to predict object keypoints, dimension, and orientation, and then combine these estimations with perspective geometry constraints to compute the position attribute.
arXiv Detail & Related papers (2020-09-02T00:51:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.