Doctoral Thesis: Geometric Deep Learning For Camera Pose Prediction, Registration, Depth Estimation, and 3D Reconstruction
- URL: http://arxiv.org/abs/2509.01873v1
- Date: Tue, 02 Sep 2025 01:35:44 GMT
- Title: Doctoral Thesis: Geometric Deep Learning For Camera Pose Prediction, Registration, Depth Estimation, and 3D Reconstruction
- Authors: Xueyang Kang,
- Abstract summary: The dissertation provides solutions to the fundamental challenges in 3D vision.<n>It develops geometric deep learning methods tailored for essential tasks such as camera pose estimation, point cloud registration, depth prediction, and 3D reconstruction.<n>It demonstrates their effectiveness across real-world applications such as digital cultural heritage preservation and immersive VR/AR environments.
- Score: 1.8782750537161614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep learning developments create new opportunities for 3D mapping technology, scene reconstruction pipelines, and virtual reality development. Despite advances in 3D deep learning technology, direct training of deep learning models on 3D data faces challenges due to the high dimensionality inherent in 3D data and the scarcity of labeled datasets. Structure-from-motion (SfM) and Simultaneous Localization and Mapping (SLAM) exhibit robust performance when applied to structured indoor environments but often struggle with ambiguous features in unstructured environments. These techniques often struggle to generate detailed geometric representations effective for downstream tasks such as rendering and semantic analysis. Current limitations require the development of 3D representation methods that combine traditional geometric techniques with deep learning capabilities to generate robust geometry-aware deep learning models. The dissertation provides solutions to the fundamental challenges in 3D vision by developing geometric deep learning methods tailored for essential tasks such as camera pose estimation, point cloud registration, depth prediction, and 3D reconstruction. The integration of geometric priors or constraints, such as including depth information, surface normals, and equivariance into deep learning models, enhances both the accuracy and robustness of geometric representations. This study systematically investigates key components of 3D vision, including camera pose estimation, point cloud registration, depth estimation, and high-fidelity 3D reconstruction, demonstrating their effectiveness across real-world applications such as digital cultural heritage preservation and immersive VR/AR environments.
Related papers
- Reg3D: Reconstructive Geometry Instruction Tuning for 3D Scene Understanding [6.7958985137291235]
Reg3D is a novel Reconstructive Geometry Instruction Tuning framework that incorporates geometry-aware supervision directly into the training process.<n>Our key insight is that effective 3D understanding requires reconstructing underlying geometric structures rather than merely describing them.<n>Experiments on ScanQA, Scan2Cap, ScanRefer, and SQA3D demonstrate that Reg3D delivers substantial performance improvements.
arXiv Detail & Related papers (2025-09-03T18:36:44Z) - A Generative Approach to High Fidelity 3D Reconstruction from Text Data [0.0]
This research proposes a fully automated pipeline that seamlessly integrates text-to-image generation, various image processing techniques, and deep learning methods for reflection removal and 3D reconstruction.<n>By leveraging state-of-the-art generative models like Stable Diffusion, the methodology translates natural language inputs into detailed 3D models through a multi-stage workflow.<n>This approach addresses key challenges in generative reconstruction, such as maintaining semantic coherence, managing geometric complexity, and preserving detailed visual information.
arXiv Detail & Related papers (2025-03-05T16:54:15Z) - Back to the Future Cyclopean Stereo: a human perception approach combining deep and geometric constraints [3.336618863186337]
We provide analytical 3D surface models as viewed by a cyclopean eye model.<n>This geometrical foundation combined with learned stereo features allows our system to benefit from the strengths of both approaches.<n>Our approach aims to demonstrate that understanding and modeling geometrical properties of 3D surfaces is beneficial to computer vision research.
arXiv Detail & Related papers (2025-02-28T17:58:20Z) - Textured Mesh Saliency: Bridging Geometry and Texture for Human Perception in 3D Graphics [50.23625950905638]
We present a new dataset for textured mesh saliency, created through an innovative eye-tracking experiment in a six degrees of freedom (6-DOF) VR environment.<n>Our proposed model predicts saliency maps for textured mesh surfaces by treating each triangular face as an individual unit and assigning a saliency density value to reflect the importance of each local surface region.
arXiv Detail & Related papers (2024-12-11T08:27:33Z) - SR-CurvANN: Advancing 3D Surface Reconstruction through Curvature-Aware Neural Networks [0.0]
SR-CurvANN is a novel method that incorporates neural network-based 2D inpainting to effectively reconstruct 3D surfaces.
We show that SR-CurvANN excels in the shape completion process, filling holes with a remarkable level of realism and precision.
arXiv Detail & Related papers (2024-07-25T09:36:37Z) - Robust Geometry-Preserving Depth Estimation Using Differentiable
Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z) - R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras [106.52409577316389]
R3D3 is a multi-camera system for dense 3D reconstruction and ego-motion estimation.
Our approach exploits spatial-temporal information from multiple cameras, and monocular depth refinement.
We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments.
arXiv Detail & Related papers (2023-08-28T17:13:49Z) - CVRecon: Rethinking 3D Geometric Feature Learning For Neural
Reconstruction [12.53249207602695]
We propose an end-to-end 3D neural reconstruction framework CVRecon.
We exploit the rich geometric embedding in the cost volumes to facilitate 3D geometric feature learning.
arXiv Detail & Related papers (2023-04-28T05:30:19Z) - Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z) - Active 3D Shape Reconstruction from Vision and Touch [66.08432412497443]
Humans build 3D understandings of the world through active object exploration, using jointly their senses of vision and touch.
In 3D shape reconstruction, most recent progress has relied on static datasets of limited sensory data such as RGB images, depth maps or haptic readings.
We introduce a system composed of: 1) a haptic simulator leveraging high spatial resolution vision-based tactile sensors for active touching of 3D objects; 2) a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile priors to guide the shape exploration; and 3) a set of data-driven solutions with either tactile or visuo
arXiv Detail & Related papers (2021-07-20T15:56:52Z) - Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust
Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z) - Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images [64.53227129573293]
We investigate the problem of learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views.
We design neural networks capable of generating high-quality parametric 3D surfaces which are consistent between views.
Our method is supervised and trained on a public dataset of shapes from common object categories.
arXiv Detail & Related papers (2020-08-18T06:33:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.