Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction
- URL: http://arxiv.org/abs/2408.15608v1
- Date: Wed, 28 Aug 2024 08:02:47 GMT
- Title: Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction
- Authors: Ruihong Yin, Sezer Karaoglu, Theo Gevers,
- Abstract summary: This paper proposes a novel geometry integration mechanism for 3D scene reconstruction.
Our approach incorporates 3D geometry at three levels, i.e. feature learning, feature fusion, and network supervision.
- Score: 14.225228781008209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In addition to color and textural information, geometry provides important cues for 3D scene reconstruction. However, current reconstruction methods only include geometry at the feature level thus not fully exploiting the geometric information. In contrast, this paper proposes a novel geometry integration mechanism for 3D scene reconstruction. Our approach incorporates 3D geometry at three levels, i.e. feature learning, feature fusion, and network supervision. First, geometry-guided feature learning encodes geometric priors to contain view-dependent information. Second, a geometry-guided adaptive feature fusion is introduced which utilizes the geometric priors as a guidance to adaptively generate weights for multiple views. Third, at the supervision level, taking the consistency between 2D and 3D normals into account, a consistent 3D normal loss is designed to add local constraints. Large-scale experiments are conducted on the ScanNet dataset, showing that volumetric methods with our geometry integration mechanism outperform state-of-the-art methods quantitatively as well as qualitatively. Volumetric methods with ours also show good generalization on the 7-Scenes and TUM RGB-D datasets.
Related papers
- MultiGO++: Monocular 3D Clothed Human Reconstruction via Geometry-Texture Collaboration [10.85658775835694]
Monocular 3D clothed human reconstruction aims to generate a complete and realistic textured 3D avatar from a single image.<n>Existing methods are commonly trained under multi-view supervision with annotated geometric priors, and during inference, these priors are estimated by the pre-trained network from the monocular input.<n>We propose a novel reconstruction framework, named MultiGO++, which achieves effective systematic geometry-texture collaboration.
arXiv Detail & Related papers (2026-03-05T09:37:55Z) - Learning Human Visual Attention on 3D Surfaces through Geometry-Queried Semantic Priors [0.0]
We introduce SemGeo-AttentionNet, a dual-stream architecture that formalizes the interplay between geometric processing and semantic recognition.<n>We extend our framework to temporal scanpath generation through reinforcement learning.<n> Evaluation on SAL3D, NUS3D and 3DVA datasets demonstrates substantial improvements.
arXiv Detail & Related papers (2026-02-06T06:15:38Z) - GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation [68.02988074681427]
Previous works leveraging video models for image-to-3D scene generation tend to suffer from geometric distortions and blurry content.<n>In this paper, we renovate the pipeline of image-to-3D scene generation by unlocking the potential of geometry models.<n>Our GeoWorld can generate high-fidelity 3D scenes from a single image and a given camera trajectory, outperforming prior methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2025-11-28T13:55:45Z) - IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction [82.53307702809606]
Humans naturally perceive the geometric structure and semantic content of a 3D world as intertwined dimensions.<n>We propose InstanceGrounded Geometry Transformer (IGGT) to unify the knowledge for both spatial reconstruction and instance-level contextual understanding.
arXiv Detail & Related papers (2025-10-26T14:57:44Z) - Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification [59.17489431187807]
We propose a framework that enhances 3D geometric fidelity by leveraging CLIP's hierarchical spatial semantics.<n>Our method significantly improves 3D few-shot class-incremental learning, achieving superior geometric coherence and robustness to texture bias.
arXiv Detail & Related papers (2025-09-18T13:45:08Z) - Reg3D: Reconstructive Geometry Instruction Tuning for 3D Scene Understanding [6.7958985137291235]
Reg3D is a novel Reconstructive Geometry Instruction Tuning framework that incorporates geometry-aware supervision directly into the training process.<n>Our key insight is that effective 3D understanding requires reconstructing underlying geometric structures rather than merely describing them.<n>Experiments on ScanQA, Scan2Cap, ScanRefer, and SQA3D demonstrate that Reg3D delivers substantial performance improvements.
arXiv Detail & Related papers (2025-09-03T18:36:44Z) - Dens3R: A Foundation Model for 3D Geometry Prediction [44.13431776180547]
Dens3R is a 3D foundation model designed for joint geometric dense prediction.<n>By integrating image-pair matching features with intrinsic invariance modeling, Dens3R accurately regresses multiple geometric quantities.
arXiv Detail & Related papers (2025-07-22T07:22:30Z) - GRACE: Estimating Geometry-level 3D Human-Scene Contact from 2D Images [54.602947113980655]
Estimating the geometry level of human-scene contact aims to ground specific contact surface points at 3D human geometries.<n> GRACE (Geometry-level Reasoning for 3D Human-scene Contact Estimation) is a new paradigm for 3D human contact estimation.<n>It incorporates a point cloud encoder-decoder architecture along with a hierarchical feature extraction and fusion module.
arXiv Detail & Related papers (2025-05-10T09:25:46Z) - Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging [15.36983068580743]
Hi3DGen is a novel framework for generating high-fidelity 3D geometry from images via normal bridging.
Our work provides a new direction for high-fidelity 3D geometry generation from images by leveraging normal maps as an intermediate representation.
arXiv Detail & Related papers (2025-03-28T08:39:20Z) - GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z) - G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images [45.66479596827045]
We propose a Geometry-enhanced NeRF (G-NeRF), which seeks to enhance the geometry priors by a geometry-guided multi-view synthesis approach.
To tackle the absence of multi-view supervision for single-view images, we design the depth-aware training approach.
arXiv Detail & Related papers (2024-04-11T04:58:18Z) - GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images.
We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization.
Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z) - CVRecon: Rethinking 3D Geometric Feature Learning For Neural
Reconstruction [12.53249207602695]
We propose an end-to-end 3D neural reconstruction framework CVRecon.
We exploit the rich geometric embedding in the cost volumes to facilitate 3D geometric feature learning.
arXiv Detail & Related papers (2023-04-28T05:30:19Z) - Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles Single-view 3D Mesh Reconstruction, to study the model generalization on unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z) - Depth Completion using Geometry-Aware Embedding [22.333381291860498]
This paper proposes an efficient method to learn geometry-aware embedding.
It encodes the local and global geometric structure information from 3D points, e.g., scene layout, object's sizes and shapes, to guide dense depth estimation.
arXiv Detail & Related papers (2022-03-21T12:06:27Z) - Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z) - Learning Geometry-Disentangled Representation for Complementary
Understanding of 3D Object Point Cloud [50.56461318879761]
We propose Geometry-Disentangled Attention Network (GDANet) for 3D image processing.
GDANet disentangles point clouds into contour and flat part of 3D objects, respectively denoted by sharp and gentle variation components.
Experiments on 3D object classification and segmentation benchmarks demonstrate that GDANet achieves the state-of-the-arts with fewer parameters.
arXiv Detail & Related papers (2020-12-20T13:35:00Z) - RISA-Net: Rotation-Invariant Structure-Aware Network for Fine-Grained 3D
Shape Retrieval [46.02391761751015]
Fine-grained 3D shape retrieval aims to retrieve 3D shapes similar to a query shape in a repository with models belonging to the same class.
We introduce a novel deep architecture, RISA-Net, which learns rotation invariant 3D shape descriptors.
Our method is able to learn the importance of geometric and structural information of all the parts when generating the final compact latent feature of a 3D shape.
arXiv Detail & Related papers (2020-10-02T13:06:12Z) - DSG-Net: Learning Disentangled Structure and Geometry for 3D Shape
Generation [98.96086261213578]
We introduce DSG-Net, a deep neural network that learns a disentangled structured and geometric mesh representation for 3D shapes.
This supports a range of novel shape generation applications with disentangled control, such as of structure (geometry) while keeping geometry (structure) unchanged.
Our method not only supports controllable generation applications but also produces high-quality synthesized shapes, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2020-08-12T17:06:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.