MGP-KAD: Multimodal Geometric Priors and Kolmogorov-Arnold Decoder for Single-View 3D Reconstruction in Complex Scenes
- URL: http://arxiv.org/abs/2602.06158v1
- Date: Thu, 05 Feb 2026 19:54:30 GMT
- Title: MGP-KAD: Multimodal Geometric Priors and Kolmogorov-Arnold Decoder for Single-View 3D Reconstruction in Complex Scenes
- Authors: Luoxi Zhang, Chun Xie, Itaru Kitahara
- Abstract summary: Single-view 3D reconstruction in complex real-world scenes is challenging due to noise, object diversity, and limited dataset availability. We propose MGP-KAD, a novel multimodal feature fusion framework that integrates RGB and geometric priors to enhance reconstruction accuracy.
- Score: 0.3823356975862005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-view 3D reconstruction in complex real-world scenes is challenging due to noise, object diversity, and limited dataset availability. To address these challenges, we propose MGP-KAD, a novel multimodal feature fusion framework that integrates RGB and geometric priors to enhance reconstruction accuracy. The geometric prior is generated by sampling and clustering ground-truth object data, producing class-level features that are dynamically adjusted during training to improve geometric understanding. Additionally, we introduce a hybrid decoder based on Kolmogorov-Arnold Networks (KAN) to overcome the limitations of traditional linear decoders in processing complex multimodal inputs. Extensive experiments on the Pix3D dataset demonstrate that MGP-KAD achieves state-of-the-art (SOTA) performance, significantly improving geometric integrity, smoothness, and detail preservation. Our work provides a robust and effective solution for advancing single-view 3D reconstruction in complex scenes.
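The abstract's distinctive component is the KAN-based decoder: instead of fixed linear weights, a Kolmogorov-Arnold layer places a learnable univariate function on every input-output edge. The paper's code is not shown here, so the sketch below is only a rough NumPy illustration of that idea, using a small radial-basis expansion per edge; the names (`KANLayer`, `num_basis`) are hypothetical, and the basis choice is an assumption (published KANs typically use B-splines).

```python
import numpy as np

class KANLayer:
    """Minimal Kolmogorov-Arnold-style layer sketch: each (input, output)
    edge carries its own learnable 1-D function, parameterized here as a
    coefficient vector over fixed radial-basis functions on [-1, 1]."""

    def __init__(self, in_dim, out_dim, num_basis=8, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(-1.0, 1.0, num_basis)  # RBF grid centers
        self.width = 2.0 / (num_basis - 1)                # spacing-matched width
        # One coefficient vector per edge: shape (in_dim, out_dim, num_basis).
        self.coef = rng.normal(0.0, 0.1, size=(in_dim, out_dim, num_basis))

    def __call__(self, x):
        # x: (batch, in_dim) -> output: (batch, out_dim)
        # phi: (batch, in_dim, num_basis), RBF features of each scalar input.
        phi = np.exp(-((x[..., None] - self.centers) / self.width) ** 2)
        # Kolmogorov-Arnold superposition: sum per-edge univariate
        # functions over the input dimension.
        return np.einsum("bik,iok->bo", phi, self.coef)

layer = KANLayer(in_dim=4, out_dim=2)
x = np.tanh(np.random.default_rng(1).normal(size=(3, 4)))  # squash into [-1, 1]
y = layer(x)
print(y.shape)  # (3, 2)
```

In a decoder, such layers would replace the linear maps acting on the fused RGB/geometric-prior features; how MGP-KAD actually hybridizes KAN and linear components is not specified in the abstract.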
Related papers
- MultiGO++: Monocular 3D Clothed Human Reconstruction via Geometry-Texture Collaboration [10.85658775835694]
Monocular 3D clothed human reconstruction aims to generate a complete and realistic textured 3D avatar from a single image. Existing methods are commonly trained under multi-view supervision with annotated geometric priors; during inference, these priors are estimated from the monocular input by a pre-trained network. We propose a novel reconstruction framework, named MultiGO++, which achieves effective systematic geometry-texture collaboration.
arXiv Detail & Related papers (2026-03-05T09:37:55Z) - MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts [50.37005070020306]
MoRE is a dense 3D visual foundation model based on a Mixture-of-Experts (MoE) architecture. It incorporates a confidence-based depth refinement module that stabilizes and refines geometric estimation, and integrates dense semantic features with globally aligned 3D backbone representations for high-fidelity surface normal prediction.
arXiv Detail & Related papers (2025-10-31T06:54:27Z) - IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction [82.53307702809606]
Humans naturally perceive the geometric structure and semantic content of a 3D world as intertwined dimensions. We propose the Instance-Grounded Geometry Transformer (IGGT) to unify the knowledge for both spatial reconstruction and instance-level contextual understanding.
arXiv Detail & Related papers (2025-10-26T14:57:44Z) - Review of Feed-forward 3D Reconstruction: From DUSt3R to VGGT [10.984522161856955]
3D reconstruction is a cornerstone technology for numerous applications, including augmented/virtual reality, autonomous driving, and robotics. Deep learning has catalyzed a paradigm shift in 3D reconstruction: new models employ a unified deep network to jointly infer camera poses and dense geometry directly from an unconstrained set of images in a single forward pass.
arXiv Detail & Related papers (2025-07-11T09:41:54Z) - DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method to predict deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity [49.31257173003408]
We present a novel method for 6-DoF object tracking and high-quality 3D reconstruction from monocular RGBD video. Our approach demonstrates strong capabilities in recovering high-fidelity object meshes, setting a new standard for single-sensor 3D reconstruction in open-world environments.
arXiv Detail & Related papers (2025-05-17T08:46:29Z) - Learning Multi-frame and Monocular Prior for Estimating Geometry in Dynamic Scenes [56.936178608296906]
We present a new model, named MMP, to estimate geometry in a feed-forward manner. Building on the recent Siamese architecture, we introduce a new trajectory encoding module. We find that MMP achieves state-of-the-art quality in feed-forward pointmap prediction.
arXiv Detail & Related papers (2025-05-03T08:28:15Z) - Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction [11.220655907305515]
We introduce a monocular-guided refinement module that integrates monocular geometric priors into multi-view reconstruction frameworks. Our method achieves substantial improvements in both multi-view camera pose estimation and point cloud accuracy.
arXiv Detail & Related papers (2025-04-18T02:33:12Z) - Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View [45.43074998299703]
Niagara is a new single-view 3D scene reconstruction framework that, for the first time, can faithfully reconstruct challenging outdoor scenes from a single input image. We introduce a geometric affine field (GAF) and 3D self-attention as geometric constraints, combining the structural properties of explicit geometry with the adaptability of implicit feature fields. Finally, our framework proposes a specialized encoder-decoder architecture, in which a depth-based 3D Gaussian decoder predicts 3D Gaussian parameters.
arXiv Detail & Related papers (2025-03-16T15:50:18Z) - Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction [51.3632308129838]
We present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction.
Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition.
We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing.
arXiv Detail & Related papers (2024-03-28T11:12:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.