Image-Plane Geometric Decoding for View-Invariant Indoor Scene Reconstruction
- URL: http://arxiv.org/abs/2509.25744v2
- Date: Mon, 27 Oct 2025 04:03:38 GMT
- Title: Image-Plane Geometric Decoding for View-Invariant Indoor Scene Reconstruction
- Authors: Mingyang Li, Yimeng Fan, Changsong Liu, Lixue Xu, Xin Wang, Yanyan Liu, Wei Zhang
- Abstract summary: Volume-based indoor scene reconstruction methods offer superior generalization capability and real-time deployment potential. Existing methods rely on multi-view pixel back-projection ray intersections as weak geometric constraints to determine spatial positions. We propose an image-plane decoding framework with three core components: a Pixel-level Confidence Encoder, an Affine Compensation Module, and an Image-Plane Spatial Decoder.
- Score: 14.657247288761438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Volume-based indoor scene reconstruction methods offer superior generalization capability and real-time deployment potential. However, existing methods rely on multi-view pixel back-projection ray intersections as weak geometric constraints to determine spatial positions. This dependence leaves reconstruction quality heavily influenced by input view density, and performance degrades in overlapping regions and unobserved areas. To address these limitations, we reduce dependency on inter-view geometric constraints by exploiting spatial information within individual views. We propose an image-plane decoding framework with three core components: a Pixel-level Confidence Encoder, an Affine Compensation Module, and an Image-Plane Spatial Decoder. These modules decode three-dimensional structural information encoded in images through physical imaging processes. The framework effectively preserves spatial geometric features, including edges, hollow structures, and complex textures, and significantly enhances view-invariant reconstruction. Experiments on indoor scene reconstruction datasets confirm superior reconstruction stability: our method maintains nearly identical quality when the view count is reduced by 40%, achieving a coefficient of variation of 0.24%, a performance retention rate of 99.7%, and a maximum performance drop of 0.42%. These results demonstrate that exploiting intra-view spatial information provides a robust solution for view-limited scenarios in practical applications.
Related papers
- NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction [99.52487968452198]
NOVA3R is an effective approach for non-pixel-aligned 3D reconstruction from a set of unposed images in a feed-forward manner. It produces physically plausible geometry with fewer duplicated structures in overlapping regions. It outperforms state-of-the-art methods in terms of reconstruction accuracy and completeness.
arXiv Detail & Related papers (2026-03-04T15:36:25Z) - Deep Equilibrium Convolutional Sparse Coding for Hyperspectral Image Denoising [16.405355853358202]
Hyperspectral images (HSIs) play a crucial role in remote sensing but are often degraded by complex noise patterns. Ensuring the physical properties of the denoised HSIs is vital for robust HSI denoising, giving rise to deep unfolding-based methods. We propose a Deep Equilibrium Convolutional Sparse Coding (DECSC) framework that unifies local spatial-spectral correlations, nonlocal spatial self-similarities, and global spatial consistency.
arXiv Detail & Related papers (2025-08-21T13:35:11Z) - SparseRecon: Neural Implicit Surface Reconstruction from Sparse Views with Feature and Depth Consistencies [48.99420012507374]
We propose SparseRecon, a novel neural implicit reconstruction method for sparse views with volume rendering-based feature consistency and an uncertainty-guided depth constraint. We show that our method outperforms the state-of-the-art methods, producing high-quality geometry from sparse-view input.
arXiv Detail & Related papers (2025-08-01T06:51:32Z) - Boosting Multi-View Indoor 3D Object Detection via Adaptive 3D Volume Construction [10.569056109735735]
This work presents SGCDet, a novel multi-view indoor 3D object detection framework based on adaptive 3D volume construction. We introduce a geometry- and context-aware aggregation module to integrate geometric and contextual information within adaptive regions in each image. We show that SGCDet achieves state-of-the-art performance on the ScanNet, ScanNet200 and ARKitScenes datasets.
arXiv Detail & Related papers (2025-07-24T11:58:01Z) - HiNeuS: High-fidelity Neural Surface Mitigating Low-texture and Reflective Ambiguity [8.74691272469226]
HiNeuS is a unified framework that holistically addresses three core limitations in existing approaches. We introduce: 1) Differential visibility verification through SDF-guided ray tracing; 2) Planar-conformal regularization via ray-aligned geometry patches; and 3) Physically-grounded Eikonal relaxation that dynamically modulates geometric constraints based on local gradients.
arXiv Detail & Related papers (2025-06-30T13:45:25Z) - Structure-Preserving Patch Decoding for Efficient Neural Video Representation [0.0]
We propose a neural video representation method based on Structure-Preserving Patches (SPPs). Our method separates each video frame into spatially aligned patch images through a deterministic pixel-based splitting. We train the decoder to reconstruct these structured patches, enabling a global-to-local decoding strategy.
arXiv Detail & Related papers (2025-06-15T15:58:23Z) - 3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation [16.69186493462387]
We introduce ProtoOcc, a novel occupancy network that leverages prototypes of clustered image segments in view transformation to enhance low-resolution context. In particular, the mapping of 2D prototypes onto 3D voxel queries encodes high-level visual geometries and complements the loss of spatial information from reduced query resolutions. We show that ProtoOcc achieves competitive performance against the baselines even with 75% reduced voxel resolution.
arXiv Detail & Related papers (2025-03-19T13:14:57Z) - Decompositional Neural Scene Reconstruction with Generative Diffusion Prior [64.71091831762214]
Decompositional reconstruction of 3D scenes, with complete shapes and detailed texture, is intriguing for downstream applications. Recent approaches incorporate semantic or geometric regularization to address this issue, but they suffer significant degradation in underconstrained areas. We propose DP-Recon, which employs diffusion priors in the form of Score Distillation Sampling (SDS) to optimize the neural representation of each individual object under novel views.
arXiv Detail & Related papers (2025-03-19T02:11:31Z) - 360Recon: An Accurate Reconstruction Method Based on Depth Fusion from 360 Images [10.564434148892362]
360-degree images offer a significantly wider field of view compared to traditional pinhole cameras. This makes them crucial for applications in VR, AR, and related fields. We propose 360Recon, an innovative MVS algorithm for ERP images.
arXiv Detail & Related papers (2024-11-28T12:30:45Z) - Efficient Visual State Space Model for Image Deblurring [99.54894198086852]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. We propose a simple yet effective visual state space model (EVSSM) for image deblurring. The proposed EVSSM performs favorably against state-of-the-art methods on benchmark datasets and real-world images.
arXiv Detail & Related papers (2024-05-23T09:13:36Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most lightweight image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - Variable Radiance Field for Real-World Category-Specific Reconstruction from Single Image [25.44715538841181]
Reconstructing category-specific objects using Neural Radiance Field (NeRF) from a single image is a promising yet challenging task. We propose Variable Radiance Field (VRF), a novel framework capable of efficiently reconstructing category-specific objects. VRF achieves state-of-the-art performance in both reconstruction quality and computational efficiency.
arXiv Detail & Related papers (2023-06-08T12:12:02Z) - Searching a Compact Architecture for Robust Multi-Exposure Image Fusion [55.37210629454589]
Two major stumbling blocks hinder development: pixel misalignment and inefficient inference.
This study introduces an architecture search-based paradigm incorporating self-alignment and detail repletion modules for robust multi-exposure image fusion.
The proposed method outperforms various competitive schemes, achieving a noteworthy 3.19% improvement in PSNR for general scenarios and an impressive 23.5% enhancement in misaligned scenarios.
arXiv Detail & Related papers (2023-05-20T17:01:52Z) - Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose a rank-enhanced low-dimensional convolution set (Re-ConvSet).
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z) - Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z) - Spatially-Adaptive Image Restoration using Distortion-Guided Networks [51.89245800461537]
We present a learning-based solution for restoring images suffering from spatially-varying degradations.
We propose SPAIR, a network design that harnesses distortion-localization information and dynamically adjusts to difficult regions in the image.
arXiv Detail & Related papers (2021-08-19T11:02:25Z) - Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.