Structural Multiplane Image: Bridging Neural View Synthesis and 3D
Reconstruction
- URL: http://arxiv.org/abs/2303.05937v1
- Date: Fri, 10 Mar 2023 14:18:40 GMT
- Title: Structural Multiplane Image: Bridging Neural View Synthesis and 3D
Reconstruction
- Authors: Mingfang Zhang, Jinglu Wang, Xiao Li, Yifei Huang, Yoichi Sato, Yan Lu
- Abstract summary: We introduce the Structural MPI (S-MPI), where the plane structure approximates 3D scenes concisely.
Despite its intuitive appeal, applying S-MPI introduces significant challenges, e.g., high-fidelity approximation of both RGBA layers and plane poses.
Our method outperforms both previous state-of-the-art MPI-based view synthesis methods and planar reconstruction methods.
- Score: 39.89856628467095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Multiplane Image (MPI), containing a set of fronto-parallel RGBA layers,
is an effective and efficient representation for view synthesis from sparse
inputs. Yet, its fixed fronto-parallel structure limits its performance,
especially for surfaces imaged at oblique angles. We introduce the Structural
MPI (S-MPI),
where the plane structure approximates 3D scenes concisely. Conveying RGBA
contexts with geometrically-faithful structures, the S-MPI directly bridges
view synthesis and 3D reconstruction. It not only overcomes the critical
limitations of MPI, i.e., discretization artifacts on sloped surfaces and the
waste of redundant layers, but also enables planar 3D reconstruction. Despite
its intuitive appeal, applying S-MPI introduces significant challenges, e.g.,
high-fidelity approximation of both RGBA layers and plane poses, multi-view
consistency, modeling of non-planar regions, and efficient rendering with
intersecting planes. Accordingly, we propose a transformer-based network built
on a segmentation model. It predicts compact and expressive S-MPI layers with
their corresponding masks, poses, and RGBA contexts. Non-planar regions are
handled as a special case within our unified framework.
Multi-view consistency is ensured by sharing global proxy embeddings, which
encode plane-level features covering the complete 3D scenes with aligned
coordinates. Extensive experiments show that our method outperforms both
previous state-of-the-art MPI-based view synthesis methods and planar
reconstruction methods.
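To ground the MPI mechanics the abstract builds on, below is a minimal NumPy sketch (our own illustration, not code from the paper; the function names, array shapes, and conventions are assumptions) of the two operations an S-MPI renderer composes: warping each RGBA layer into a target view via the homography induced by its plane, and back-to-front "over" compositing. A standard MPI is the special case where every plane is fronto-parallel; S-MPI lets the normal and offset vary per layer.

```python
import numpy as np

def plane_homography(K_src, K_tgt, R, t, n, d):
    """Homography mapping source-view pixels to a target view for points
    on the plane {X : n @ X + d = 0}, expressed in source-camera coords.

    K_src, K_tgt: 3x3 intrinsics; (R, t): target pose relative to the
    source (X_tgt = R @ X_src + t); n: unit plane normal; d: plane offset.
    """
    H = K_tgt @ (R - np.outer(t, n) / d) @ np.linalg.inv(K_src)
    return H / H[2, 2]  # fix the projective scale

def over_composite(rgba_layers):
    """Back-to-front "over" compositing of already-warped RGBA layers.

    rgba_layers: (D, H, W, 4) array ordered back to front, with RGB and
    alpha values in [0, 1].
    """
    out = np.zeros(rgba_layers.shape[1:3] + (3,))
    for layer in rgba_layers:  # back to front
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = alpha * rgb + (1.0 - alpha) * out  # Porter-Duff "over"
    return out
```

Under this convention, a conventional MPI layer at depth z has n = (0, 0, 1) and d = -z; S-MPI's contribution is to predict per-layer (n, d), so a sloped surface is covered by one geometrically faithful plane rather than a stack of discretized fronto-parallel slices.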
Related papers
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102] (2024-10-24)
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward pass.
It can generate versatile label maps by interacting with language at novel viewpoints.
- 3D Geometric Shape Assembly via Efficient Point Cloud Matching [59.241448711254485] (2024-07-15)
Proxy Match Transform (PMT) is an approximate high-order feature transform layer that enables reliable matching between the mating surfaces of parts.
Building upon PMT, the authors introduce a new framework, dubbed Proxy Match TransformeR (PMTR), for the geometric assembly task.
PMTR is evaluated on the large-scale Breaking Bad benchmark for 3D geometric shape assembly.
- GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304] (2024-06-21)
The Geometry-Aware Large Reconstruction Model (GeoLRM) predicts high-quality assets with 512K Gaussians from 21 input images in only 11 GB of GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not exploit explicit geometric relationships between 3D points and 2D images.
GeoLRM tackles these issues with a novel 3D-aware transformer that directly processes 3D points and uses deformable cross-attention mechanisms.
- SAGS: Structure-Aware 3D Gaussian Splatting [53.6730827668389] (2024-04-29)
SAGS is a structure-aware Gaussian Splatting method that implicitly encodes the geometry of the scene.
It achieves state-of-the-art rendering performance and reduced storage requirements on benchmark novel-view synthesis datasets.
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361] (2022-05-05)
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images.
Planar constraints can be conveniently integrated into recent implicit neural representation-based reconstruction methods.
The proposed method outperforms previous methods by a large margin in 3D reconstruction quality.
- Implicit Neural Deformation for Multi-View Face Reconstruction [43.88676778013593] (2021-12-05)
This work presents a new method for 3D face reconstruction from multi-view RGB images.
Unlike previous methods built upon 3D morphable models, it leverages an implicit representation to encode rich geometric features.
Experiments on several benchmark datasets show that it outperforms alternative baselines and achieves superior face reconstruction results compared to state-of-the-art methods.
- OctField: Hierarchical Implicit Functions for 3D Modeling [18.488778913029805] (2021-11-01)
OctField is a learnable hierarchical implicit representation for 3D surfaces that allows high-precision encoding of intricate surfaces with a low memory and computational budget.
It achieves this by introducing a hierarchical octree structure that adaptively subdivides 3D space according to surface occupancy and the richness of part geometry (see the sketch after this list).
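The occupancy-driven refinement behind the OctField entry above is easy to make concrete. Below is a minimal sketch of adaptive octree subdivision (our own illustration, with an assumed `occupied` predicate; the actual method further attaches a learned latent code and a local implicit decoder to each cell).

```python
import numpy as np
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class OctreeNode:
    center: np.ndarray                          # (3,) cell center
    half: float                                 # half edge length
    children: Optional[List["OctreeNode"]] = None

def subdivide(node, occupied, max_depth, depth=0):
    """Recursively split cells that may contain surface geometry.

    `occupied(center, half)` is a user-supplied predicate returning True
    when a cell may intersect the surface; only such cells are refined,
    so resolution concentrates where the geometry is.
    """
    if depth == max_depth or not occupied(node.center, node.half):
        return
    h = node.half / 2.0
    offsets = [np.array([dx, dy, dz]) for dx in (-h, h)
               for dy in (-h, h) for dz in (-h, h)]
    node.children = [OctreeNode(node.center + o, h) for o in offsets]
    for child in node.children:
        subdivide(child, occupied, max_depth, depth + 1)

# Example: refine only near the surface of a unit sphere.
root = OctreeNode(center=np.zeros(3), half=1.5)
near_surface = lambda c, h: abs(np.linalg.norm(c) - 1.0) <= h * np.sqrt(3)
subdivide(root, near_surface, max_depth=4)
```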