3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction
- URL: http://arxiv.org/abs/2205.14575v1
- Date: Sun, 29 May 2022 06:01:42 GMT
- Title: 3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction
- Authors: Leslie Ching Ow Tiong, Dick Sigmund, Andrew Beng Jin Teoh
- Abstract summary: This paper proposes a new model, namely 3D coarse-to-fine transformer (3D-C2FT), for encoding multi-view features and rectifying defective 3D objects.
The C2F attention mechanism enables the model to learn multi-view information flow and synthesize 3D surface correction in a coarse-to-fine manner.
Experimental results show that 3D-C2FT achieves notable results and outperforms several competing models on these datasets.
- Score: 14.89364490991374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the transformer model has been successfully employed for the
multi-view 3D reconstruction problem. However, challenges remain in designing
an attention mechanism that explores multi-view features and exploits their
relations to reinforce the encoding-decoding modules. This paper proposes a
new model, namely the 3D coarse-to-fine transformer (3D-C2FT), by introducing a
novel coarse-to-fine (C2F) attention mechanism for encoding multi-view features
and rectifying defective 3D objects. The C2F attention mechanism enables the
model to learn multi-view information flow and synthesize 3D surface correction
in a coarse-to-fine manner. The proposed model is evaluated on the ShapeNet and
Multi-view Real-life datasets. Experimental results show that 3D-C2FT achieves
notable results and outperforms several competing models on these datasets.
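The abstract describes the C2F mechanism only at a high level. As a rough illustration, here is a minimal PyTorch sketch of a coarse-to-fine multi-view attention block: queries attend over a pooled (coarse) token sequence first, then refine at full resolution. Everything here, the class name, the pooling-based coarse stage, and the tensor sizes, is an illustrative assumption rather than the authors' implementation.

```python
# Hypothetical coarse-to-fine (C2F) multi-view attention sketch.
# Illustrative assumptions only; not the 3D-C2FT authors' code.
import torch
import torch.nn as nn

class C2FAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8, pool: int = 4):
        super().__init__()
        self.coarse_pool = nn.AvgPool1d(pool)  # shorten the token sequence
        self.coarse_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fine_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, views * patches, dim), all views flattened together.
        # Coarse pass: queries attend to a pooled, low-resolution sequence.
        pooled = self.coarse_pool(tokens.transpose(1, 2)).transpose(1, 2)
        coarse, _ = self.coarse_attn(self.norm1(tokens), pooled, pooled)
        x = tokens + coarse
        # Fine pass: full-resolution self-attention refines the coarse update.
        fine, _ = self.fine_attn(self.norm2(x), x, x)
        return x + fine

# Example: 2 views x 64 patches of 256-dim tokens for one object.
feats = torch.randn(1, 2 * 64, 256)
print(C2FAttention()(feats).shape)  # torch.Size([1, 128, 256])
```

The two-stage layout mirrors the stated goal of synthesizing corrections coarsely before refining them; the real model presumably couples these stages to its decoder rather than stacking them this simply.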
Related papers
- Self-augmented Gaussian Splatting with Structure-aware Masks for Sparse-view 3D Reconstruction [9.953394373473621]
Sparse-view 3D reconstruction is a formidable challenge in computer vision.
We propose a self-augmented coarse-to-fine Gaussian splatting paradigm, enhanced with a structure-aware mask.
Our method achieves state-of-the-art performance for sparse input views in both perceptual quality and efficiency.
arXiv Detail & Related papers (2024-08-09T03:09:22Z)
- Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling [14.341099905684844]
This paper investigates a 2D to 3D image translation method with a straightforward technique, enabling correlated 2D X-ray to 3D CT-like reconstruction.
We observe that existing approaches, which integrate information across multiple 2D views in the latent space, lose valuable signal information during latent encoding. Instead, we simply repeat and concatenate the 2D views into higher-channel 3D volumes and approach the 3D reconstruction challenge as a straightforward 3D to 3D generative modeling problem.
This method enables the reconstructed 3D volume to retain valuable information from the 2D inputs, which are passed between channel states in a Swin UNETR backbone (the repeat-and-concatenate step is sketched after this list).
arXiv Detail & Related papers (2024-06-26T15:18:20Z)
- MVGamba: Unify 3D Content Generation as State Space Sequence Modeling [150.80564081817786]
We introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor.
With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts.
Experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with only approximately $0.1\times$ the model size.
arXiv Detail & Related papers (2024-06-10T15:26:48Z)
- DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation [53.20147419879056]
We introduce a diffusion-based feed-forward framework that addresses the challenges of large-vocabulary 3D generation with a single model.
Building upon our 3D-aware Diffusion model with TransFormer, we propose a stronger version for 3D generation, i.e., DiffTF++.
Experiments on ShapeNet and OmniObject3D convincingly demonstrate the effectiveness of our proposed modules.
arXiv Detail & Related papers (2024-05-13T17:59:51Z)
- SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation [74.07836010698801]
We propose an SMPL-based Transformer framework (SMPLer) for monocular 3D human shape and pose estimation.
SMPLer incorporates two key ingredients: a decoupled attention operation and an SMPL-based target representation.
Extensive experiments demonstrate the effectiveness of SMPLer against existing 3D human shape and pose estimation methods.
arXiv Detail & Related papers (2024-04-23T17:59:59Z)
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images.
We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization.
Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z)
- 3D Face Reconstruction Using A Spectral-Based Graph Convolution Encoder [3.749406324648861]
We propose an innovative approach that integrates existing 2D features with 3D features to guide the model learning process.
Our model is trained using 2D-3D data pairs from a combination of datasets and achieves state-of-the-art performance on the NoW benchmark.
arXiv Detail & Related papers (2024-03-08T11:09:46Z)
- Large-Vocabulary 3D Diffusion Model with Transformer [57.076986347047]
We introduce a diffusion-based feed-forward framework for synthesizing massive categories of real-world 3D objects with a single generative model.
We propose DiffTF, a novel triplane-based 3D-aware Diffusion model with TransFormer, to handle these challenges from three aspects (the triplane representation is sketched after this list).
Experiments on ShapeNet and OmniObject3D convincingly demonstrate that a single DiffTF model achieves state-of-the-art large-vocabulary 3D object generation performance.
arXiv Detail & Related papers (2023-09-14T17:59:53Z)
- PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs [24.09764733540401]
We develop a new method to automatically convert 2D line drawings from three orthographic views into 3D CAD models.
We leverage the attention mechanism in a Transformer-based sequence generation model to learn flexible mappings between the input and output.
Our method significantly outperforms existing ones when the inputs are noisy or incomplete.
arXiv Detail & Related papers (2023-08-10T17:59:34Z)
- Multi-view 3D Reconstruction with Transformer [34.756336770583154]
We reformulate the multi-view 3D reconstruction as a sequence-to-sequence prediction problem.
We propose a new framework named 3D Volume Transformer (VolT) for such a task.
Our method achieves a new state-of-the-art accuracy in multi-view reconstruction with fewer parameters (a toy sequence-to-sequence reconstructor is sketched after this list).
arXiv Detail & Related papers (2021-03-24T03:14:49Z)
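As referenced above, here is a minimal tensor sketch of the repeat-and-concatenate step from the 2D-to-3D translation entry, assuming each 2D view is tiled along a new depth axis and the views are stacked as channels; shapes are illustrative.

```python
# Hypothetical sketch of "repeat and concatenate": tile each 2D view along
# the depth axis and stack views as channels, turning 2D-to-3D translation
# into a 3D-to-3D problem. Shapes are illustrative assumptions.
import torch

def repeat_and_concatenate(views: torch.Tensor, depth: int) -> torch.Tensor:
    # views: (batch, num_views, H, W) 2D projections, e.g. X-rays.
    b, v, h, w = views.shape
    # Repeat each view `depth` times along a new depth axis; the view axis
    # doubles as the channel axis of the resulting 3D volume.
    return views.unsqueeze(2).expand(b, v, depth, h, w).contiguous()

xrays = torch.randn(1, 2, 128, 128)   # two orthogonal 128x128 views
vol = repeat_and_concatenate(xrays, depth=128)
print(vol.shape)                      # torch.Size([1, 2, 128, 128, 128])
```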
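Also as referenced above, a toy sketch of the triplane representation behind DiffTF: a 3D point is projected onto three axis-aligned feature planes, the sampled features are fused, and a small MLP decodes them. Plane sizes and the decoder head are assumptions, not DiffTF's actual configuration.

```python
# Hypothetical triplane query: project a 3D point onto XY, XZ, and YZ
# feature planes, sample bilinearly, sum, and decode. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Triplane(nn.Module):
    def __init__(self, channels: int = 32, resolution: int = 64):
        super().__init__()
        self.planes = nn.Parameter(torch.randn(3, channels, resolution, resolution))
        self.decoder = nn.Sequential(nn.Linear(channels, 64), nn.ReLU(),
                                     nn.Linear(64, 1))  # e.g. density

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (N, 3) in [-1, 1]^3; pick the two coordinates per plane.
        coords = [points[:, [0, 1]], points[:, [0, 2]], points[:, [1, 2]]]
        feats = 0.0
        for plane, uv in zip(self.planes, coords):
            grid = uv.view(1, -1, 1, 2)  # grid_sample wants (B, H, W, 2)
            sampled = F.grid_sample(plane.unsqueeze(0), grid, align_corners=True)
            feats = feats + sampled.view(plane.size(0), -1).t()  # (N, C)
        return self.decoder(feats)  # (N, 1)

pts = torch.rand(1024, 3) * 2 - 1
print(Triplane()(pts).shape)  # torch.Size([1024, 1])
```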
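Finally, the VolT entry recasts reconstruction as sequence-to-sequence prediction. A toy version of that framing, with learned volume queries decoded against encoded view tokens, might look like the following; all sizes and names are assumptions.

```python
# Hypothetical seq2seq framing of multi-view reconstruction: encode view
# tokens, decode learned volume queries into occupancy logits. Illustrative.
import torch
import torch.nn as nn

class Seq2SeqReconstructor(nn.Module):
    def __init__(self, dim: int = 256, n_queries: int = 512, voxels_per_query: int = 8):
        super().__init__()
        self.volume_queries = nn.Parameter(torch.randn(n_queries, dim))
        self.transformer = nn.Transformer(
            d_model=dim, nhead=8, num_encoder_layers=2,
            num_decoder_layers=2, batch_first=True)
        self.head = nn.Linear(dim, voxels_per_query)  # occupancy logits

    def forward(self, view_tokens: torch.Tensor) -> torch.Tensor:
        # view_tokens: (batch, num_view_tokens, dim) from any 2D encoder.
        b = view_tokens.size(0)
        queries = self.volume_queries.unsqueeze(0).expand(b, -1, -1)
        decoded = self.transformer(view_tokens, queries)
        return self.head(decoded)  # (batch, n_queries, voxels_per_query)

tokens = torch.randn(1, 2 * 64, 256)  # tokens from two encoded views
print(Seq2SeqReconstructor()(tokens).shape)  # torch.Size([1, 512, 8])
```

Here 512 queries of 8 voxels each cover a 16^3 grid; the actual VolT decoder differs in how it tokenizes the output volume.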