3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction
- URL: http://arxiv.org/abs/2205.14575v1
- Date: Sun, 29 May 2022 06:01:42 GMT
- Title: 3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction
- Authors: Leslie Ching Ow Tiong, Dick Sigmund, Andrew Beng Jin Teoh
- Abstract summary: This paper proposes a new model, namely 3D coarse-to-fine transformer (3D-C2FT), for encoding multi-view features and rectifying defective 3D objects.
The C2F attention mechanism enables the model to learn the multi-view information flow and synthesize 3D surface corrections in a coarse-to-fine manner.
Experimental results show that 3D-C2FT achieves notable results and outperforms several competing models on these datasets.
- Score: 14.89364490991374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the transformer model has been successfully employed for the multi-view 3D reconstruction problem. However, challenges remain in designing an attention mechanism that explores multi-view features and exploits their relations to reinforce the encoding-decoding modules. This paper proposes a new model, namely the 3D coarse-to-fine transformer (3D-C2FT), by introducing a novel coarse-to-fine (C2F) attention mechanism for encoding multi-view features and rectifying defective 3D objects. The C2F attention mechanism enables the model to learn the multi-view information flow and synthesize 3D surface corrections in a coarse-to-fine manner. The proposed model is evaluated on the ShapeNet and Multi-view Real-life datasets. Experimental results show that 3D-C2FT achieves notable results and outperforms several competing models on these datasets.
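The abstract does not come with code, so the following is only a rough, hypothetical sketch of the coarse-to-fine idea: multi-view tokens first exchange information at a coarse, pooled resolution, and the coarse summary then conditions a finer attention pass. All module names, shapes, and the mean-pooling choice are assumptions for illustration, not the authors' architecture.

```python
# Hypothetical coarse-to-fine (C2F) attention sketch over multi-view
# patch tokens; illustrative only, not the 3D-C2FT implementation.
import torch
import torch.nn as nn

class C2FAttentionSketch(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.coarse = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fine = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, views):  # views: (B, V, N, D) patch tokens per view
        B, V, N, D = views.shape
        tokens = views.reshape(B, V * N, D)  # flatten all views' tokens
        # Coarse stage: pool each view to one token, then let the pooled
        # tokens exchange information across views.
        pooled = views.mean(dim=2)                    # (B, V, D)
        coarse, _ = self.coarse(pooled, pooled, pooled)
        # Fine stage: every full-resolution token attends to the coarse
        # summary, injecting cross-view context back at fine granularity.
        fine, _ = self.fine(tokens, coarse, coarse)
        return self.norm(tokens + fine).reshape(B, V, N, D)

# Toy usage: 2 views, 16 patch tokens each, 256-dim features.
out = C2FAttentionSketch()(torch.randn(1, 2, 16, 256))
print(out.shape)  # torch.Size([1, 2, 16, 256])
```

The only design point the sketch tries to capture is the ordering: cross-view aggregation happens cheaply at coarse granularity before full-resolution tokens are corrected against it.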
Related papers
- View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection [19.07686691657438]
View transformation robustness (VTR) is critical for deep-learning-based multi-view 3D object reconstruction models.
We propose a reconstruction-error-guided view selection method, which considers the spatial distribution of reconstruction errors over the 3D predictions.
The proposed method outperforms state-of-the-art 3D reconstruction methods and other view transformation robustness methods.
arXiv Detail & Related papers (2024-12-16T03:54:08Z)
- GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data.
We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models.
GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z)
- DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets.
Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z)
- MVBoost: Boost 3D Reconstruction with Multi-View Refinement [41.46372172076206]
The scarcity of diverse 3D datasets results in limited generalization capabilities of 3D reconstruction models.
We propose a novel framework for boosting 3D reconstruction with multi-view refinement (MVBoost) by generating pseudo-GT data.
arXiv Detail & Related papers (2024-11-26T08:55:20Z)
- Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling [14.341099905684844]
This paper investigates a 2D to 3D image translation method with a straightforward technique, enabling correlated 2D X-ray to 3D CT-like reconstruction.
We observe that existing approaches, which integrate information across multiple 2D views in the latent space, lose valuable signal information during latent encoding. Instead, we simply repeat and concatenate the 2D views into higher-channel 3D volumes and approach the 3D reconstruction challenge as a straightforward 3D-to-3D generative modeling problem (a minimal sketch of this lifting follows this entry).
This method enables the reconstructed 3D volume to retain valuable information from the 2D inputs, which are passed between channel states in a Swin U
arXiv Detail & Related papers (2024-06-26T15:18:20Z)
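The repeat-and-concatenate lifting described in the entry above is simple enough to sketch. This is a minimal illustration under assumed conventions (PyTorch tensors, a depth chosen equal to the image side); the function name and shapes are hypothetical, not the paper's code.

```python
# Minimal sketch of "repeat and concatenate": each 2D view is repeated
# along the missing depth axis to form a 3D volume, and the volumes are
# stacked as channels for a 3D-to-3D model. Layout is an assumption.
import torch

def repeat_and_concatenate(views, depth):
    """views: list of (H, W) 2D images -> (C, D, H, W) volume."""
    # Repeat each view `depth` times along a new depth axis: (D, H, W).
    vols = [v.unsqueeze(0).expand(depth, -1, -1) for v in views]
    # Stack views as channels, so channel count = number of views.
    return torch.stack(vols, dim=0)

views = [torch.rand(128, 128), torch.rand(128, 128)]  # e.g. two X-ray views
vol = repeat_and_concatenate(views, depth=128)
print(vol.shape)  # torch.Size([2, 128, 128, 128]) -> input to a 3D network
```

The appeal of this construction is that no information is discarded before the 3D network: every voxel column still carries the raw 2D pixel values of each view.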
- MVGamba: Unify 3D Content Generation as State Space Sequence Modeling [150.80564081817786]
We introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor.
With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts.
Experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with only approximately $0.1\times$ the model size.
arXiv Detail & Related papers (2024-06-10T15:26:48Z)
- DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation [53.20147419879056]
We introduce a diffusion-based feed-forward framework to address the challenges of large-vocabulary 3D generation with a single model.
Building upon our 3D-aware Diffusion model with TransFormer, we propose a stronger version for 3D generation, i.e., DiffTF++.
Experiments on ShapeNet and OmniObject3D convincingly demonstrate the effectiveness of our proposed modules.
arXiv Detail & Related papers (2024-05-13T17:59:51Z)
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach yields more accurate synthesis than recent state-of-the-art approaches, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- 3D Face Reconstruction Using A Spectral-Based Graph Convolution Encoder [3.749406324648861]
We propose an innovative approach that integrates existing 2D features with 3D features to guide the model learning process.
Our model is trained using 2D-3D data pairs from a combination of datasets and achieves state-of-the-art performance on the NoW benchmark.
arXiv Detail & Related papers (2024-03-08T11:09:46Z)
- Large-Vocabulary 3D Diffusion Model with Transformer [57.076986347047]
We introduce a diffusion-based feed-forward framework for synthesizing massive categories of real-world 3D objects with a single generative model.
We propose a novel triplane-based 3D-aware Diffusion model with TransFormer, DiffTF, for handling challenges via three aspects (a generic triplane lookup is sketched after this entry).
Experiments on ShapeNet and OmniObject3D convincingly demonstrate that a single DiffTF model achieves state-of-the-art large-vocabulary 3D object generation performance.
arXiv Detail & Related papers (2023-09-14T17:59:53Z)
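The entry above names a triplane representation without unpacking it. Below is a minimal, hypothetical sketch of the generic triplane feature lookup (in the style popularized by earlier triplane work such as EG3D), not DiffTF's code: a 3D point is projected onto three axis-aligned feature planes, bilinearly sampled, and the three samples are summed.

```python
# Generic triplane feature lookup; an assumed sketch, not DiffTF's code.
import torch
import torch.nn.functional as F

def sample_triplane(planes, pts):
    """planes: (3, C, R, R) feature planes for xy, xz, yz.
    pts: (N, 3) points in [-1, 1]^3 -> (N, C) aggregated features."""
    coords = torch.stack([pts[:, [0, 1]],   # project onto the xy plane
                          pts[:, [0, 2]],   # project onto the xz plane
                          pts[:, [1, 2]]])  # project onto the yz plane
    grid = coords.unsqueeze(1)  # grid_sample wants (B, H_out, W_out, 2)
    feats = F.grid_sample(planes, grid, align_corners=True)  # (3, C, 1, N)
    return feats.squeeze(2).sum(dim=0).t()  # sum over planes -> (N, C)

planes = torch.randn(3, 32, 64, 64)   # three 64x64 planes, 32 channels
pts = torch.rand(100, 3) * 2 - 1      # 100 query points in [-1, 1]^3
print(sample_triplane(planes, pts).shape)  # torch.Size([100, 32])
```

The representation's draw is memory: three R x R planes stand in for an R^3 voxel grid, while any 3D point can still be decoded from its three projected features.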
- PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs [24.09764733540401]
We develop a new method to automatically convert 2D line drawings from three orthographic views into 3D CAD models.
We leverage the attention mechanism in a Transformer-based sequence generation model to learn flexible mappings between the input and output.
Our method significantly outperforms existing ones when the inputs are noisy or incomplete.
arXiv Detail & Related papers (2023-08-10T17:59:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.