View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection
- URL: http://arxiv.org/abs/2412.11428v1
- Date: Mon, 16 Dec 2024 03:54:08 GMT
- Title: View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection
- Authors: Qi Zhang, Zhouhang Luo, Tao Yu, Hui Huang
- Abstract summary: View transformation robustness (VTR) is critical for deep-learning-based multi-view 3D object reconstruction models.
We propose a reconstruction error-guided view selection method, which considers the reconstruction errors' spatial distribution of the 3D predictions.
The proposed method can outperform state-of-the-art 3D reconstruction methods and other view transformation robustness comparison methods.
- Score: 19.07686691657438
- Abstract: View transformation robustness (VTR), which indicates a method's stability under inputs with various view transformations, is critical for deep-learning-based multi-view 3D object reconstruction models. However, existing research has seldom focused on view transformation robustness in multi-view 3D object reconstruction. One direct way to improve a model's VTR is to produce data with more view transformations and add them to model training. Recent progress on large vision models, particularly Stable Diffusion models, has provided great potential for generating 3D models or synthesizing novel view images from only a single input image. However, directly deploying these models at inference consumes heavy computation resources, and their robustness to view transformations is not guaranteed either. To fully utilize the power of Stable Diffusion models without extra inference computation burdens, we propose to generate novel views with Stable Diffusion models as additional training data for better view transformation robustness. Instead of synthesizing random views, we propose a reconstruction error-guided view selection method, which considers the spatial distribution of the reconstruction errors in the 3D predictions and chooses the views that cover the reconstruction errors as much as possible. The methods are trained and tested on sets with large view transformations to validate the 3D reconstruction models' robustness to view transformations. Extensive experiments demonstrate that the proposed method can outperform state-of-the-art 3D reconstruction methods and other view transformation robustness comparison methods.
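The abstract describes choosing views that cover as much of the reconstruction error as possible. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes voxel occupancy grids, treats a voxel as "covered" by a view when it lies within a cone facing the camera (a crude visibility proxy), and picks views greedily. All function names, the `min_cos` threshold, and the azimuth/elevation view parameterization are hypothetical.

```python
import numpy as np

def error_voxels(pred, gt, threshold=0.5):
    """Centred (N, 3) coordinates of voxels where prediction and ground truth disagree."""
    wrong = (pred > threshold) != (gt > threshold)
    coords = np.argwhere(wrong).astype(float)
    centre = (np.array(pred.shape) - 1) / 2.0
    return coords - centre

def view_direction(azimuth_deg, elevation_deg):
    """Unit vector pointing from the grid centre towards the camera."""
    az, el = np.deg2rad(azimuth_deg), np.deg2rad(elevation_deg)
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])

def coverage_matrix(errors, views, min_cos=0.5):
    """cover[i, j] is True if error voxel j faces view i (assumed visibility proxy)."""
    dirs = np.stack([view_direction(a, e) for a, e in views])
    norms = np.linalg.norm(errors, axis=1, keepdims=True)
    unit = errors / np.maximum(norms, 1e-9)
    return dirs @ unit.T >= min_cos

def select_views(pred, gt, views, k):
    """Greedily pick up to k views, each covering the most still-uncovered error voxels."""
    errors = error_voxels(pred, gt)
    cover = coverage_matrix(errors, views)
    uncovered = np.ones(len(errors), dtype=bool)
    chosen = []
    for _ in range(min(k, len(views))):
        gains = (cover & uncovered).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break  # no remaining view covers any uncovered error
        chosen.append(views[best])
        uncovered &= ~cover[best]
    return chosen
```

In a training pipeline along the lines the abstract suggests, the selected `(azimuth, elevation)` pairs would then be passed to a novel-view synthesis model to produce extra training images; that step is outside this sketch.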
Related papers
- UVRM: A Scalable 3D Reconstruction Model from Unposed Videos [69.89526627921612]
Training 3D reconstruction models with 2D visual data traditionally requires prior knowledge of camera poses for the training samples.
We introduce UVRM, a novel 3D reconstruction model capable of being trained and evaluated on monocular videos without requiring any information about the pose.
arXiv Detail & Related papers (2025-01-16T08:00:17Z)
- MVBoost: Boost 3D Reconstruction with Multi-View Refinement [41.46372172076206]
The scarcity of diverse 3D datasets results in limited generalization capabilities of 3D reconstruction models.
We propose a novel framework for boosting 3D reconstruction with multi-view refinement (MVBoost) by generating pseudo-GT data.
arXiv Detail & Related papers (2024-11-26T08:55:20Z)
- Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation [61.040832373015014]
We propose Flex3D, a novel framework for generating high-quality 3D content from text, single images, or sparse view images.
We employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object.
In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs.
arXiv Detail & Related papers (2024-10-01T17:29:43Z)
- MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework to generate consistent multi-view images from a single image by leveraging a scene representation transformer and a view-conditioned diffusion model.
Our model is able to generate 3D meshes surpassing baseline methods in evaluation metrics, including PSNR, SSIM and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z)
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- Novel View Synthesis with Diffusion Models [56.55571338854636]
We present 3DiM, a diffusion model for 3D novel view synthesis.
It is able to translate a single input view into consistent and sharp completions across many views.
3DiM can generate multiple views that are 3D consistent using a novel technique called stochastic conditioning.
arXiv Detail & Related papers (2022-10-06T16:59:56Z)
- 3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction [14.89364490991374]
This paper proposes a new model, namely 3D coarse-to-fine transformer (3D-C2FT), for encoding multi-view features and rectifying defective 3D objects.
The C2F attention mechanism enables the model to learn multi-view information flow and synthesize 3D surface correction in a coarse-to-fine manner.
Experimental results show that 3D-C2FT achieves notable results and outperforms several competing models on these datasets.
arXiv Detail & Related papers (2022-05-29T06:01:42Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolution network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- Multi-view 3D Reconstruction with Transformer [34.756336770583154]
We reformulate the multi-view 3D reconstruction as a sequence-to-sequence prediction problem.
We propose a new framework named 3D Volume Transformer (VolT) for such a task.
Our method achieves a new state-of-the-art accuracy in multi-view reconstruction with fewer parameters.
arXiv Detail & Related papers (2021-03-24T03:14:49Z)
- Self-Supervised Multi-View Learning via Auto-Encoding 3D Transformations [61.870882736758624]
We propose a novel self-supervised paradigm to learn Multi-View Transformation Equivariant Representations (MV-TER).
Specifically, we perform a 3D transformation on a 3D object, and obtain multiple views before and after the transformation via projection.
Then, we self-train a representation to capture the intrinsic 3D object representation by decoding 3D transformation parameters from the fused feature representations of multiple views before and after the transformation.
arXiv Detail & Related papers (2021-03-01T06:24:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.