From 2D Images to 3D Model: Weakly Supervised Multi-View Face Reconstruction with Deep Fusion
- URL: http://arxiv.org/abs/2204.03842v4
- Date: Mon, 22 Jan 2024 06:30:15 GMT
- Title: From 2D Images to 3D Model: Weakly Supervised Multi-View Face Reconstruction with Deep Fusion
- Authors: Weiguang Zhao and Chaolong Yang and Jianan Ye and Rui Zhang and Yuyao
Yan and Xi Yang and Bin Dong and Amir Hussain and Kaizhu Huang
- Abstract summary: We propose a novel model called Deep Fusion MVR to reconstruct high-precision 3D facial shapes from multi-view images.
Specifically, we introduce MulEn-Unet, a multi-view encoding to single decoding framework with skip connections and attention.
We develop the face parse network to learn, identify, and emphasize the critical common face area within multi-view images.
- Score: 26.011557635884568
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While weakly supervised multi-view face reconstruction (MVR) is garnering
increased attention, one critical issue remains open: how to effectively
fuse information from multiple images to reconstruct high-precision 3D models. In
this regard, we propose a novel model called Deep Fusion MVR (DF-MVR) to
reconstruct high-precision 3D facial shapes from multi-view images.
Specifically, we introduce MulEn-Unet, a multi-view encoding to single decoding
framework with skip connections and attention. This design enables the
attention-guided extraction, integration, and compensation of deep features from
multi-view images. Furthermore, we adopt the involution kernel to enrich deep
fusion features with channel features. In addition, we develop the face parse
network to learn, identify, and emphasize the critical common face area within
multi-view images. Experiments on the Pixel-Face and Bosphorus datasets demonstrate
the superiority of our model. Without 3D annotations, DF-MVR achieves 5.2% and
3.0% RMSE improvements over existing weakly supervised MVR methods on the
Pixel-Face and Bosphorus datasets, respectively. Code will be made publicly available at
https://github.com/weiguangzhao/DF_MVR.
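Since the abstract describes the architecture only at a high level, below is a minimal, hypothetical PyTorch sketch of the core idea: several views pass through a shared encoder, same-scale features are fused with attention across views, and a single decoder with skip connections reconstructs from the fused features. Class names, layer widths, and the attention form are illustrative assumptions; the involution-based channel enrichment and the face parse network are omitted. The authors' actual implementation is at https://github.com/weiguangzhao/DF_MVR.

```python
import torch
import torch.nn as nn


class ViewEncoder(nn.Module):
    """Shared per-view encoder that returns feature maps at three scales."""

    def __init__(self, in_ch=3, widths=(32, 64, 128)):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for w in widths:
            self.stages.append(nn.Sequential(
                nn.Conv2d(ch, w, 3, stride=2, padding=1),
                nn.BatchNorm2d(w),
                nn.ReLU(inplace=True),
            ))
            ch = w

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats                                   # feature maps at H/2, H/4, H/8


class AttentionFusion(nn.Module):
    """Fuse same-scale features of V views with a per-view, per-pixel attention weight."""

    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, view_feats):                     # list of V tensors, each (B, C, H, W)
        stacked = torch.stack(view_feats, dim=1)       # (B, V, C, H, W)
        b, v, c, h, w = stacked.shape
        scores = self.score(stacked.flatten(0, 1)).view(b, v, 1, h, w)
        weights = torch.softmax(scores, dim=1)         # normalize attention across views
        return (weights * stacked).sum(dim=1)          # (B, C, H, W)


class MultiViewUNet(nn.Module):
    """Multi-view encoding, per-scale attention fusion, single decoder with skip connections."""

    def __init__(self, out_ch=3):
        super().__init__()
        self.encoder = ViewEncoder(widths=(32, 64, 128))   # weights shared across views
        self.fuse = nn.ModuleList(AttentionFusion(c) for c in (32, 64, 128))
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.head = nn.Conv2d(32, out_ch, 1)           # e.g. regress a coefficient/shape map

    def forward(self, views):                          # views: list of V images, each (B, 3, H, W)
        per_view = [self.encoder(v) for v in views]    # V lists of 3 feature maps
        fused = [f([pv[s] for pv in per_view]) for s, f in enumerate(self.fuse)]
        x = self.dec1(torch.cat([self.up1(fused[2]), fused[1]], dim=1))  # skip from H/4 scale
        x = self.dec2(torch.cat([self.up2(x), fused[0]], dim=1))         # skip from H/2 scale
        return self.head(x)


views = [torch.randn(2, 3, 128, 128) for _ in range(3)]   # three views of a batch of 2 faces
print(MultiViewUNet()(views).shape)                        # torch.Size([2, 3, 64, 64])
```

The key property this sketch shares with the described design is that fusion happens per scale before decoding, so the single decoder sees features that already aggregate information from all views.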
Related papers
- Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation [61.040832373015014]
We propose Flex3D, a novel framework for generating high-quality 3D content from text, single images, or sparse view images.
We employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object.
In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs.
arXiv Detail & Related papers (2024-10-01T17:29:43Z)
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- MVGamba: Unify 3D Content Generation as State Space Sequence Modeling [150.80564081817786]
We introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor.
With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts.
Experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with only approximately $0.1\times$ the model size.
arXiv Detail & Related papers (2024-06-10T15:26:48Z)
- Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data [80.92268916571712]
A critical bottleneck is the scarcity of high-quality 3D objects with detailed captions.
We propose Bootstrap3D, a novel framework that automatically generates an arbitrary quantity of multi-view images.
We have generated 1 million high-quality synthetic multi-view images with dense descriptive captions.
arXiv Detail & Related papers (2024-05-31T17:59:56Z)
- Envision3D: One Image to 3D with Anchor Views Interpolation [18.31796952040799]
We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image.
It is capable of generating high-quality 3D content in terms of texture and geometry, surpassing previous image-to-3D baseline methods.
arXiv Detail & Related papers (2024-03-13T18:46:33Z)
- 2L3: Lifting Imperfect Generated 2D Images into Accurate 3D [16.66666619143761]
Multi-view (MV) 3D reconstruction is a promising solution to fuse generated MV images into consistent 3D objects.
However, the generated images usually suffer from inconsistent lighting, misaligned geometry, and sparse views, leading to poor reconstruction quality.
We present a novel 3D reconstruction framework that leverages intrinsic decomposition guidance, transient-mono prior guidance, and view augmentation to cope with the three issues.
arXiv Detail & Related papers (2024-01-29T02:30:31Z)
- VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-view 3D Reconstruction [23.21446438011893]
VPFusion attains high-quality reconstruction using both a 3D feature volume to capture 3D-structure-aware context and pixel-aligned image features to capture fine local detail.
Existing approaches use RNN, feature pooling, or attention computed independently in each view for multi-view fusion.
We show improved multi-view feature fusion by establishing transformer-based pairwise view association (see the cross-view attention sketch after this list).
arXiv Detail & Related papers (2022-03-14T23:30:58Z)
- VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion [68.68537312256144]
VoRTX is an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion.
We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-01T02:18:11Z)
- Direct Multi-view Multi-person 3D Pose Estimation [138.48139701871213]
We present Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images.
MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks.
We show experimentally that our MvP model outperforms the state-of-the-art methods on several benchmarks while being much more efficient.
arXiv Detail & Related papers (2021-11-07T13:09:20Z)
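Several of the related papers above rely on the same basic mechanism: letting per-view features exchange information through attention (for example the cross-view attention layers in the depth-guided VAE decoder, or VPFusion's transformer-based pairwise view association). Below is a minimal, hypothetical PyTorch sketch of cross-view attention over per-pixel feature tokens; it illustrates the general idea only and is not any of the cited papers' actual implementation.

```python
import torch
import torch.nn as nn


class CrossViewFusion(nn.Module):
    """At every spatial location, let each view's feature token attend to the other views."""

    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feats):                          # feats: (B, V, C, H, W) per-view features
        b, v, c, h, w = feats.shape
        # One attention sequence per pixel, of length V: views exchange information locally.
        tokens = feats.permute(0, 3, 4, 1, 2).reshape(b * h * w, v, c)
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)          # residual connection + layer norm
        return tokens.reshape(b, h, w, v, c).permute(0, 3, 4, 1, 2)


feats = torch.randn(2, 4, 64, 16, 16)                 # 2 samples, 4 views, 64-ch 16x16 features
print(CrossViewFusion(64)(feats).shape)               # torch.Size([2, 4, 64, 16, 16])
```

Attending across views at each spatial location keeps every attention sequence as short as the number of views, so the cost grows linearly with image resolution rather than quadratically in the number of pixels.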