3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement
- URL: http://arxiv.org/abs/2412.18565v2
- Date: Tue, 29 Apr 2025 02:51:28 GMT
- Title: 3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement
- Authors: Yihang Luo, Shangchen Zhou, Yushi Lan, Xingang Pan, Chen Change Loy
- Abstract summary: We present 3DEnhancer, which employs a multi-view latent diffusion model to enhance coarse 3D inputs while preserving multi-view consistency. Unlike existing video-based approaches, our model supports seamless multi-view enhancement with improved coherence across diverse viewing angles.
- Score: 66.8116563135326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite advances in neural rendering, due to the scarcity of high-quality 3D datasets and the inherent limitations of multi-view diffusion models, view synthesis and 3D model generation are restricted to low resolutions with suboptimal multi-view consistency. In this study, we present a novel 3D enhancement pipeline, dubbed 3DEnhancer, which employs a multi-view latent diffusion model to enhance coarse 3D inputs while preserving multi-view consistency. Our method includes a pose-aware encoder and a diffusion-based denoiser to refine low-quality multi-view images, along with data augmentation and a multi-view attention module with epipolar aggregation to maintain consistent, high-quality 3D outputs across views. Unlike existing video-based approaches, our model supports seamless multi-view enhancement with improved coherence across diverse viewing angles. Extensive evaluations show that 3DEnhancer significantly outperforms existing methods, boosting both multi-view enhancement and per-instance 3D optimization tasks.
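The abstract names a pose-aware encoder and a multi-view attention module with epipolar aggregation but gives no implementation details. The PyTorch-style sketch below is only a rough illustration of what cross-view attention with an optional epipolar attention bias could look like; the class name, tensor shapes, and the idea of passing a precomputed pose-derived bias are assumptions, not the authors' code.

```python
# Hypothetical sketch: cross-view attention over tokens from all views,
# with an optional additive "epipolar" attention bias precomputed from poses.
# Names, shapes, and the bias interface are assumptions, not the paper's code.
from typing import Optional

import torch
import torch.nn as nn


class MultiViewAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor,
                epipolar_bias: Optional[torch.Tensor] = None) -> torch.Tensor:
        # x: (batch, views, tokens, dim) -- flatten views so every token can
        # attend to tokens of all other views in a single attention pass.
        b, v, n, d = x.shape
        tokens = x.reshape(b, v * n, d)
        h = self.norm(tokens)
        # epipolar_bias, if given, is an additive mask of shape (v*n, v*n) that
        # down-weights pairs of pixels far from each other's epipolar lines.
        out, _ = self.attn(h, h, h, attn_mask=epipolar_bias, need_weights=False)
        return (tokens + out).reshape(b, v, n, d)


# Example: 4 views, 256 tokens per view, 320-dim latent features.
block = MultiViewAttention(dim=320)
feats = torch.randn(2, 4, 256, 320)
refined = block(feats)  # same shape; features are now mixed across views
```

In practice the epipolar bias would be derived from the relative camera poses of the input views, which is what lets the aggregation stay geometry-aware rather than mixing arbitrary pixels across views.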
Related papers
- CDI3D: Cross-guided Dense-view Interpolation for 3D Reconstruction [25.468907201804093]
Large Reconstruction Models (LRMs) have shown great promise in leveraging multi-view images generated by 2D diffusion models to extract 3D content.
However, 2D diffusion models often struggle to produce dense images with strong multi-view consistency.
We present CDI3D, a feed-forward framework designed for efficient, high-quality image-to-3D generation with view interpolation.
arXiv Detail & Related papers (2025-03-11T03:08:43Z)
- 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow [69.94527569577295]
3D vision and spatial reasoning have long been recognized as preferable for accurately perceiving our three-dimensional world.
Due to the difficulties in collecting high-quality 3D data, research in this area has only recently gained momentum.
We propose converting existing densely activated LLMs into mixture-of-experts (MoE) models, which have proven effective for multi-modal data processing.
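As context for this conversion of a dense LLM into a mixture-of-experts model, here is a generic top-1 routed feed-forward layer in PyTorch. It is a minimal sketch of the MoE idea only; the expert count, routing rule, and any load balancing used in 3D-MoE are not specified here, and everything below is an assumption.

```python
# Generic top-1 mixture-of-experts feed-forward layer, shown only to illustrate
# replacing a dense MLP with routed experts. All details are assumptions and
# do not reflect the 3D-MoE architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    def __init__(self, dim: int, hidden: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # per-token expert scores
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); each token is processed by its best expert only.
        probs = F.softmax(self.router(x), dim=-1)   # (batch, tokens, experts)
        best = probs.argmax(dim=-1)                 # (batch, tokens)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            picked = best == i
            if picked.any():
                # Scale by the routing probability so the router receives gradient.
                out[picked] = expert(x[picked]) * probs[picked][:, i:i + 1]
        return out


layer = MoEFeedForward(dim=512, hidden=2048)
print(layer(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```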
arXiv Detail & Related papers (2025-01-28T04:31:19Z)
- Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation [61.040832373015014]
We propose Flex3D, a novel framework for generating high-quality 3D content from text, single images, or sparse view images.
We employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object.
In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs.
arXiv Detail & Related papers (2024-10-01T17:29:43Z)
- PlacidDreamer: Advancing Harmony in Text-to-3D Generation [20.022078051436846]
PlacidDreamer is a text-to-3D framework that harmonizes multi-view generation and text-conditioned generation.
It employs a novel score distillation algorithm to achieve balanced saturation.
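For background, the sketch below shows the standard score-distillation (SDS) gradient that text-to-3D frameworks of this kind build on; it does not reproduce PlacidDreamer's balanced variant, and the `diffusion_model(noisy, t, text_emb) -> predicted noise` interface is a placeholder assumption.

```python
# Background sketch of the standard score-distillation (SDS) gradient used by
# text-to-3D pipelines. PlacidDreamer's balanced variant is NOT reproduced, and
# the diffusion_model interface is a placeholder assumption.
import torch


def sds_gradient(diffusion_model, rendered: torch.Tensor, text_emb: torch.Tensor,
                 alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """Gradient w.r.t. the rendered image for one randomly drawn timestep."""
    t = torch.randint(0, alphas_cumprod.numel(), (1,), device=rendered.device)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(rendered)
    noisy = a_t.sqrt() * rendered + (1.0 - a_t).sqrt() * noise  # forward diffusion
    with torch.no_grad():
        eps_pred = diffusion_model(noisy, t, text_emb)          # predicted noise
    weight = 1.0 - a_t                                          # common weighting
    return weight * (eps_pred - noise)                          # SDS gradient
```

This gradient is backpropagated from the rendered views into the parameters of the 3D representation being optimized.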
arXiv Detail & Related papers (2024-07-19T02:00:04Z)
- Vivid-ZOO: Multi-View Video Generation with Diffusion Model [76.96449336578286]
New challenges lie in the lack of massive captioned multi-view videos and the complexity of modeling such multi-dimensional distribution.
We propose a novel diffusion-based pipeline that generates high-quality multi-view videos centered around a dynamic 3D object from text.
arXiv Detail & Related papers (2024-06-12T21:44:04Z)
- Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data [80.92268916571712]
A critical bottleneck is the scarcity of high-quality 3D objects with detailed captions.
We propose Bootstrap3D, a novel framework that automatically generates an arbitrary quantity of multi-view images.
We have generated 1 million high-quality synthetic multi-view images with dense descriptive captions.
arXiv Detail & Related papers (2024-05-31T17:59:56Z)
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models [6.738732514502613]
Diffusion$^2$ is a novel framework for dynamic 3D content creation.
It reconciles knowledge about geometric consistency and temporal smoothness from pretrained video and multi-view diffusion models to directly sample dense multi-view images.
Experiments demonstrate the efficacy of our proposed framework in generating highly seamless and consistent 4D assets.
arXiv Detail & Related papers (2024-04-02T17:58:03Z)
- Envision3D: One Image to 3D with Anchor Views Interpolation [18.31796952040799]
We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image.
It is capable of generating high-quality 3D content in terms of texture and geometry, surpassing previous image-to-3D baseline methods.
arXiv Detail & Related papers (2024-03-13T18:46:33Z)
- MVDream: Multi-view Diffusion for 3D Generation [14.106283556521962]
We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt.
Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings.
arXiv Detail & Related papers (2023-08-31T07:49:06Z)