Consistency^2: Consistent and Fast 3D Painting with Latent Consistency Models
- URL: http://arxiv.org/abs/2406.11202v1
- Date: Mon, 17 Jun 2024 04:40:07 GMT
- Title: Consistency^2: Consistent and Fast 3D Painting with Latent Consistency Models
- Authors: Tianfu Wang, Anton Obukhov, Konrad Schindler
- Abstract summary: Generative 3D Painting is among the top productivity boosters in high-resolution 3D asset management and recycling.
We propose a Latent Consistency Model (LCM) adaptation for the task at hand.
We analyze the strengths and weaknesses of the proposed model and evaluate it quantitatively and qualitatively.
- Score: 29.818123424954294
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Generative 3D Painting is among the top productivity boosters in high-resolution 3D asset management and recycling. Ever since text-to-image models became accessible for inference on consumer hardware, the performance of 3D Painting methods has consistently improved and is currently close to plateauing. At the core of most such models lies denoising diffusion in the latent space, an inherently time-consuming iterative process. Multiple techniques have been developed recently to accelerate generation and reduce sampling iterations by orders of magnitude. Designed for 2D generative imaging, these techniques do not come with recipes for lifting them into 3D. In this paper, we address this shortcoming by proposing a Latent Consistency Model (LCM) adaptation for the task at hand. We analyze the strengths and weaknesses of the proposed model and evaluate it quantitatively and qualitatively. Based on the Objaverse dataset samples study, our 3D painting method attains strong preference in all evaluations. Source code is available at https://github.com/kongdai123/consistency2.
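The speed-up the abstract builds on, few-step sampling with a consistency model, can be illustrated compactly. Below is a minimal PyTorch sketch of multistep consistency sampling; the model and the noise schedule are placeholders, not the authors' implementation.

```python
import torch

def multistep_consistency_sample(consistency_model, shape, sigmas, device="cpu"):
    """Few-step sampling with a (latent) consistency model: `consistency_model(x, sigma)`
    is assumed to map a noisy sample at noise level `sigma` directly to a clean-sample
    estimate; `sigmas` is a short, decreasing noise schedule (e.g. 4 levels instead of
    the dozens of steps of ordinary diffusion sampling)."""
    x = torch.randn(shape, device=device) * sigmas[0]
    for i, sigma in enumerate(sigmas):
        x0 = consistency_model(x, sigma)                   # one-shot jump to a clean estimate
        if i + 1 < len(sigmas):
            x = x0 + sigmas[i + 1] * torch.randn_like(x0)  # re-noise to the next level
        else:
            x = x0
    return x

# Toy usage with a stand-in "model" (a crude shrinkage denoiser, not a real LCM).
dummy = lambda x, sigma: x / (1.0 + sigma ** 2) ** 0.5
sample = multistep_consistency_sample(dummy, (1, 4, 64, 64), sigmas=[14.6, 2.0, 0.5, 0.1])
print(sample.shape)  # torch.Size([1, 4, 64, 64])
```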
Related papers
- DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets.
Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z)
- VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation [69.68568248073747]
We propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel yet efficient objective for diffusion-based 3D generation tasks.
PCDS builds a pose-dependent consistency function within diffusion trajectories, allowing it to approximate true gradients with minimal sampling steps.
For efficient generation, we propose a coarse-to-fine optimization strategy, which first uses 1-step PCDS to create the basic structure of 3D objects and then gradually increases the PCDS step count to generate fine-grained details (a toy version of this loop is sketched after this entry).
arXiv Detail & Related papers (2024-06-21T08:21:52Z)
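A toy PyTorch version of that coarse-to-fine loop, with `render_views` and `pcds_loss` as hypothetical stand-ins for the paper's renderer and objective:

```python
import torch

def pcds_step_schedule(it, num_iters, max_steps=4):
    """Illustrative coarse-to-fine schedule: 1-step PCDS for the first half
    (rough structure), then a growing step count for fine detail."""
    if it < num_iters // 2:
        return 1
    frac = (it - num_iters // 2) / max(1, num_iters // 2)
    return min(max_steps, 1 + int(frac * max_steps))

def optimize_shape(render_views, pcds_loss, params, num_iters=1000, lr=1e-2):
    """`render_views(params)` renders the current 3D object from sampled poses;
    `pcds_loss(images, num_steps)` stands in for the paper's pose-dependent
    consistency objective. Both are hypothetical placeholders."""
    opt = torch.optim.Adam([params], lr=lr)
    for it in range(num_iters):
        loss = pcds_loss(render_views(params), pcds_step_schedule(it, num_iters))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return params

# Toy run with stand-ins, just to exercise the schedule and the loop.
params = torch.zeros(8, requires_grad=True)
render = lambda p: p.view(2, 4)
loss_fn = lambda imgs, steps: ((imgs - 1.0) ** 2).mean() / steps
optimize_shape(render, loss_fn, params, num_iters=10)
```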
- TexPainter: Generative Mesh Texturing with Multi-view Consistency [20.366302413005734]
In this paper, we propose a novel method to enforce multi-view consistency.
We use optimization-based color fusion to enforce consistency and indirectly modify the latent codes by gradient back-propagation (see the sketch after this entry).
Our method improves the consistency and overall quality of the generated textures compared to competing state-of-the-art methods.
arXiv Detail & Related papers (2024-05-17T18:41:36Z)
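A schematic of the color-fusion gradient flow referenced above; the decoder and the naive per-view average are placeholders (TexPainter fuses through the shared texture), so this only illustrates how back-propagation reaches the latent codes:

```python
import torch
import torch.nn.functional as F

def color_fusion_step(latents, decode, lr=0.1):
    """One illustrative optimization step: decode per-view latents to images,
    build a fused (view-consistent) color target, and push each view toward it
    by back-propagating into the latents. `decode` is a placeholder."""
    latents = latents.detach().requires_grad_(True)
    views = decode(latents)                            # (V, 3, H, W) decoded views
    target = views.mean(dim=0, keepdim=True).detach()  # stand-in for texture-space fusion
    loss = F.mse_loss(views, target.expand_as(views))
    loss.backward()
    with torch.no_grad():
        latents = latents - lr * latents.grad          # indirect edit of the latent codes
    return latents.detach(), loss.item()

# Toy usage: a fake 4x-upsampling "decoder" over 3 views.
lat = torch.randn(3, 4, 16, 16)
dec = lambda z: torch.tanh(F.interpolate(z[:, :3], scale_factor=4))
lat, err = color_fusion_step(lat, dec)
print(lat.shape, err)
```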
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z)
- LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching [33.696757740830506]
Recent advancements in text-to-3D generation have shown promise.
Many methods build on Score Distillation Sampling (SDS).
We propose Interval Score Matching (ISM) to counteract the over-smoothing SDS tends to produce (a baseline SDS sketch follows this entry).
arXiv Detail & Related papers (2023-11-19T09:59:09Z)
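For context, here is a minimal sketch of the SDS surrogate objective that ISM sets out to improve; `eps_pred_fn` and `alpha_bar` are placeholder names for a frozen text-conditioned diffusion model and its cumulative noise schedule:

```python
import torch

def sds_surrogate_loss(eps_pred_fn, x, t, alpha_bar):
    """Minimal Score Distillation Sampling (SDS) surrogate loss, i.e. the baseline
    whose over-smoothing ISM addresses. `eps_pred_fn(x_t, t)` stands for a frozen
    diffusion model's noise prediction; `alpha_bar` for its DDPM schedule."""
    a = alpha_bar[t]
    eps = torch.randn_like(x)
    x_t = a.sqrt() * x + (1 - a).sqrt() * eps  # forward-diffuse the rendered image
    with torch.no_grad():
        eps_pred = eps_pred_fn(x_t, t)
    w = 1 - a                                  # one common weighting choice w(t)
    grad = w * (eps_pred - eps)
    # Surrogate whose gradient w.r.t. x equals `grad` (the usual SDS trick).
    return (grad * x).sum()

# Toy usage: frozen "model" crudely predicting noise as half the noisy input.
alpha_bar = torch.linspace(0.999, 0.01, 1000)
x = torch.randn(1, 4, 8, 8, requires_grad=True)
loss = sds_surrogate_loss(lambda xt, t: 0.5 * xt, x, t=500, alpha_bar=alpha_bar)
loss.backward()
print(x.grad.abs().mean())
```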
- IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
We use image-to-image pipelines, powered by Latent Diffusion Models (LDMs), to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are treated as real data, while the renderings of the optimized 3D models serve as fake data (a minimal version of this loss is sketched after this entry).
arXiv Detail & Related papers (2023-08-22T14:39:17Z)
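The real/fake assignment above maps onto a standard adversarial objective; the hinge form below is our assumption, not necessarily the paper's exact loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def d_loss(disc, synthesized_views, rendered_views):
    """Hinge discriminator loss matching the summary's assignment: LDM-synthesized
    multi-view images count as real, renders of the 3D model being optimized count
    as fake. `disc` maps image batches to logits."""
    real = disc(synthesized_views)
    fake = disc(rendered_views.detach())
    return F.relu(1.0 - real).mean() + F.relu(1.0 + fake).mean()

def g_loss(disc, rendered_views):
    """The 3D model is updated so its renders score as 'real'."""
    return -disc(rendered_views).mean()

# Toy usage with a tiny conv discriminator.
disc = nn.Sequential(nn.Conv2d(3, 8, 4, 2, 1), nn.LeakyReLU(0.2),
                     nn.Flatten(), nn.LazyLinear(1))
print(d_loss(disc, torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)))
```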
- Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution [28.526714129927093]
We propose DreamPortrait, which aims to generate text-guided 3D-aware portraits in a single forward pass for efficiency.
We further design a 3D-aware gated cross-attention mechanism to explicitly let the model perceive the correspondence between the text and the 3D-aware space (a generic analogue is sketched after this entry).
arXiv Detail & Related papers (2023-06-03T11:08:38Z)
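The summary does not spell out the 3D-aware gated cross-attention; as a rough analogue, here is a generic Flamingo-style gated cross-attention block in PyTorch (the zero-initialized gate and all dimensions are assumptions):

```python
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Generic gated cross-attention block: a learned tanh gate, initialized at
    zero, controls how much text-conditioned attention flows into the 3D-aware
    features. Offered only as a rough analogue of the mechanism in the summary."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.gate = nn.Parameter(torch.zeros(1))  # starts closed; opens during training

    def forward(self, x, text_tokens):
        # x: (B, N, dim) 3D-aware features; text_tokens: (B, T, dim) text embeddings
        attn_out, _ = self.attn(self.norm(x), text_tokens, text_tokens)
        return x + torch.tanh(self.gate) * attn_out

block = GatedCrossAttention(dim=64)
out = block(torch.randn(2, 16, 64), torch.randn(2, 7, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```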
- TextMesh: Generation of Realistic 3D Meshes From Text Prompts [56.2832907275291]
We propose a novel method for the generation of highly realistic 3D meshes.
To this end, we extend NeRF to employ an SDF backbone, leading to improved 3D mesh extraction (one common SDF-to-density conversion is sketched after this entry).
arXiv Detail & Related papers (2023-04-24T20:29:41Z)
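One common way to give NeRF an SDF backbone is a VolSDF-style signed-distance-to-density conversion; whether TextMesh uses exactly this form is an assumption. A minimal sketch:

```python
import torch

def sdf_to_density(sdf, beta=0.1):
    """VolSDF-style conversion of signed distance to volume density (assumed form,
    not confirmed to be TextMesh's): a scaled Laplace CDF of the negated SDF, which
    saturates to 1/beta inside the surface (sdf < 0) and decays to zero outside."""
    x = -sdf / beta
    cdf = torch.where(x <= 0, 0.5 * torch.exp(x), 1.0 - 0.5 * torch.exp(-x))
    return cdf / beta

# Along a ray crossing the surface at distance 1.0:
dist = torch.linspace(0.0, 2.0, 5)
print(sdf_to_density(1.0 - dist))  # low outside (sdf > 0), high inside (sdf < 0)
```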
This list is automatically generated from the titles and abstracts of the papers on this site.