Cyc3D: Fine-grained Controllable 3D Generation via Cycle Consistency Regularization
- URL: http://arxiv.org/abs/2504.14975v1
- Date: Mon, 21 Apr 2025 09:05:52 GMT
- Title: Cyc3D: Fine-grained Controllable 3D Generation via Cycle Consistency Regularization
- Authors: Hongbin Xu, Chaohui Yu, Feng Xiao, Jiazheng Xing, Hai Ci, Weitao Chen, Ming Li,
- Abstract summary: Cyc3D enhances controllable 3D generation by encouraging cyclic consistency between generated 3D content and input controls. View consistency ensures coherence between the two generated 3D objects. Condition consistency aligns the final extracted signal with the original input control, preserving structural or geometric details.
- Score: 16.157989435669656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the remarkable progress of 3D generation, achieving controllability, i.e., ensuring consistency between generated 3D content and input conditions like edge and depth, remains a significant challenge. Existing methods often struggle to maintain accurate alignment, leading to noticeable discrepancies. To address this issue, we propose Cyc3D, a new framework that enhances controllable 3D generation by explicitly encouraging cyclic consistency between the second-order 3D content, generated based on extracted signals from the first-order generation, and its original input controls. Specifically, we employ an efficient feed-forward backbone that can generate a 3D object from an input condition and a text prompt. Given an initial viewpoint and a control signal, a novel view is rendered from the generated 3D content, from which the extracted condition is used to regenerate the 3D content. This re-generated output is then rendered back to the initial viewpoint, followed by another round of control signal extraction, forming a cyclic process with two consistency constraints. View consistency ensures coherence between the two generated 3D objects, measured by semantic similarity to accommodate generative diversity. Condition consistency aligns the final extracted signal with the original input control, preserving structural or geometric details throughout the process. Extensive experiments on popular benchmarks demonstrate that Cyc3D significantly improves controllability, especially for fine-grained details, outperforming existing methods across various conditions (e.g., +14.17% PSNR for edge, +6.26% PSNR for sketch).
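As a rough illustration of the cycle described above, a training step with the two constraints might look like the following sketch. All callables here (generate_3d, render, extract_condition, semantic_embed) are hypothetical stand-ins for the paper's feed-forward backbone, differentiable renderer, condition extractor (e.g., an edge or depth estimator), and a semantic encoder such as CLIP; this is not the authors' code.

```python
import torch.nn.functional as F

def cyc3d_losses(generate_3d, render, extract_condition, semantic_embed,
                 condition, prompt, view0, novel_view):
    """Hedged sketch of Cyc3D's cyclic regularization (not the official code).

    generate_3d(cond, prompt) -> 3D object (feed-forward backbone)
    render(obj, view)         -> image rendered from `view`
    extract_condition(img)    -> control signal (e.g., edge or depth map)
    semantic_embed(img)       -> semantic feature (e.g., a CLIP embedding)
    """
    # First-order generation from the original control and text prompt.
    obj1 = generate_3d(condition, prompt)

    # Render a novel view, extract a condition from it, and regenerate:
    # this yields the second-order 3D content.
    cond_novel = extract_condition(render(obj1, novel_view))
    obj2 = generate_3d(cond_novel, prompt)

    # Render both objects back at the initial viewpoint.
    img1 = render(obj1, view0)
    img2 = render(obj2, view0)

    # View consistency: semantic similarity between the two generations,
    # which tolerates generative diversity better than a pixel-wise loss.
    loss_view = 1.0 - F.cosine_similarity(
        semantic_embed(img1), semantic_embed(img2), dim=-1).mean()

    # Condition consistency: the signal re-extracted after the full cycle
    # should match the original input control.
    loss_cond = F.l1_loss(extract_condition(img2), condition)
    return loss_view, loss_cond
```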
Related papers
- Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control [2.5031284037888395]
Dragen3D is a novel approach that achieves geometrically consistent and controllable 3D generation.
We introduce the Anchor-Gaussian Variational Autoencoder (Anchor-GS VAE), which encodes a point cloud and a single image into anchor latents and decodes these latents into 3DGS.
To the best of our knowledge, we are the first to achieve geometrically controllable 3D Gaussian generation and editing.
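Reading only this summary, the Anchor-GS VAE's interface could be sketched roughly as below; the encoder architectures, anchor count, and per-Gaussian parameter layout are all illustrative assumptions, not Dragen3D's actual design.

```python
import torch
import torch.nn as nn

class AnchorGSVAE(nn.Module):
    """Illustrative skeleton (not Dragen3D's real code): encode a point
    cloud and a single image into anchor latents, then decode those
    latents into 3D Gaussian Splatting parameters."""

    def __init__(self, latent_dim=256, n_anchors=512, gs_params=14):
        super().__init__()
        # Placeholder encoders; the paper's architectures are unspecified here.
        self.point_enc = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                       nn.Linear(128, latent_dim))
        self.image_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(latent_dim))
        self.to_anchor = nn.Linear(2 * latent_dim, n_anchors * latent_dim)
        # Each anchor decodes to per-Gaussian parameters (position, scale,
        # rotation, opacity, color); 14 numbers is an assumed layout.
        self.decoder = nn.Linear(latent_dim, gs_params)
        self.n_anchors, self.latent_dim = n_anchors, latent_dim

    def forward(self, points, image):
        z_pts = self.point_enc(points).mean(dim=1)       # (B, latent_dim)
        z_img = self.image_enc(image)                    # (B, latent_dim)
        anchors = self.to_anchor(torch.cat([z_pts, z_img], dim=-1))
        anchors = anchors.view(-1, self.n_anchors, self.latent_dim)
        return self.decoder(anchors)                     # (B, n_anchors, 14)
```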
arXiv Detail & Related papers (2025-02-23T07:19:03Z)
- F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Aggregative Gaussian Splatting [35.625593119642424]
This paper tackles the problem of generalizable 3D-aware generation from monocular datasets.
We propose a novel feed-forward pipeline based on pixel-aligned Gaussian Splatting.
We also introduce a self-supervised cycle-aggregative constraint to enforce cross-view consistency in the learned 3D representation.
arXiv Detail & Related papers (2025-01-12T04:44:44Z)
- T-3DGS: Removing Transient Objects for 3D Scene Reconstruction [83.05271859398779]
Transient objects in video sequences can significantly degrade the quality of 3D scene reconstructions.
We propose T-3DGS, a novel framework that robustly filters out transient distractors during 3D reconstruction using Gaussian Splatting.
arXiv Detail & Related papers (2024-11-29T07:45:24Z)
- GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space.
Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information.
The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single image inputs.
arXiv Detail & Related papers (2024-11-12T18:59:32Z)
- ControLRM: Fast and Controllable 3D Generation via Large Reconstruction Model [36.34976357766257]
We introduce ControLRM, an end-to-end feed-forward model designed for rapid and controllable 3D generation.
ControLRM comprises a 2D condition generator, a condition encoding transformer, and a triplane decoder transformer.
In the condition training branch, we lock the triplane decoder and reuse the deep and robust encoding layers pretrained on millions of 3D data samples in LRM.
In the image training branch, we unlock the triplane decoder to establish an implicit alignment between the 2D and 3D representations.
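In PyTorch terms, this lock/unlock scheme reduces to toggling requires_grad on the triplane decoder between the two branches; a minimal sketch with placeholder modules (ControLRM's real components are transformers, and these names are illustrative):

```python
import torch.nn as nn

# Illustrative stand-ins; ControLRM's actual modules are transformers.
condition_generator = nn.Linear(64, 64)
condition_encoder = nn.Linear(64, 64)
triplane_decoder = nn.Linear(64, 64)

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every parameter of a module."""
    for p in module.parameters():
        p.requires_grad = trainable

# Condition training branch: lock the triplane decoder, reusing its
# LRM-pretrained weights, and train only the condition-side modules.
set_trainable(triplane_decoder, False)
set_trainable(condition_generator, True)
set_trainable(condition_encoder, True)

# Image training branch: unlock the decoder so gradients can establish
# an implicit alignment between the 2D and 3D representations.
set_trainable(triplane_decoder, True)
```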
arXiv Detail & Related papers (2024-10-12T16:47:20Z)
- Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation [27.43973967994717]
MT3D is a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias.
By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects.
arXiv Detail & Related papers (2024-08-12T06:25:44Z)
- VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation [69.68568248073747]
We propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel yet efficient objective for diffusion-based 3D generation tasks.
PCDS builds the pose-dependent consistency function within diffusion trajectories, allowing true gradients to be approximated through minimal sampling steps.
For efficient generation, we propose a coarse-to-fine optimization strategy, which first utilizes 1-step PCDS to create the basic structure of 3D objects, and then gradually increases PCDS steps to generate fine-grained details.
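The coarse-to-fine strategy amounts to a schedule that starts at one PCDS step and grows the step count as optimization proceeds; a hedged sketch, with pcds_loss left as a hypothetical stand-in for the paper's objective:

```python
def pcds_step_schedule(iteration: int, total_iters: int, max_steps: int = 4) -> int:
    """Illustrative schedule: 1-step PCDS lays down the coarse 3D structure,
    then the step count grows gradually to refine fine-grained details.
    (max_steps = 4 is an assumed cap, not a value from the paper.)"""
    frac = iteration / max(total_iters - 1, 1)
    return 1 + round(frac * (max_steps - 1))

# Sketch of the enclosing loop; `pcds_loss`, `scene`, and `optimizer`
# are placeholders rather than VividDreamer's actual API.
# for it in range(total_iters):
#     loss = pcds_loss(scene, n_steps=pcds_step_schedule(it, total_iters))
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```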
arXiv Detail & Related papers (2024-06-21T08:21:52Z)
- NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation [52.772319840580074]
3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints.
Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation.
We introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling.
arXiv Detail & Related papers (2024-03-27T04:09:34Z)
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- LucidDreaming: Controllable Object-Centric 3D Generation [10.646855651524387]
We present a pipeline capable of spatial and numerical control over 3D generation from only textual prompt commands or 3D bounding boxes.
LucidDreaming achieves superior results in object placement precision and generation fidelity compared to current approaches.
arXiv Detail & Related papers (2023-11-30T18:55:23Z)
- CGOF++: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields [52.14985242487535]
We propose a new conditional 3D face synthesis framework, which enables 3D controllability over generated face images.
At its core is a conditional Generative Occupancy Field (cGOF++) that effectively enforces the shape of the generated face to conform to a given 3D Morphable Model (3DMM) mesh.
Experiments validate the effectiveness of the proposed method and show more precise 3D controllability than state-of-the-art 2D-based controllable face synthesis methods.
arXiv Detail & Related papers (2022-11-23T19:02:50Z)