RecDreamer: Consistent Text-to-3D Generation via Uniform Score Distillation
- URL: http://arxiv.org/abs/2502.12640v1
- Date: Tue, 18 Feb 2025 08:42:55 GMT
- Title: RecDreamer: Consistent Text-to-3D Generation via Uniform Score Distillation
- Authors: Chenxi Zheng, Yihong Lin, Bangzhen Liu, Xuemiao Xu, Yongwei Nie, Shengfeng He,
- Abstract summary: Current text-to-3D generation methods often suffer from geometric inconsistencies, leading to repeated patterns across different poses of 3D assets.
We propose a solution called RecDreamer, which reshapes the underlying data distribution to achieve a more consistent pose representation.
Our experimental results demonstrate that RecDreamer effectively mitigates the Multi-Face Janus problem, leading to more consistent 3D asset generation across different poses.
- Score: 35.85874404892182
- License:
- Abstract: Current text-to-3D generation methods based on score distillation often suffer from geometric inconsistencies, leading to repeated patterns across different poses of 3D assets. This issue, known as the Multi-Face Janus problem, arises because existing methods struggle to maintain consistency across varying poses and are biased toward a canonical pose. While recent work has improved pose control and approximation, these efforts are still limited by this inherent bias, which skews the guidance during generation. To address this, we propose a solution called RecDreamer, which reshapes the underlying data distribution to achieve a more consistent pose representation. The core idea behind our method is to rectify the prior distribution, ensuring that pose variation is uniformly distributed rather than biased toward a canonical form. By modifying the prescribed distribution through an auxiliary function, we can reconstruct the density of the distribution to ensure compliance with specific marginal constraints. In particular, we ensure that the marginal distribution of poses follows a uniform distribution, thereby eliminating the biases introduced by the prior knowledge. We incorporate this rectified data distribution into existing score distillation algorithms, a process we refer to as uniform score distillation. To efficiently compute the posterior distribution required for the auxiliary function, RecDreamer introduces a training-free classifier that estimates pose categories in a plug-and-play manner. Additionally, we utilize various approximation techniques for noisy states, significantly improving system performance. Our experimental results demonstrate that RecDreamer effectively mitigates the Multi-Face Janus problem, leading to more consistent 3D asset generation across different poses.
Related papers
- ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization [5.55656676725821]
We present ConsistentDreamer, where we first generate a set of fixed multi-view prior images and sample random views between them.
Thereby, we limit the discrepancies between the views guided by the SDS loss and ensure a consistent rough shape.
In each iteration, we also use our generated multi-view prior images for fine-detail reconstruction.
arXiv Detail & Related papers (2025-02-13T12:49:25Z) - ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation [17.097170273209333]
Recovering camera poses from a set of images is a foundational task in 3D computer vision.
Recent data-driven approaches aim to directly output camera poses, either through regressing the 6DoF camera poses or formulating rotation as a probability distribution.
We propose ADen to unify the two frameworks by employing a generator and a discriminator.
arXiv Detail & Related papers (2024-08-16T22:45:46Z) - Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models [55.99654128127689]
Visual Foundation Models (VFMs) are used to generate semantic labels for weakly-supervised pixel-to-point contrastive distillation.
We adapt sampling probabilities of points to address imbalances in spatial distribution and category frequency.
Our approach consistently surpasses existing image-to-LiDAR contrastive distillation methods in downstream tasks.
arXiv Detail & Related papers (2024-05-23T07:48:19Z) - A Variational Perspective on Solving Inverse Problems with Diffusion
Models [101.831766524264]
Inverse tasks can be formulated as inferring a posterior distribution over data.
This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable.
We propose a variational approach that by design seeks to approximate the true posterior distribution.
arXiv Detail & Related papers (2023-05-07T23:00:47Z) - Stable Target Field for Reduced Variance Score Estimation in Diffusion
Models [5.9115407007859755]
Diffusion models generate samples by reversing a fixed forward diffusion process.
We argue that the source of such variance lies in the handling of intermediate noise-variance scales.
We propose to remedy the problem by incorporating a reference batch which we use to calculate weighted conditional scores as more stable training targets.
arXiv Detail & Related papers (2023-02-01T18:57:01Z) - Feature-Distribution Perturbation and Calibration for Generalized Person
ReID [47.84576229286398]
Person Re-identification (ReID) has been advanced remarkably over the last 10 years along with the rapid development of deep learning for visual recognition.
We propose a Feature-Distribution Perturbation and generalization (PECA) method to derive generic feature representations for person ReID.
arXiv Detail & Related papers (2022-05-23T11:06:12Z) - Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose
Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z) - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z) - Self-Conditioned Generative Adversarial Networks for Image Editing [61.50205580051405]
Generative Adversarial Networks (GANs) are susceptible to bias, learned from either the unbalanced data, or through mode collapse.
We argue that this bias is responsible not only for fairness concerns, but that it plays a key role in the collapse of latent-traversal editing methods when deviating away from the distribution's core.
arXiv Detail & Related papers (2022-02-08T18:08:24Z) - Orthogonal Jacobian Regularization for Unsupervised Disentanglement in
Image Generation [64.92152574895111]
We propose a simple Orthogonal Jacobian Regularization (OroJaR) to encourage deep generative model to learn disentangled representations.
Our method is effective in disentangled and controllable image generation, and performs favorably against the state-of-the-art methods.
arXiv Detail & Related papers (2021-08-17T15:01:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.