RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling
- URL: http://arxiv.org/abs/2503.09601v2
- Date: Thu, 13 Mar 2025 13:28:22 GMT
- Title: RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling
- Authors: Itay Chachy, Guy Yariv, Sagie Benaim
- Abstract summary: RewardSDS weights noise samples based on alignment scores from a reward model, producing a weighted SDS loss. This loss prioritizes gradients from noise samples that yield aligned, high-reward outputs. We evaluate RewardSDS and RewardVSD on text-to-image, 2D editing, and text-to-3D generation tasks.
- Score: 14.725841457150414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Score Distillation Sampling (SDS) has emerged as an effective technique for leveraging 2D diffusion priors for tasks such as text-to-3D generation. While powerful, SDS struggles with achieving fine-grained alignment to user intent. To overcome this, we introduce RewardSDS, a novel approach that weights noise samples based on alignment scores from a reward model, producing a weighted SDS loss. This loss prioritizes gradients from noise samples that yield aligned high-reward output. Our approach is broadly applicable and can extend SDS-based methods. In particular, we demonstrate its applicability to Variational Score Distillation (VSD) by introducing RewardVSD. We evaluate RewardSDS and RewardVSD on text-to-image, 2D editing, and text-to-3D generation tasks, showing significant improvements over SDS and VSD on a diverse set of metrics measuring generation quality and alignment to desired reward models, enabling state-of-the-art performance. Project page is available at https://itaychachy.github.io/reward-sds/.
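The abstract describes the core mechanism at a high level: draw several noise samples, score each sample's (approximately) denoised output with a reward model, and combine the per-sample SDS gradients so that high-reward samples dominate. The sketch below illustrates one plausible realization, assuming a frozen text-to-image denoiser, a one-step (Tweedie) estimate of the clean image for scoring, and a softmax weighting over rewards; the interfaces (`diffusion`, `reward_model`, `scheduler`) and the specific weighting rule are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def reward_weighted_sds_grad(
    x,                     # rendered image or latent, shape [C, H, W]
    t,                     # diffusion timestep (int index)
    prompt_emb,            # text conditioning
    diffusion,             # assumed frozen denoiser: eps_pred = diffusion(x_t, t, prompt_emb)
    reward_model,          # assumed scorer: scalar reward = reward_model(image, prompt_emb)
    scheduler,             # assumed to expose alphas_cumprod (diffusers-style)
    num_noise_samples=4,
    temperature=1.0,
):
    """Hypothetical sketch of a reward-weighted SDS gradient.

    Standard SDS uses a single noise sample eps and the direction
    (eps_pred - eps). Here we draw several noise samples, score a one-step
    denoised estimate of each with a reward model, and mix the per-sample
    SDS directions with softmax weights over the rewards. The softmax
    weighting, the one-step scoring, and the omission of the usual
    timestep weight w(t) are illustrative simplifications.
    """
    alpha_bar = scheduler.alphas_cumprod[t]
    sqrt_ab, sqrt_1mab = alpha_bar.sqrt(), (1.0 - alpha_bar).sqrt()

    grads, rewards = [], []
    for _ in range(num_noise_samples):
        eps = torch.randn_like(x)
        x_t = sqrt_ab * x + sqrt_1mab * eps              # forward-noised sample
        with torch.no_grad():
            eps_pred = diffusion(x_t, t, prompt_emb)      # frozen denoiser prediction
            # one-step (Tweedie) estimate of the clean image, used only for scoring
            x0_hat = (x_t - sqrt_1mab * eps_pred) / sqrt_ab
            rewards.append(reward_model(x0_hat, prompt_emb))
        grads.append(eps_pred - eps)                      # per-sample SDS direction

    weights = F.softmax(torch.stack(rewards) / temperature, dim=0)
    grad = sum(w * g for w, g in zip(weights, grads))
    return grad  # apply as d(loss)/dx, e.g. x.backward(gradient=grad)
```

With `num_noise_samples=1`, or as the temperature grows and the weights become uniform, the sketch reduces to (an average of) standard SDS directions; lowering the temperature concentrates the update on the highest-reward noise samples, which is the intended alignment effect.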
Related papers
- TV-3DG: Mastering Text-to-3D Customized Generation with Visual Prompt [41.880416357543616]
We propose a novel algorithm, Classifier Score Matching (CSM), which removes the difference term in Score Distillation Sampling (SDS).
We integrate visual prompt information with an attention fusion mechanism and sampling guidance techniques, forming the Visual Prompt CSM algorithm.
We present our approach as TV-3DG, with extensive experiments demonstrating its capability to achieve stable, high-quality, customized 3D generation.
arXiv Detail & Related papers (2024-10-16T07:13:09Z)
- VividDreamer: Invariant Score Distillation For Hyper-Realistic Text-to-3D Generation [33.05759961083337]
This paper presents Invariant Score Distillation (ISD), a novel method for high-fidelity text-to-3D generation.
ISD aims to tackle the over-saturation and over-smoothing problems in Score Distillation Sampling (SDS).
arXiv Detail & Related papers (2024-07-13T09:33:16Z)
- ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching [10.362259643427526]
Current approaches often adapt pre-trained 2D diffusion models for 3D synthesis.
Over-smoothing poses a significant limitation on the high-fidelity generation of 3D models.
LucidDreamer replaces the Denoising Diffusion Probabilistic Model (DDPM) in SDS with the Denoising Diffusion Implicit Model (DDIM).
arXiv Detail & Related papers (2024-05-24T20:19:45Z)
- Score Distillation via Reparametrized DDIM [14.754513907729878]
We show that the image guidance used in Score Distillation Sampling can be understood as the velocity field of a 2D denoising generative process.
We show that a better noise approximation can be recovered by inverting DDIM in each SDS update step.
Our method achieves better or similar 3D generation quality compared to other state-of-the-art Score Distillation methods.
arXiv Detail & Related papers (2024-05-24T19:22:09Z)
- Flow Score Distillation for Diverse Text-to-3D Generation [23.38418695449777]
Our validation experiments across various text-to-image diffusion models demonstrate that Flow Score Distillation (FSD) substantially enhances generation diversity without compromising quality.
arXiv Detail & Related papers (2024-05-16T06:05:16Z)
- Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior [87.55592645191122]
Score distillation sampling (SDS) and its variants have greatly boosted the development of text-to-3D generation, but remain vulnerable to geometry collapse and poor textures.
We propose a novel and effective "Consistent3D" method that explores the ODE deterministic sampling prior for text-to-3D generation.
Experimental results show the efficacy of our Consistent3D in generating high-fidelity and diverse 3D objects and large-scale scenes.
arXiv Detail & Related papers (2024-01-17T08:32:07Z)
- NeuSD: Surface Completion with Multi-View Text-to-Image Diffusion [56.98287481620215]
We present a novel method for 3D surface reconstruction from multiple images where only a part of the object of interest is captured.
Our approach builds on two recent developments: surface reconstruction using neural radiance fields for the reconstruction of the visible parts of the surface, and guidance of pre-trained 2D diffusion models in the form of Score Distillation Sampling (SDS) to complete the shape in unobserved regions in a plausible manner.
arXiv Detail & Related papers (2023-12-07T19:30:55Z)
- StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z)
- Noise-Free Score Distillation [78.79226724549456]
The Noise-Free Score Distillation (NFSD) process requires only minimal modifications to the original SDS framework.
We achieve more effective distillation of pre-trained text-to-image diffusion models while using a nominal CFG scale.
arXiv Detail & Related papers (2023-10-26T17:12:26Z)
- Delta Denoising Score [51.98288453616375]
We introduce Delta Denoising Score (DDS), a novel scoring function for text-based image editing.
It guides minimal modifications of an input image towards the content described in a target prompt.
arXiv Detail & Related papers (2023-04-14T12:22:41Z)
- Auto-Weighted Layer Representation Based View Synthesis Distortion Estimation for 3-D Video Coding [78.53837757673597]
In this paper, an auto-weighted layer representation based view synthesis distortion estimation model is developed.
The proposed method outperforms the relevant state-of-the-art methods in both accuracy and efficiency.
arXiv Detail & Related papers (2022-01-07T12:12:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.