Rethinking Score Distillation as a Bridge Between Image Distributions
- URL: http://arxiv.org/abs/2406.09417v2
- Date: Tue, 10 Dec 2024 19:55:39 GMT
- Title: Rethinking Score Distillation as a Bridge Between Image Distributions
- Authors: David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa,
- Abstract summary: We show that our method seeks to transport corrupted images (source) to the natural image distribution (target)
Our method can be easily applied across many domains, matching or beating the performance of specialized methods.
We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real.
- Score: 97.27476302077545
- License:
- Abstract: Score distillation sampling (SDS) has proven to be an important tool, enabling the use of large-scale diffusion priors for tasks operating in data-poor domains. Unfortunately, SDS has a number of characteristic artifacts that limit its usefulness in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving an optimal-cost transport path from a source distribution to a target distribution. Under this new interpretation, these methods seek to transport corrupted images (source) to the natural image distribution (target). We argue that current methods' characteristic artifacts are caused by (1) linear approximation of the optimal path and (2) poor estimates of the source distribution. We show that calibrating the text conditioning of the source distribution can produce high-quality generation and translation results with little extra overhead. Our method can be easily applied across many domains, matching or beating the performance of specialized methods. We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real. We compare our method to existing approaches for score distillation sampling and show that it can produce high-frequency details with realistic colors.
Related papers
- A Diffusion Model Translator for Efficient Image-to-Image Translation [60.86381807306705]
We propose an efficient method that equips a diffusion model with a lightweight translator, dubbed a Diffusion Model Translator (DMT)
We evaluate our approach on a range of I2I applications, including image stylization, image colorization, segmentation to image, and sketch to image, to validate its efficacy and general utility.
arXiv Detail & Related papers (2025-02-01T04:01:24Z) - Generalizable Origin Identification for Text-Guided Image-to-Image Diffusion Models [39.234894330025114]
Text-guided image-to-image diffusion models excel in translating images based on textual prompts.
This motivates us to introduce the task of origin IDentification for text-guided Image-to-image Diffusion models (ID$2$)
A straightforward solution to ID$2$ involves training a specialized deep embedding model to extract and compare features from both query and reference images.
arXiv Detail & Related papers (2025-01-04T20:34:53Z) - Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization [34.79567392368196]
We propose a novel framework to existing diffusion-based distillation methods, leveraging diffusion models for selection rather than generation.
Our method starts by predicting noise generated by the diffusion model based on input images and text prompts, then calculates the corresponding loss for each pair.
This streamlined framework enables a single-step distillation process, and extensive experiments demonstrate that our approach outperforms state-of-the-art methods across various metrics.
arXiv Detail & Related papers (2024-12-13T08:34:46Z) - DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping [20.7584503748821]
Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance.
We conduct a thorough analysis of SDS and refine its formulation, finding that the core design is to model the distribution of rendered images.
We introduce a novel strategy called Variational Distribution Mapping (VDM), which expedites the distribution modeling process by regarding the rendered images as instances of degradation from diffusion-based generation.
arXiv Detail & Related papers (2024-09-08T14:04:48Z) - Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR)
We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z) - Correcting Diffusion Generation through Resampling [32.93858075964824]
We propose a particle filtering framework that can reduce the distributional discrepancies between generated and ground-truth images.
Our method can effectively correct missing object errors and improve image quality in various image generation tasks.
arXiv Detail & Related papers (2023-12-10T23:35:13Z) - Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present a powerful modification of Contrastive Denoising Score (CUT) for latent diffusion models (LDM)
Our approach enables zero-shot imageto-image translation and neural field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z) - Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using processes.
For many applications such as image editing, the model input comes from a distribution that is not random noise.
In our work, we propose Denoising Diffusion Bridge Models (DDBMs)
arXiv Detail & Related papers (2023-09-29T03:24:24Z) - Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on deepfake detection generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z) - Pixel-based Facial Expression Synthesis [1.7056768055368383]
We propose a pixel-based facial expression synthesis method in which each output pixel observes only one input pixel.
The proposed model is two orders of magnitude smaller which makes it suitable for deployment on resource-constrained devices.
arXiv Detail & Related papers (2020-10-27T16:00:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.