Rethinking Score Distillation as a Bridge Between Image Distributions
- URL: http://arxiv.org/abs/2406.09417v1
- Date: Thu, 13 Jun 2024 17:59:58 GMT
- Title: Rethinking Score Distillation as a Bridge Between Image Distributions
- Authors: David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa
- Abstract summary: We show that score distillation methods seek to transport corrupted images (source) to the natural image distribution (target).
Our method can be easily applied across many domains, matching or beating the performance of specialized methods.
We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real.
- Score: 97.27476302077545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Score distillation sampling (SDS) has proven to be an important tool, enabling the use of large-scale diffusion priors for tasks operating in data-poor domains. Unfortunately, SDS has a number of characteristic artifacts that limit its usefulness in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving for an optimal-cost transport path from a source distribution to a target distribution. Under this new interpretation, these methods seek to transport corrupted images (source) to the natural image distribution (target). We argue that current methods' characteristic artifacts are caused by (1) linear approximation of the optimal path and (2) poor estimates of the source distribution. We show that calibrating the text conditioning of the source distribution can produce high-quality generation and translation results with little extra overhead. Our method can be easily applied across many domains, matching or beating the performance of specialized methods. We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real. We compare our method to existing approaches for score distillation sampling and show that it can produce high-frequency details with realistic colors.
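Under this transport view, vanilla SDS takes a single linearized step toward the target distribution at a randomly drawn noise level. Below is a minimal PyTorch sketch of that gradient; `denoiser`, `cond_emb`, and `uncond_emb` are illustrative stand-ins, not the authors' code.

```python
import torch

@torch.no_grad()
def sds_grad(denoiser, image, cond_emb, uncond_emb, alphas_cumprod, cfg_scale=7.5):
    """One SDS update direction: noise the current render, ask the diffusion
    model to predict that noise, and move along (eps_hat - eps)."""
    b = image.shape[0]
    t = torch.randint(20, 980, (b,), device=image.device)   # random timestep
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)              # cumulative alpha
    eps = torch.randn_like(image)
    x_t = a_bar.sqrt() * image + (1 - a_bar).sqrt() * eps   # forward diffusion
    # Classifier-free guidance blends conditional/unconditional predictions.
    eps_c = denoiser(x_t, t, cond_emb)
    eps_u = denoiser(x_t, t, uncond_emb)
    eps_hat = eps_u + cfg_scale * (eps_c - eps_u)
    return (1 - a_bar) * (eps_hat - eps)                    # common weighting choice
```

In the paper's framing, the unconditional branch implicitly models the source (corrupted-image) distribution; one reading of the proposed fix is to replace `uncond_emb` with an embedding of text describing the current degraded render, though the authors' exact calibration procedure may differ from this sketch.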
Related papers
- DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping [20.7584503748821]
Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance.
We conduct a thorough analysis of SDS and refine its formulation, finding that the core design is to model the distribution of rendered images.
We introduce a novel strategy called Variational Distribution Mapping (VDM), which expedites the distribution modeling process by regarding the rendered images as instances of degradation from diffusion-based generation.
arXiv Detail & Related papers (2024-09-08T14:04:48Z)
- Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR).
We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z)
- Correcting Diffusion Generation through Resampling [32.93858075964824]
We propose a particle filtering framework that can reduce the distributional discrepancies between generated and ground-truth images.
Our method can effectively correct missing object errors and improve image quality in various image generation tasks.
arXiv Detail & Related papers (2023-12-10T23:35:13Z)
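The entry above describes a particle-filtering correction. The sketch below shows only the generic resampling step; how candidates are scored is left abstract, and all names are illustrative:

```python
import torch

def systematic_resample(candidates, scores, temperature=1.0):
    """Duplicate high-scoring candidate images and drop low-scoring ones,
    keeping the population size fixed (generic particle-filter step)."""
    n = candidates.shape[0]
    weights = torch.softmax(scores / temperature, dim=0)
    # Systematic resampling: one shared uniform offset, evenly spaced points.
    points = (torch.rand(1, device=weights.device)
              + torch.arange(n, device=weights.device)) / n
    idx = torch.searchsorted(weights.cumsum(0), points).clamp(max=n - 1)
    return candidates[idx]
```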
- Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present Contrastive Denoising Score (CDS), a powerful modification of Contrastive Unpaired Translation (CUT) for latent diffusion models (LDMs).
Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z)
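CDS builds on the CUT objective, a patch-wise contrastive loss that ties each output location to the corresponding input location. A generic sketch of that loss, computed here on arbitrary feature maps rather than the authors' LDM features:

```python
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_in, feat_out, n_patches=64, tau=0.07):
    """CUT-style loss: output features at a location should match input
    features at the same location (positive) and not at others (negatives)."""
    b, c, h, w = feat_in.shape
    idx = torch.randperm(h * w, device=feat_in.device)[:n_patches]
    q = F.normalize(feat_out.flatten(2)[..., idx].permute(0, 2, 1), dim=-1)  # queries
    k = F.normalize(feat_in.flatten(2)[..., idx].permute(0, 2, 1), dim=-1)   # keys
    logits = q @ k.transpose(1, 2) / tau        # (b, n_patches, n_patches)
    labels = torch.arange(n_patches, device=logits.device).repeat(b)
    return F.cross_entropy(logits.reshape(-1, n_patches), labels)
```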
- Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using stochastic processes.
For many applications such as image editing, the model input comes from a distribution that is not random noise.
In our work, we propose Denoising Diffusion Bridge Models (DDBMs).
arXiv Detail & Related papers (2023-09-29T03:24:24Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Uncertainty Inspired Underwater Image Enhancement [45.05141499761876]
We present a novel probabilistic network to learn the enhancement distribution of degraded underwater images.
By learning the enhancement distribution, our method can cope with the bias introduced in the reference map labeling.
Experimental results demonstrate that our approach enables sampling possible enhancement predictions.
arXiv Detail & Related papers (2022-07-20T06:42:28Z)
- Detecting Deepfakes with Self-Blended Images [37.374772758057844]
We present novel synthetic training data called self-blended images (SBIs) to detect deepfakes.
SBIs are generated by blending pseudo source and target images from single pristine images.
We compare our approach with state-of-the-art methods on FF++, CDF, DFD, DFDC, DFDCP, and FFIW datasets.
arXiv Detail & Related papers (2022-04-18T15:44:35Z)
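A toy illustration of the self-blending idea: composite a mildly perturbed copy of a pristine image back onto itself with a soft mask, producing forgery-like blending artifacts. The real pipeline uses richer source/target transforms and face-landmark masks; everything below is a simplified assumption.

```python
import numpy as np

def self_blend(image, rng=None):
    """Toy self-blended image: the pseudo 'source' is a color-shifted copy of
    the same image, composited back with a soft elliptical mask."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = image.shape
    source = np.clip(image * rng.uniform(0.9, 1.1) + rng.uniform(-8, 8), 0, 255)
    yy, xx = np.mgrid[0:h, 0:w]
    ellipse = ((yy - h / 2) / (h / 3)) ** 2 + ((xx - w / 2) / (w / 3)) ** 2 < 1
    mask = ellipse.astype(np.float32) * rng.uniform(0.25, 1.0)  # random blend ratio
    return image * (1 - mask[..., None]) + source * mask[..., None]
```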
- Dual Diffusion Implicit Bridges for Image-to-Image Translation [104.59371476415566]
Common image-to-image translation methods rely on joint training over data from both source and target domains.
We present Dual Diffusion Implicit Bridges (DDIBs), an image translation method based on diffusion models.
DDIBs allow translations between arbitrary pairs of source-target domains, given independently trained diffusion models on respective domains.
arXiv Detail & Related papers (2022-03-16T04:10:45Z)
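The DDIB recipe is concrete enough to sketch: deterministically invert the source image to noise with the source-domain model's DDIM ODE, then integrate back to an image with the target-domain model. The sketch assumes epsilon-prediction models and a precomputed `alphas_cumprod` table; all names are illustrative.

```python
import torch

def ddim_step(model, x, t, a_t, a_next, emb):
    """One deterministic DDIM step between cumulative-alpha levels."""
    eps = model(x, t, emb)
    x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # predicted clean image
    return a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps

@torch.no_grad()
def ddib_translate(src_model, tgt_model, x_src, timesteps, alphas_cumprod,
                   emb_src, emb_tgt):
    """Encode x_src to the shared latent with the source model, then decode
    it with the target model (DDIB as two chained DDIM ODEs)."""
    ts = list(timesteps)                         # increasing, e.g. [1, 21, ..., 981]
    x = x_src
    for t, t_next in zip(ts[:-1], ts[1:]):       # image -> latent (source domain)
        x = ddim_step(src_model, x, t, alphas_cumprod[t],
                      alphas_cumprod[t_next], emb_src)
    for t, t_prev in zip(ts[:0:-1], ts[-2::-1]): # latent -> image (target domain)
        x = ddim_step(tgt_model, x, t, alphas_cumprod[t],
                      alphas_cumprod[t_prev], emb_tgt)
    return x
```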
- Pixel-based Facial Expression Synthesis [1.7056768055368383]
We propose a pixel-based facial expression synthesis method in which each output pixel observes only one input pixel.
The proposed model is two orders of magnitude smaller, which makes it suitable for deployment on resource-constrained devices.
arXiv Detail & Related papers (2020-10-27T16:00:45Z)
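The one-pixel constraint in the entry above can be expressed directly: if every layer is a 1x1 convolution, each output pixel is a function of exactly one input pixel. A minimal PyTorch illustration of that constraint (not the authors' architecture, which also conditions on the target expression):

```python
import torch.nn as nn

# Every layer has a 1x1 receptive field, so each output pixel depends
# only on the input pixel at the same spatial location.
pixel_net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=1), nn.ReLU(),
    nn.Conv2d(64, 3, kernel_size=1),
)
```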
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.