Related papers: Rethinking Score Distillation as a Bridge Between Image Distributions

URL: http://arxiv.org/abs/2406.09417v2
Date: Tue, 10 Dec 2024 19:55:39 GMT
Title: Rethinking Score Distillation as a Bridge Between Image Distributions
Authors: David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa
Abstract summary: We show that our method seeks to transport corrupted images (source) to the natural image distribution (target). Our method can be easily applied across many domains, matching or beating the performance of specialized methods. We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real.
Score: 97.27476302077545
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Score distillation sampling (SDS) has proven to be an important tool, enabling the use of large-scale diffusion priors for tasks operating in data-poor domains. Unfortunately, SDS has a number of characteristic artifacts that limit its usefulness in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving an optimal-cost transport path from a source distribution to a target distribution. Under this new interpretation, these methods seek to transport corrupted images (source) to the natural image distribution (target). We argue that current methods' characteristic artifacts are caused by (1) linear approximation of the optimal path and (2) poor estimates of the source distribution. We show that calibrating the text conditioning of the source distribution can produce high-quality generation and translation results with little extra overhead. Our method can be easily applied across many domains, matching or beating the performance of specialized methods. We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real. We compare our method to existing approaches for score distillation sampling and show that it can produce high-frequency details with realistic colors.

Related papers

Learning Latent Representations for Image Translation using Frequency Distributed CycleGAN [7.610968152027164]
Fd-CycleGAN is an image-to-image (I2I) translation framework that enhances latent representation learning to approximate real data distributions. We conduct experiments on diverse datasets -- Horse2Zebra, Monet2Photo, and a synthetically augmented Strike-off dataset. Our results suggest that frequency-guided latent learning significantly improves generalization in image translation tasks.
arXiv Detail & Related papers (2025-08-05T12:59:37Z)
A Diffusion Model Translator for Efficient Image-to-Image Translation [60.86381807306705]
We propose an efficient method that equips a diffusion model with a lightweight translator, dubbed a Diffusion Model Translator (DMT). We evaluate our approach on a range of I2I applications, including image stylization, image colorization, segmentation to image, and sketch to image, to validate its efficacy and general utility.
arXiv Detail & Related papers (2025-02-01T04:01:24Z)
Generalizable Origin Identification for Text-Guided Image-to-Image Diffusion Models [39.234894330025114]
Text-guided image-to-image diffusion models excel in translating images based on textual prompts. This motivates us to introduce the task of origin IDentification for text-guided Image-to-image Diffusion models (ID$^2$). A straightforward solution to ID$^2$ involves training a specialized deep embedding model to extract and compare features from both query and reference images.
arXiv Detail & Related papers (2025-01-04T20:34:53Z)
Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization [34.53986517177061]
We propose a novel framework to existing diffusion-based distillation methods, leveraging diffusion models for selection rather than generation. Our method starts by predicting noise generated by the diffusion model based on input images and text prompts, then calculates the corresponding loss for each pair. This streamlined framework enables a single-step distillation process, and extensive experiments demonstrate that our approach outperforms state-of-the-art methods across various metrics.
arXiv Detail & Related papers (2024-12-13T08:34:46Z)
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
Diffusion models have revolutionized text-to-image synthesis by delivering exceptional quality, fine detail resolution, and strong contextual awareness. We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models. Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z)
DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping [20.7584503748821]
Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance. We conduct a thorough analysis of SDS and refine its formulation, finding that the core design is to model the distribution of rendered images. We introduce a novel strategy called Variational Distribution Mapping (VDM), which expedites the distribution modeling process by regarding the rendered images as instances of degradation from diffusion-based generation.
arXiv Detail & Related papers (2024-09-08T14:04:48Z)
Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR). We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z)
Correcting Diffusion Generation through Resampling [32.93858075964824]
We propose a particle filtering framework that can reduce the distributional discrepancies between generated and ground-truth images. Our method can effectively correct missing object errors and improve image quality in various image generation tasks.
arXiv Detail & Related papers (2023-12-10T23:35:13Z)
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present a powerful modification of Contrastive Denoising Score (CUT) for latent diffusion models (LDM). Our approach enables zero-shot image-to-image translation and neural field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z)
Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using processes. For many applications such as image editing, the model input comes from a distribution that is not random noise. In our work, we propose Denoising Diffusion Bridge Models (DDBMs).
arXiv Detail & Related papers (2023-09-29T03:24:24Z)
Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language. We pioneer a systematic study on detecting deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
Uncertainty Inspired Underwater Image Enhancement [45.05141499761876]
We present a novel probabilistic network to learn the enhancement distribution of degraded underwater images. By learning the enhancement distribution, our method can cope with the bias introduced in the reference map labeling. Experimental results demonstrate that our approach enables sampling possible enhancement predictions.
arXiv Detail & Related papers (2022-07-20T06:42:28Z)
Detecting Deepfakes with Self-Blended Images [37.374772758057844]
We present novel synthetic training data called self-blended images (SBIs) to detect deepfakes. SBIs are generated by blending pseudo source and target images from single pristine images. We compare our approach with state-of-the-art methods on FF++, CDF, DFD, DFDC, DFDCP, and FFIW datasets.
arXiv Detail & Related papers (2022-04-18T15:44:35Z)
Dual Diffusion Implicit Bridges for Image-to-Image Translation [104.59371476415566]
Common image-to-image translation methods rely on joint training over data from both source and target domains. We present Dual Diffusion Implicit Bridges (DDIBs), an image translation method based on diffusion models. DDIBs allow translations between arbitrary pairs of source-target domains, given independently trained diffusion models on respective domains.
arXiv Detail & Related papers (2022-03-16T04:10:45Z)
Pixel-based Facial Expression Synthesis [1.7056768055368383]
We propose a pixel-based facial expression synthesis method in which each output pixel observes only one input pixel. The proposed model is two orders of magnitude smaller, which makes it suitable for deployment on resource-constrained devices.
arXiv Detail & Related papers (2020-10-27T16:00:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.