NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of
Interpretable Directions in Diffusion Models
- URL: http://arxiv.org/abs/2312.05390v1
- Date: Fri, 8 Dec 2023 22:04:53 GMT
- Title: NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of
Interpretable Directions in Diffusion Models
- Authors: Yusuf Dalva and Pinar Yanardag
- Abstract summary: We propose an unsupervised method to discover latent semantics in text-to-image diffusion models without relying on text prompts.
Our method achieves highly disentangled edits, outperforming existing approaches in both diffusion-based and GAN-based latent space editing methods.
- Score: 6.254873489691852
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative models have been very popular in the recent years for their image
generation capabilities. GAN-based models are highly regarded for their
disentangled latent space, which is a key feature contributing to their success
in controlled image editing. On the other hand, diffusion models have emerged
as powerful tools for generating high-quality images. However, the latent space
of diffusion models is not as thoroughly explored or understood. Existing
methods that aim to explore the latent space of diffusion models usually relies
on text prompts to pinpoint specific semantics. However, this approach may be
restrictive in areas such as art, fashion, or specialized fields like medicine,
where suitable text prompts might not be available or easy to conceive thus
limiting the scope of existing work. In this paper, we propose an unsupervised
method to discover latent semantics in text-to-image diffusion models without
relying on text prompts. Our method takes a small set of unlabeled images from
specific domains, such as faces or cats, and a pre-trained diffusion model, and
discovers diverse semantics in unsupervised fashion using a contrastive
learning objective. Moreover, the learned directions can be applied
simultaneously, either within the same domain (such as various types of facial
edits) or across different domains (such as applying cat and face edits within
the same image) without interfering with each other. Our extensive experiments
show that our method achieves highly disentangled edits, outperforming existing
approaches in both diffusion-based and GAN-based latent space editing methods.
Related papers
- Unsupervised Region-Based Image Editing of Denoising Diffusion Models [50.005612464340246]
We propose a method to identify semantic attributes in the latent space of pre-trained diffusion models without any further training.
Our approach facilitates precise semantic discovery and control over local masked areas, eliminating the need for annotations.
arXiv Detail & Related papers (2024-12-17T13:46:12Z) - EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM [50.054404519821745]
We present a novel framework that integrates a multimodal Large Language Model for enhanced reasoning capabilities.
Our framework achieves promising results on MagicBrush, AutoSplice, and PerfBrush datasets.
Notably, our method excels on the PerfBrush dataset, a self-constructed test set featuring previously unseen types of edits.
arXiv Detail & Related papers (2024-12-05T02:05:33Z) - Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts [68.48103545146127]
This paper proposes a novel framework for unsupervised exploration of diffusion latent spaces.
We directly leverage natural language prompts and image captions to map latent directions.
Our method provides a more scalable and interpretable understanding of the semantic knowledge encoded within diffusion models.
arXiv Detail & Related papers (2024-10-25T21:44:51Z) - Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas [33.334956022229846]
We propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting.
Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space.
Our method maintains compatibility with the input prompt and visual quality of the generated images while increasing their semantic coherence.
arXiv Detail & Related papers (2024-08-28T09:22:32Z) - Toward a Diffusion-Based Generalist for Dense Vision Tasks [141.03236279493686]
Recent works have shown image itself can be used as a natural interface for general-purpose visual perception.
We propose to perform diffusion in pixel space and provide a recipe for finetuning pre-trained text-to-image diffusion models for dense vision tasks.
In experiments, we evaluate our method on four different types of tasks and show competitive performance to the other vision generalists.
arXiv Detail & Related papers (2024-06-29T17:57:22Z) - Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS)
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z) - On the Multi-modal Vulnerability of Diffusion Models [56.08923332178462]
We propose MMP-Attack to manipulate the generation results of diffusion models by appending a specific suffix to the original prompt.
Our goal is to induce diffusion models to generate a specific object while simultaneously eliminating the original object.
arXiv Detail & Related papers (2024-02-02T12:39:49Z) - On Conditioning the Input Noise for Controlled Image Generation with
Diffusion Models [27.472482893004862]
Conditional image generation has paved the way for several breakthroughs in image editing, generating stock photos and 3-D object generation.
In this work, we explore techniques to condition diffusion models with carefully crafted input noise artifacts.
arXiv Detail & Related papers (2022-05-08T13:18:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.