Boundary Guided Learning-Free Semantic Control with Diffusion Models
- URL: http://arxiv.org/abs/2302.08357v3
- Date: Wed, 18 Oct 2023 16:35:49 GMT
- Title: Boundary Guided Learning-Free Semantic Control with Diffusion Models
- Authors: Ye Zhu, Yu Wu, Zhiwei Deng, Olga Russakovsky, Yan Yan
- Abstract summary: We present our BoundaryDiffusion method for efficient, effective, and lightweight semantic control with frozen pre-trained DDMs.
We conduct extensive experiments on multiple DPM architectures (DDPM, iDDPM) and datasets (CelebA, CelebA-HQ, LSUN-church, LSUN-bedroom, AFHQ-dog) at different resolutions (64, 256).
- Score: 44.37803942479853
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Applying pre-trained generative denoising diffusion models (DDMs) for
downstream tasks such as image semantic editing usually requires either
fine-tuning DDMs or learning auxiliary editing networks in the existing
literature. In this work, we present our BoundaryDiffusion method for
efficient, effective, and lightweight semantic control with frozen pre-trained
DDMs, without learning any extra networks. As one of the first learning-free
diffusion editing works, we start by seeking a comprehensive understanding of
the intermediate high-dimensional latent spaces by theoretically and
empirically analyzing their probabilistic and geometric behaviors in the Markov
chain. We then propose to further explore the critical step for editing in the
denoising trajectory that characterizes the convergence of a pre-trained DDM
and introduce an automatic search method. Last but not least, in contrast to
the conventional understanding that DDMs have relatively poor semantic
behaviors, we prove that the critical latent space we found already exhibits
semantic subspace boundaries at the generic level in unconditional DDMs, which
allows us to do controllable manipulation by guiding the denoising trajectory
towards the targeted boundary via a single-step operation. We conduct extensive
experiments on multiple DPM architectures (DDPM, iDDPM) and datasets (CelebA,
CelebA-HQ, LSUN-church, LSUN-bedroom, AFHQ-dog) at different resolutions (64,
256), achieving superior or state-of-the-art performance in various task
scenarios (image semantic editing, text-based editing, unconditional semantic
control), demonstrating the effectiveness of our method.
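The abstract's single-step guidance can be pictured as translating the latent at the critical denoising step across a semantic hyperplane boundary. The NumPy sketch below illustrates that geometric idea only; the function name, the linear-hyperplane parameterization, and the `margin` parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def single_step_boundary_edit(latent, boundary_normal, boundary_offset=0.0, margin=1.0):
    """Move a latent across a semantic hyperplane boundary in one step.

    The boundary is modeled as the hyperplane {z : <n, z> + b = 0}, with
    latents on the positive side assumed to carry the target attribute.
    The latent is translated along the unit normal until it lies `margin`
    past the boundary; latents already beyond that point are unchanged.
    """
    n = boundary_normal / np.linalg.norm(boundary_normal)
    signed_dist = float(np.dot(n, latent) + boundary_offset)
    if signed_dist >= margin:
        return latent.copy()  # already on the target side with enough margin
    # Single-step translation: exactly closes the gap to the target margin.
    return latent + (margin - signed_dist) * n
```

Denoising would then resume from the edited latent at the critical step. How the boundary itself is obtained (e.g., fitting a linear classifier on critical-step latents) is left open in this sketch.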
Related papers
- Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation [54.96563068182733]
We propose Modality Adaptation with text-to-image Diffusion Models (MADM) for semantic segmentation task.
MADM utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities.
We show that MADM achieves state-of-the-art adaptation performance across various modality tasks, including images to depth, infrared, and event modalities.
arXiv Detail & Related papers (2024-10-29T03:49:40Z) - SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack [29.744970741737376]
We propose a novel framework called Semantic-Consistent Unrestricted Adversarial Attacks (SCA).
SCA employs an inversion method to extract edit-friendly noise maps and utilizes a Multimodal Large Language Model (MLLM) to provide semantic guidance.
Our framework enables the efficient generation of adversarial examples that exhibit minimal discernible semantic changes.
arXiv Detail & Related papers (2024-10-03T06:25:53Z) - Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis [18.755311950243737]
The latent space of Diffusion Models (DMs) is not as well understood as that of Generative Adversarial Networks (GANs).
Recent research has focused on unsupervised semantic discovery in the latent space of DMs.
We introduce an unsupervised method to factorize the latent semantics learned by the denoising network of pre-trained DMs.
arXiv Detail & Related papers (2024-08-29T18:21:50Z) - Unified Domain Adaptive Semantic Segmentation [96.74199626935294]
Unsupervised Domain Adaptive Semantic Segmentation (UDA-SS) aims to transfer supervision from a labeled source domain to an unlabeled target domain.
We propose a Quad-directional Mixup (QuadMix) method, characterized by tackling distinct point attributes and feature inconsistencies.
Our method outperforms the state-of-the-art works by large margins on four challenging UDA-SS benchmarks.
arXiv Detail & Related papers (2023-11-22T09:18:49Z) - Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z) - Denoising Task Routing for Diffusion Models [19.373733104929325]
Diffusion models generate highly realistic images by learning a multi-step denoising process.
Despite the inherent connection between diffusion models and multi-task learning (MTL), designing neural architectures that exploit this connection remains unexplored.
We present Denoising Task Routing (DTR), a simple add-on strategy for existing diffusion model architectures to establish distinct information pathways.
arXiv Detail & Related papers (2023-10-11T02:23:18Z) - DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions [52.63323657077447]
We propose DNMOT, an end-to-end trainable DeNoising Transformer for multiple object tracking.
Specifically, we augment the trajectory with noises during training and make our model learn the denoising process in an encoder-decoder architecture.
We conduct extensive experiments on the MOT17, MOT20, and DanceTrack datasets, and the experimental results show that our method outperforms previous state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2023-09-09T04:40:01Z) - CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z) - Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models [21.173910627285338]
Denoising Diffusion Models (DDMs) have emerged as a strong competitor to Generative Adversarial Networks (GANs).
In this paper, we explore the properties of h-space and propose several novel methods for finding meaningful semantic directions within it.
Our approaches are applicable without requiring architectural modifications, text-based guidance, CLIP-based optimization, or model fine-tuning.
arXiv Detail & Related papers (2023-03-20T12:59:32Z) - Semi-supervised Domain Adaptation for Semantic Segmentation [3.946367634483361]
We propose a novel two-step semi-supervised dual-domain adaptation (SSDDA) approach to address both cross- and intra-domain gaps in semantic segmentation.
We demonstrate that the proposed approach outperforms state-of-the-art methods on two common synthetic-to-real semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-20T16:13:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.