Related papers: Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation

URL: http://arxiv.org/abs/2306.08247v6
Date: Sat, 16 Mar 2024 05:45:59 GMT
Title: Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation
Authors: Ruoyu Wang, Yongqi Yang, Zhihao Qian, Ye Zhu, Yu Wu
Abstract summary: In this work, we investigate the diffusion (physics) in diffusion (machine learning) properties. We propose our Cyclic One-Way Diffusion (COW) method to control the direction of the diffusion phenomenon. Our method provides a novel perspective for understanding the task needs and is applicable to a wider range of customization scenarios.
Score: 11.80682025950519
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Originating from the diffusion phenomenon in physics that describes particle movement, the diffusion generative models inherit the characteristics of stochastic random walk in the data space along the denoising trajectory. However, the intrinsic mutual interference among image regions contradicts the need for practical downstream application scenarios where the preservation of low-level pixel information from given conditioning is desired (e.g., customization tasks like personalized generation and inpainting based on a user-provided single image). In this work, we investigate the diffusion (physics) in diffusion (machine learning) properties and propose our Cyclic One-Way Diffusion (COW) method to control the direction of diffusion phenomenon given a pre-trained frozen diffusion model for versatile customization application scenarios, where the low-level pixel information from the conditioning needs to be preserved. Notably, unlike most current methods that incorporate additional conditions by fine-tuning the base text-to-image diffusion model or learning auxiliary networks, our method provides a novel perspective to understand the task needs and is applicable to a wider range of customization scenarios in a learning-free manner. Extensive experiment results show that our proposed COW can achieve more flexible customization based on strict visual conditions in different application settings. Project page: https://wangruoyu02.github.io/cow.github.io/.

Related papers

BADiff: Bandwidth Adaptive Diffusion Model [55.10134744772338]
Traditional diffusion models produce high-fidelity images by performing a fixed number of denoising steps, regardless of downstream transmission limitations. In practical cloud-to-device scenarios, limited bandwidth often necessitates heavy compression, leading to loss of fine textures and wasted computation. We introduce a joint end-to-end training strategy where the diffusion model is conditioned on a target quality level derived from the available bandwidth.
arXiv Detail & Related papers (2025-10-24T11:50:03Z)
Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing [25.138589492384654]
We propose a Diffusion Latent Inspired network for Image Dehazing, dubbed DiffLI$^2$D. We first reveal that the semantic latent space of pre-trained diffusion models can represent the content and haze characteristics of images. We integrate the diffusion latent representations at different time-steps into a delicately designed dehazing network to provide instructions for image dehazing.
arXiv Detail & Related papers (2025-09-24T13:11:37Z)
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets [65.42834731617226]
We propose a reinforcement learning method for diffusion model finetuning, dubbed Nabla-GFlowNet. We show that our proposed method achieves fast yet diversity- and prior-preserving finetuning of Stable Diffusion, a large-scale text-conditioned image diffusion model.
arXiv Detail & Related papers (2024-12-10T18:59:58Z)
SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation [29.49217721233131]
Diffusion generative models simulate a random walk in the data space along the denoising trajectory. This allows information to diffuse across regions, yielding outcomes. However, the chaotic and disordered nature of information diffusion in diffusion models often results in undesired interference between image regions, causing degraded detail preservation and contextual inconsistency. We reframe disordered diffusion as a powerful tool for text-vision-to-image generation (TV2I) tasks, achieving pixel-level condition fidelity while maintaining visual and semantic coherence throughout the image.
arXiv Detail & Related papers (2024-11-28T14:35:25Z)
Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas [33.334956022229846]
We propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting. Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space. Our method maintains compatibility with the input prompt and visual quality of the generated images while increasing their semantic coherence.
arXiv Detail & Related papers (2024-08-28T09:22:32Z)
Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment [56.609042046176555]
Suboptimal noise-data mapping leads to slow training of diffusion models. Drawing inspiration from the immiscibility phenomenon in physics, we propose Immiscible Diffusion. Our approach is remarkably simple, requiring only one line of code to restrict the diffuse-able area for each image.
arXiv Detail & Related papers (2024-06-18T06:20:42Z)
Improving GFlowNets for Text-to-Image Diffusion Alignment [48.42367859859971]
We explore techniques that do not directly maximize the reward but rather generate high-reward images with relatively high probability. Our method could effectively align large-scale text-to-image diffusion models with given reward information.
arXiv Detail & Related papers (2024-06-02T06:36:46Z)
Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation. We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images. Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations. Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z)
Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods [27.014858633903867]
We present a training framework for feature disentanglement of Diffusion Models (FDiff). We propose two sampling methods that can boost the realism of our Diffusion Models and also enhance the controllability.
arXiv Detail & Related papers (2023-02-28T07:43:00Z)
ShiftDDPMs: Exploring Conditional Diffusion Models by Shifting Diffusion Trajectories [144.03939123870416]
We propose a novel conditional diffusion model by introducing conditions into the forward process. We use extra latent space to allocate an exclusive diffusion trajectory for each condition based on some shifting rules. We formulate our method, which we call ShiftDDPMs, and provide a unified point of view on existing related methods.
arXiv Detail & Related papers (2023-02-05T12:48:21Z)
SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, leveraging denoising diffusion models to capture the internal distribution of patches from a single natural image. It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales. Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.