Zippo: Zipping Color and Transparency Distributions into a Single Diffusion Model
- URL: http://arxiv.org/abs/2403.11077v2
- Date: Tue, 19 Mar 2024 08:21:02 GMT
- Title: Zippo: Zipping Color and Transparency Distributions into a Single Diffusion Model
- Authors: Kangyang Xie, Binbin Yang, Hao Chen, Meng Wang, Cheng Zou, Hui Xue, Ming Yang, Chunhua Shen
- Abstract summary: We present Zippo, a unified framework for zipping the color and transparency distributions into a single diffusion model.
Zippo is capable of generating RGB images from alpha mattes and predicting transparency from input images.
Our experiments showcase Zippo's ability to perform efficient text-conditioned transparent image generation.
- Score: 56.84435912555532
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Beyond the superiority of the text-to-image diffusion model in generating high-quality images, recent studies have attempted to uncover its potential for adapting the learned semantic knowledge to visual perception tasks. In this work, instead of translating a generative diffusion model into a visual perception model, we explore retaining the generative ability alongside the perceptive adaptation. To accomplish this, we present Zippo, a unified framework for zipping the color and transparency distributions into a single diffusion model by expanding the diffusion latent into a joint representation of RGB images and alpha mattes. By alternately selecting one modality as the condition and then applying the diffusion process to the counterpart modality, Zippo is capable of generating RGB images from alpha mattes and predicting transparency from input images. In addition to single-modality prediction, we propose a modality-aware noise reassignment strategy to further empower Zippo with jointly generating RGB images and their corresponding alpha mattes under text guidance. Our experiments showcase Zippo's ability to perform efficient text-conditioned transparent image generation and present plausible results of Matte-to-RGB and RGB-to-Matte translation.
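To make the abstract's mechanism concrete, here is a minimal toy sketch of the alternating-modality objective: one joint latent holds the RGB and alpha halves, noise is applied only to the half being generated, and the joint mode noises both halves in the spirit of the modality-aware noise reassignment. All names (`JointUNet`, the channel split, the simplified forward process) are illustrative assumptions, not Zippo's actual implementation.

```python
# A minimal toy sketch of the alternating-modality idea, assuming a latent-
# diffusion setup. Names and the flat noise schedule are assumptions.
import torch
import torch.nn as nn

class JointUNet(nn.Module):
    """Stand-in denoiser over a concatenated [RGB latent | alpha latent] tensor."""
    def __init__(self, channels: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)  # a real model would also take timestep and text

def training_step(unet, z_rgb, z_alpha, mode: str) -> torch.Tensor:
    """One toy denoising objective over the joint latent.

    "m2rgb": condition on the clean matte, diffuse only the RGB half.
    "rgb2m": condition on the clean RGB, diffuse only the matte half.
    "joint": noise both halves so text-conditioned sampling can generate
             the RGB image and its alpha matte together.
    """
    c = z_rgb.shape[1]
    noise = torch.randn(z_rgb.shape[0], 2 * c, *z_rgb.shape[2:])
    if mode == "m2rgb":
        noise[:, c:] = 0.0   # keep the matte half clean (it is the condition)
    elif mode == "rgb2m":
        noise[:, :c] = 0.0   # keep the RGB half clean (it is the condition)
    z = torch.cat([z_rgb, z_alpha], dim=1) + noise  # toy forward process
    pred = unet(z)
    mask = (noise != 0).float()  # only the diffused half is supervised
    return ((pred - noise) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)

unet = JointUNet()
z_rgb, z_alpha = torch.randn(2, 4, 32, 32), torch.randn(2, 4, 32, 32)
for mode in ("m2rgb", "rgb2m", "joint"):
    print(mode, training_step(unet, z_rgb, z_alpha, mode).item())
```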
Related papers
- Generalizable Origin Identification for Text-Guided Image-to-Image Diffusion Models [39.234894330025114]
Text-guided image-to-image diffusion models excel in translating images based on textual prompts.
This motivates us to introduce the task of origin IDentification for text-guided Image-to-image Diffusion models (ID$^2$).
A straightforward solution to ID$^2$ involves training a specialized deep embedding model to extract and compare features from both query and reference images.
arXiv Detail & Related papers (2025-01-04T20:34:53Z)
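The straightforward baseline described in the entry above reduces to embedding-based retrieval. A minimal sketch, with a linear stand-in encoder in place of the paper's specialized embedding model:

```python
# Toy retrieval baseline: embed query and reference images, rank by cosine
# similarity. The linear encoder is a stand-in, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))

query = torch.randn(1, 3, 64, 64)          # translated (generated) image
references = torch.randn(100, 3, 64, 64)   # candidate origin images

q = F.normalize(embed(query), dim=-1)      # (1, 256)
r = F.normalize(embed(references), dim=-1) # (100, 256)
origin = (r @ q.T).squeeze(-1).argmax()    # most similar reference wins
print(int(origin))
```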
- Retinex-Diffusion: On Controlling Illumination Conditions in Diffusion Models via Retinex Theory [19.205929427075965]
We conceptualize the diffusion model as a black-box image renderer and strategically decompose its energy function in alignment with the image formation model.
It generates images with realistic illumination effects, including cast shadow, soft shadow, and inter-reflections.
arXiv Detail & Related papers (2024-07-29T03:15:07Z)
- LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? [10.72249123249003]
We revisit diffusion models, highlighting their capacity for holistic context modeling and parallel decoding.
We introduce a novel architecture, LaDiC, which utilizes a split BERT to create a dedicated latent space for captions.
LaDiC achieves state-of-the-art performance for diffusion-based methods on the MS COCO dataset with 38.2 BLEU@4 and 126.2 CIDEr.
arXiv Detail & Related papers (2024-04-16T17:47:16Z)
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
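As a rough illustration of the perception-refined decoder described above, a set of learnable queries can cross-attend to image features to form a coarse-grained prompt; the dimensions and the single attention layer here are assumptions made for brevity (the paper refines the queries progressively).

```python
# Toy sketch: learnable queries cross-attend to person-image features to yield
# a coarse-grained prompt. Sizes and the single layer are assumptions.
import torch
import torch.nn as nn

dim, n_queries = 256, 16
queries = nn.Parameter(torch.randn(1, n_queries, dim))
cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

img_feats = torch.randn(1, 64, dim)                     # flattened image features
refined, _ = cross_attn(queries, img_feats, img_feats)  # queries attend to image
coarse_prompt = refined                                 # conditions the diffusion model
print(coarse_prompt.shape)  # torch.Size([1, 16, 256])
```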
- Transparent Image Layer Diffusion using Latent Transparency [30.77316047044662]
We present LayerDiffuse, an approach enabling large-scale pretrained latent diffusion models to generate transparent images.
The method learns a "latent transparency" that encodes alpha channel transparency into the latent manifold of a pretrained latent diffusion model.
It preserves the production-ready quality of the large diffusion model by regulating the added transparency as a latent offset.
arXiv Detail & Related papers (2024-02-27T01:19:53Z)
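A minimal sketch of the latent-offset idea from the entry above, assuming a frozen latent-diffusion backbone; the encoder and the 8x downsampling factor are illustrative assumptions, not LayerDiffuse's actual modules.

```python
# Toy sketch: encode the alpha matte as an additive offset in the frozen LDM's
# latent space, leaving the backbone's weights untouched.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentTransparencyEncoder(nn.Module):
    def __init__(self, latent_ch: int = 4):
        super().__init__()
        self.net = nn.Conv2d(1, latent_ch, 3, padding=1)

    def forward(self, alpha: torch.Tensor) -> torch.Tensor:
        return self.net(alpha)

latent = torch.randn(1, 4, 64, 64)             # latent from the pretrained model
alpha = torch.rand(1, 1, 512, 512)             # alpha matte at image resolution
offset = LatentTransparencyEncoder()(F.avg_pool2d(alpha, 8))
adjusted = latent + offset                     # transparency as a latent offset
print(adjusted.shape)
```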
- JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement [69.6035373784027]
Low-light image enhancement (LLIE) has achieved promising performance by employing conditional diffusion models.
Previous methods may neglect the importance of sufficiently formulating the task-specific conditioning strategy.
We propose JoReS-Diff, a novel approach that incorporates Retinex- and semantic-based priors as the additional pre-processing condition.
arXiv Detail & Related papers (2023-12-20T08:05:57Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
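The plug-and-play steering described above can be illustrated with a generic guidance step: an external loss on the model's clean-image estimate supplies a gradient that nudges the unconditional sample toward the condition. This is a generic pattern, not the paper's exact update rule.

```python
# Toy guidance step: steer an unconditional denoiser with the gradient of an
# external task loss. A generic sketch, not the paper's exact algorithm.
import torch

def steered_step(x, denoiser, guidance_loss, scale: float = 1.0):
    x = x.detach().requires_grad_(True)
    x0_est = denoiser(x)             # unconditional estimate of the clean image
    grad = torch.autograd.grad(guidance_loss(x0_est), x)[0]
    return (x0_est - scale * grad).detach()  # nudge toward the condition

x = torch.randn(1, 3, 64, 64)
denoiser = lambda z: 0.9 * z                  # stand-in pretrained model
loss = lambda y: ((y - 0.5) ** 2).mean()      # e.g., a colorization constraint
print(steered_step(x, denoiser, loss).shape)
```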
- MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask [84.84034179136458]
A crucial factor leading to the text-image mismatch issue is inadequate cross-modality relation learning.
We propose an adaptive mask, which is conditioned on the attention maps and the prompt embeddings, to dynamically adjust the contribution of each text token to the image features.
Our method, termed MaskDiffusion, is training-free and hot-pluggable for popular pre-trained diffusion models.
arXiv Detail & Related papers (2023-09-08T15:53:37Z)
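As a rough sketch of the adaptive mask described above: per-token weights derived from the cross-attention maps and prompt embeddings rescale each token's contribution. The specific weighting rule below is an assumption for illustration, not the paper's formula.

```python
# Toy adaptive mask: reweight text tokens using cross-attention saliency and
# prompt-embedding magnitude. The weighting rule is illustrative only.
import torch

def adaptive_mask(attn_maps: torch.Tensor, prompt_emb: torch.Tensor) -> torch.Tensor:
    # attn_maps: (B, tokens, H*W) cross-attention; prompt_emb: (B, tokens, D)
    saliency = attn_maps.mean(dim=-1)          # how much each token attends
    gate = prompt_emb.norm(dim=-1)             # embedding strength per token
    return torch.softmax(saliency * gate, dim=-1).unsqueeze(-1)

attn = torch.rand(1, 77, 64 * 64)
emb = torch.randn(1, 77, 768)
reweighted = attn * adaptive_mask(attn, emb)   # training-free adjustment
print(reweighted.shape)  # torch.Size([1, 77, 4096])
```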
- Improved Diffusion-based Image Colorization via Piggybacked Models [19.807766482434563]
We introduce a colorization model piggybacking on the existing powerful T2I diffusion model.
A diffusion guider is designed to incorporate the pre-trained weights of the latent diffusion model.
A lightness-aware VQVAE will then generate the colorized result with pixel-perfect alignment to the given grayscale image.
arXiv Detail & Related papers (2023-04-21T16:23:24Z)
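The pixel-perfect alignment claim above follows naturally if lightness is never re-synthesized: predict only the color components and keep the input grayscale as the lightness channel. A toy reduction of that idea, not the paper's lightness-aware VQVAE:

```python
# Toy reduction of the lightness-aware idea: predict chroma only and reuse the
# input grayscale as the L channel, so lightness is aligned by construction.
import torch
import torch.nn as nn

chroma_net = nn.Conv2d(1, 2, 3, padding=1)   # stand-in chroma predictor

gray = torch.rand(1, 1, 64, 64)              # given grayscale image (L channel)
ab = torch.tanh(chroma_net(gray))            # predicted a/b color channels
lab = torch.cat([gray, ab], dim=1)           # L is the untouched input
print(lab.shape)  # torch.Size([1, 3, 64, 64])
```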
- DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion [144.9653045465908]
We propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM).
Our approach yields promising fusion results in infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2023-03-13T04:06:42Z)
- Controlled and Conditional Text to Image Generation with Diffusion Prior [1.8690858882873838]
DALLE-2's two-step process comprises a Diffusion Prior that generates a CLIP image embedding from text and a Diffusion Decoder that generates an image from that embedding (a toy sketch follows this entry).
We show quantitatively and qualitatively that the proposed approaches perform better than prompt engineering for domain-specific generation and than existing baselines for color-conditioned generation.
arXiv Detail & Related papers (2023-02-23T00:10:40Z)
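A minimal sketch of the two-step pipeline named above; both modules are toy stand-ins (a real system runs iterative diffusion sampling in each stage):

```python
# Toy two-stage pipeline: a prior maps a CLIP text embedding to a CLIP image
# embedding; a decoder maps that to pixels. Both modules are stand-ins.
import torch
import torch.nn as nn

class DiffusionPrior(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        return self.net(text_emb)             # stage 1: text emb -> image emb

class DiffusionDecoder(nn.Module):
    def __init__(self, dim: int = 512, hw: int = 64):
        super().__init__()
        self.hw = hw
        self.net = nn.Linear(dim, 3 * hw * hw)

    def forward(self, img_emb: torch.Tensor) -> torch.Tensor:
        return self.net(img_emb).view(-1, 3, self.hw, self.hw)  # stage 2: emb -> image

text_emb = torch.randn(1, 512)                # CLIP text embedding of the prompt
image = DiffusionDecoder()(DiffusionPrior()(text_emb))
print(image.shape)  # torch.Size([1, 3, 64, 64])
```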