Panoptic Diffusion Models: co-generation of images and segmentation maps
- URL: http://arxiv.org/abs/2412.02929v1
- Date: Wed, 04 Dec 2024 00:42:15 GMT
- Title: Panoptic Diffusion Models: co-generation of images and segmentation maps
- Authors: Yinghan Long, Kaushik Roy
- Abstract summary: We present the Panoptic Diffusion Model (PDM), the first model designed to generate both images and panoptic segmentation maps concurrently.
PDM bridges the gap between image and text by constructing segmentation layouts that provide detailed, built-in guidance throughout the generation process.
- Score: 7.573297026523597
- License:
- Abstract: Recently, diffusion models have demonstrated impressive capabilities in text-guided and image-conditioned image generation. However, existing diffusion models cannot simultaneously generate a segmentation map of objects and a corresponding image from the prompt. Previous attempts either generate segmentation maps based on the images or provide maps as input conditions to control image generation, limiting their functionality to given inputs. Incorporating an inherent understanding of the scene layouts can improve the creativity and realism of diffusion models. To address this limitation, we present the Panoptic Diffusion Model (PDM), the first model designed to generate both images and panoptic segmentation maps concurrently. PDM bridges the gap between image and text by constructing segmentation layouts that provide detailed, built-in guidance throughout the generation process. This ensures the inclusion of categories mentioned in text prompts and enriches the diversity of segments within the background. We demonstrate the effectiveness of PDM across two architectures: a unified diffusion transformer and a two-stream transformer with a pretrained backbone. To facilitate co-generation with fewer sampling steps, we incorporate a fast diffusion solver into PDM. Additionally, when ground-truth maps are available, PDM can function as a text-guided image-to-image generation model. Finally, we propose a novel metric for evaluating the quality of generated maps and show that PDM achieves state-of-the-art results in image generation with implicit scene control.
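The abstract's core idea is a single reverse diffusion process that denoises an image and its panoptic segmentation map together, with the evolving map acting as built-in layout guidance. The sketch below illustrates that co-generation loop with a toy PyTorch module and a DDIM-style update; every name and shape here (JointDenoiser, cogeneration_step, the latent channel counts) is a hypothetical illustration, not the paper's diffusion-transformer architecture or its fast solver.

```python
# Minimal sketch of co-generation, assuming both modalities live in latent space.
# All names, shapes, and the toy conv backbone are illustrative assumptions only.
import torch
import torch.nn as nn

class JointDenoiser(nn.Module):
    """Toy stand-in for a unified network that denoises both latents at once."""
    def __init__(self, img_ch=4, map_ch=16, dim=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(img_ch + map_ch, dim, 3, padding=1), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.SiLU(),
        )
        self.img_head = nn.Conv2d(dim, img_ch, 3, padding=1)  # predicts image noise
        self.map_head = nn.Conv2d(dim, map_ch, 3, padding=1)  # predicts map noise

    def forward(self, x_img, x_map, t):
        h = self.trunk(torch.cat([x_img, x_map], dim=1))
        return self.img_head(h), self.map_head(h)

@torch.no_grad()
def cogeneration_step(model, x_img, x_map, t, alpha_t, alpha_prev):
    """One DDIM-style update applied jointly to the image and map latents."""
    eps_img, eps_map = model(x_img, x_map, t)

    def ddim(x, eps):
        x0 = (x - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()  # predicted clean sample
        return alpha_prev.sqrt() * x0 + (1 - alpha_prev).sqrt() * eps

    # Because the current map latent is part of the model input, the partially
    # denoised layout guides the image latent at every step (and vice versa).
    return ddim(x_img, eps_img), ddim(x_map, eps_map)

# Toy usage: one step on random latents.
model = JointDenoiser()
x_img, x_map = torch.randn(1, 4, 32, 32), torch.randn(1, 16, 32, 32)
x_img, x_map = cogeneration_step(model, x_img, x_map, t=torch.tensor([500]),
                                 alpha_t=torch.tensor(0.52), alpha_prev=torch.tensor(0.55))
```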
Related papers
- Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation [41.341693150031546]
We present a new multi-modal face image generation method that converts a text prompt and a visual input, such as a semantic mask or map, into a photo-realistic face image.
We present a simple mapping and a style modulation network to link the two models and convert meaningful representations in feature maps and attention maps into latent codes.
Our proposed network produces realistic 2D, multi-view, and stylized face images, which align well with the inputs.
arXiv Detail & Related papers (2024-05-07T14:33:40Z)
- On the Multi-modal Vulnerability of Diffusion Models [56.08923332178462]
We propose MMP-Attack to manipulate the generation results of diffusion models by appending a specific suffix to the original prompt.
Our goal is to induce diffusion models to generate a specific object while simultaneously eliminating the original object.
arXiv Detail & Related papers (2024-02-02T12:39:49Z)
- EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models [52.3015009878545]
We develop an image segmentor capable of generating fine-grained segmentation maps without any additional training.
Our framework identifies semantic correspondences between image pixels and spatial locations of low-dimensional feature maps.
In extensive experiments, the produced segmentation maps are shown to be well delineated and to capture detailed parts of the images (a simplified sketch of the idea follows this entry).
arXiv Detail & Related papers (2024-01-22T07:34:06Z)
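The EmerDiff entry above describes building fine-grained segmentation maps from correspondences between image pixels and low-dimensional feature maps. As a loose, hypothetical illustration of that idea (not the paper's method, which derives the correspondences from the diffusion model itself), the sketch below clusters a low-resolution feature grid with k-means and propagates the cluster labels to full image resolution.

```python
# Simplified illustration of turning a low-resolution feature map into a segmentation
# map. The k-means clustering and nearest-neighbour upsampling here are stand-ins for
# EmerDiff's diffusion-derived pixel-to-feature correspondences.
import torch

def segment_from_features(feat, image_hw, k=5, iters=10):
    """feat: (C, h, w) low-dimensional feature map; returns an (H, W) label map."""
    C, h, w = feat.shape
    x = feat.permute(1, 2, 0).reshape(-1, C)            # (h*w, C) feature vectors
    centers = x[torch.randperm(x.shape[0])[:k]]         # random k-means initialization
    for _ in range(iters):
        labels = torch.cdist(x, centers).argmin(dim=1)  # assign each cell to nearest center
        for c in range(k):
            if (labels == c).any():
                centers[c] = x[labels == c].mean(dim=0)
    coarse = labels.reshape(1, 1, h, w).float()
    # Upsample the coarse label grid to full image resolution (nearest keeps labels discrete).
    full = torch.nn.functional.interpolate(coarse, size=image_hw, mode="nearest")
    return full.long().squeeze()

# Toy usage with a random "feature map" standing in for diffusion features.
labels = segment_from_features(torch.randn(64, 16, 16), image_hw=(128, 128), k=6)
```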
- Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive [21.49096276631859]
Current L2I models suffer from either poor editability via text or weak alignment between the generated image and the input layout.
We propose to integrate adversarial supervision into the conventional training pipeline of L2I diffusion models (ALDM).
Specifically, we employ a segmentation-based discriminator which provides explicit feedback to the diffusion generator on the pixel-level alignment between the denoised image and the input layout (a simplified sketch of this idea follows the entry).
arXiv Detail & Related papers (2024-01-16T20:31:46Z)
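As a rough illustration of the segmentation-based adversarial supervision described in the ALDM entry above, the sketch below uses a per-pixel discriminator that classifies real pixels into their layout classes and generated pixels into an extra "fake" class; the generator then receives dense feedback by being pushed to make the discriminator assign its output the layout's classes. The tiny convolutional discriminator and all names are assumptions for illustration only, not the paper's architecture.

```python
# Hedged sketch of segmentation-based adversarial supervision for layout-to-image
# diffusion: a per-pixel classifier acts as the discriminator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegDiscriminator(nn.Module):
    """Per-pixel classifier: N semantic classes plus one extra 'fake' class."""
    def __init__(self, num_classes, dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.SiLU(),
            nn.Conv2d(dim, num_classes + 1, 3, padding=1),
        )
    def forward(self, img):
        return self.net(img)  # (B, num_classes + 1, H, W) logits

def generator_alignment_loss(disc, denoised_img, layout):
    """Reward the generator when each pixel of its output is classified as the
    class given by the input layout (a (B, H, W) tensor of class indices)."""
    return F.cross_entropy(disc(denoised_img), layout)

def discriminator_loss(disc, real_img, fake_img, layout, num_classes):
    """Real pixels should get their layout class; generated pixels the 'fake' class."""
    fake_target = torch.full_like(layout, num_classes)
    return F.cross_entropy(disc(real_img), layout) + \
           F.cross_entropy(disc(fake_img.detach()), fake_target)

# Toy usage.
disc = SegDiscriminator(num_classes=10)
layout = torch.randint(0, 10, (2, 64, 64))
g_loss = generator_alignment_loss(disc, torch.randn(2, 3, 64, 64), layout)
```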
- R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation [74.5598315066249]
We probe into zero-shot grounded T2I generation with diffusion models.
We propose a Region and Boundary (R&B) aware cross-attention guidance approach.
arXiv Detail & Related papers (2023-10-13T05:48:42Z)
- BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing [73.74570290836152]
BLIP-Diffusion is a new subject-driven image generation model that supports multimodal control.
Unlike other subject-driven generation models, BLIP-Diffusion introduces a new multimodal encoder which is pre-trained to provide subject representation.
arXiv Detail & Related papers (2023-05-24T04:51:04Z)
- SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis [38.22195812238951]
We propose a novel guidance approach for the sampling process in the diffusion model.
Our approach guides the model with semantic features from CLIP embeddings and enforces geometric constraints.
Our results demonstrate the effectiveness of incorporating bounding box and segmentation map guidance in the diffusion model sampling process.
arXiv Detail & Related papers (2023-04-28T00:14:28Z)
- A Structure-Guided Diffusion Model for Large-Hole Image Completion [85.61681358977266]
We develop a structure-guided diffusion model to fill large holes in images.
Our method achieves visual quality superior or comparable to state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-18T18:59:01Z)
- SegDiff: Image Segmentation with Diffusion Probabilistic Models [81.16986859755038]
Diffusion Probabilistic Models are employed for state-of-the-art image generation.
We present a method for extending such models to perform image segmentation.
The method learns end-to-end, without relying on a pre-trained backbone (a simplified sketch follows this entry).
arXiv Detail & Related papers (2021-12-01T10:17:25Z)
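To make the SegDiff entry concrete, here is a minimal, hypothetical training step for diffusion-based segmentation: the ground-truth mask is noised according to the forward process, and a small network conditioned on the input image by channel concatenation regresses the noise. The architecture, the concatenation-based conditioning, and the noise schedule are illustrative assumptions rather than the paper's actual design.

```python
# Illustrative end-to-end training step for diffusion-based segmentation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskDenoiser(nn.Module):
    """Toy denoiser conditioned on the image via channel concatenation."""
    def __init__(self, img_ch=3, mask_ch=1, dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + mask_ch, dim, 3, padding=1), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.SiLU(),
            nn.Conv2d(dim, mask_ch, 3, padding=1),  # predicts the added noise
        )
    def forward(self, image, noisy_mask, t):
        return self.net(torch.cat([image, noisy_mask], dim=1))

def training_step(model, image, mask, alphas_cumprod):
    """Sample a timestep, noise the ground-truth mask, and regress the noise."""
    b = image.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,))
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(mask)
    noisy_mask = a.sqrt() * mask + (1 - a).sqrt() * noise  # standard forward process
    return F.mse_loss(model(image, noisy_mask, t), noise)

# Toy usage on random data.
model = MaskDenoiser()
loss = training_step(model, torch.randn(2, 3, 64, 64), torch.rand(2, 1, 64, 64),
                     alphas_cumprod=torch.linspace(0.999, 0.01, 1000))
loss.backward()
```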
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the information presented and accepts no responsibility for any consequences of its use.