Open-vocabulary Object Segmentation with Diffusion Models
- URL: http://arxiv.org/abs/2301.05221v2
- Date: Thu, 10 Aug 2023 16:05:14 GMT
- Title: Open-vocabulary Object Segmentation with Diffusion Models
- Authors: Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
- Abstract summary: The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model, in the form of segmentation map.
We adopt the augmented diffusion model to build a synthetic semantic segmentation dataset, and show that, training a standard segmentation model on such dataset demonstrates competitive performance on the zero-shot segmentation(ZS3) benchmark.
- Score: 47.36233857830832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of this paper is to extract the visual-language correspondence from
a pre-trained text-to-image diffusion model, in the form of segmentation map,
i.e., simultaneously generating images and segmentation masks for the
corresponding visual entities described in the text prompt. We make the
following contributions: (i) we pair the existing Stable Diffusion model with a
novel grounding module, that can be trained to align the visual and textual
embedding space of the diffusion model with only a small number of object
categories; (ii) we establish an automatic pipeline for constructing a dataset,
that consists of {image, segmentation mask, text prompt} triplets, to train the
proposed grounding module; (iii) we evaluate the performance of open-vocabulary
grounding on images generated from the text-to-image diffusion model and show
that the module can well segment the objects of categories beyond seen ones at
training time; (iv) we adopt the augmented diffusion model to build a synthetic
semantic segmentation dataset, and show that, training a standard segmentation
model on such dataset demonstrates competitive performance on the zero-shot
segmentation(ZS3) benchmark, which opens up new opportunities for adopting the
powerful diffusion model for discriminative tasks.
Related papers
- Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models [65.82564074712836]
We introduce DIFfusionHOI, a new HOI detector shedding light on text-to-image diffusion models.
We first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space.
These learned relation embeddings then serve as textual prompts, to steer diffusion models generate images that depict specific interactions.
arXiv Detail & Related papers (2024-10-26T12:00:33Z) - Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation [56.87049651707208]
Few-shot Semantic has evolved into In-context tasks, morphing into a crucial element in assessing generalist segmentation models.
Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, resulting in the proposal of a KV fusion method within the self-attention framework.
Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework.
arXiv Detail & Related papers (2024-10-03T10:33:49Z) - FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z) - Diffusion Model is Secretly a Training-free Open Vocabulary Semantic
Segmenter [47.29967666846132]
generative text-to-image diffusion models are highly efficient open-vocabulary semantic segmenters.
We introduce a novel training-free approach named DiffSegmenter to generate realistic objects that are semantically faithful to the input text.
Extensive experiments on three benchmark datasets show that the proposed DiffSegmenter achieves impressive results for open-vocabulary semantic segmentation.
arXiv Detail & Related papers (2023-09-06T06:31:08Z) - Diffusion Models for Open-Vocabulary Segmentation [79.02153797465324]
OVDiff is a novel method that leverages generative text-to-image diffusion models for unsupervised open-vocabulary segmentation.
It relies solely on pre-trained components and outputs the synthesised segmenter directly, without training.
arXiv Detail & Related papers (2023-06-15T17:51:28Z) - SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis [38.22195812238951]
We propose a novel guidance approach for the sampling process in the diffusion model.
Our approach guides the model with semantic features from CLIP embeddings and enforces geometric constraints.
Our results demonstrate the effectiveness of incorporating bounding box and segmentation map guidance in the diffusion model sampling process.
arXiv Detail & Related papers (2023-04-28T00:14:28Z) - Diffusion Models for Implicit Image Segmentation Ensembles [1.444701913511243]
We present a novel semantic segmentation method based on diffusion models.
By modifying the training and sampling scheme, we show that diffusion models can perform lesion segmentation of medical images.
Compared to state-of-the-art segmentation models, our approach yields good segmentation results and, additionally, meaningful uncertainty maps.
arXiv Detail & Related papers (2021-12-06T16:28:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.