Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models
- URL: http://arxiv.org/abs/2212.14306v2
- Date: Mon, 4 Sep 2023 07:44:31 GMT
- Title: Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models
- Authors: Mischa Dombrowski, Hadrien Reynaud, Matthew Baugh and Bernhard Kainz
- Abstract summary: We present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions.
We show results on the task of segmenting four different objects (humans, dogs, cars, birds) and a use case in medical image analysis.
- Score: 6.408114351192012
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Curating datasets for object segmentation is a difficult task. With the advent of large-scale pre-trained generative models, conditional image generation has been given a significant boost in result quality and ease of use. In this paper, we present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions, without requiring segmentation labels. We leverage and explore pre-trained latent diffusion models to automatically generate weak segmentation masks for concepts and objects. The masks are then used to fine-tune the diffusion model on an inpainting task, which enables fine-grained removal of the object while simultaneously providing a synthetic foreground and background dataset. We demonstrate that this method outperforms previous approaches in both discriminative and generative performance and closes the gap with fully supervised training while requiring no pixel-wise object labels. We show results on the task of segmenting four different objects (humans, dogs, cars, birds) and a use case in medical image analysis. The code is available at https://github.com/MischaD/fobadiffusion.
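The weak-labeling step can be pictured concretely. Below is a minimal, illustrative sketch (not the authors' implementation) of turning cross-attention maps for an object token into a weak binary mask; the function name and the random stand-in tensors are assumptions made for the sake of a self-contained example.

```python
# Minimal sketch: turning diffusion cross-attention into a weak binary
# foreground mask. The attention tensors here are random stand-ins for
# maps captured (e.g. via forward hooks) from a latent diffusion model.
import torch
import torch.nn.functional as F

def weak_mask_from_attention(attn_maps, image_size=(512, 512), threshold=0.5):
    """attn_maps: list of (heads, H, W) cross-attention maps for the
    object token, collected across layers/timesteps (hypothetical input)."""
    acc = torch.zeros(image_size)
    for a in attn_maps:
        m = a.mean(dim=0, keepdim=True)              # average over heads -> (1, H, W)
        m = F.interpolate(m[None], size=image_size,  # upsample to image resolution
                          mode="bilinear", align_corners=False)[0, 0]
        acc += m
    acc = (acc - acc.min()) / (acc.max() - acc.min() + 1e-8)  # normalize to [0, 1]
    return (acc > threshold).float()                 # binarize into a weak mask

# Dummy attention maps standing in for captured ones (two layers, 8 heads).
maps = [torch.rand(8, 16, 16), torch.rand(8, 32, 32)]
mask = weak_mask_from_attention(maps)
print(mask.shape, mask.mean().item())  # fraction of pixels marked foreground
```

In the paper's pipeline, masks of this kind then define the regions the diffusion model is fine-tuned to inpaint, yielding paired synthetic foreground and background data.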
Related papers
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance (a rough sketch of this interpolation follows the entry).
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
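As a rough illustration of the interpolation idea, the sketch below fades attention features from the source image to the target layout over the early denoising steps; the 30% window, tensor shapes, and function names are assumptions, not the paper's exact procedure.

```python
# Illustrative sketch of attention-feature interpolation as DiffUHaul
# describes it: early denoising steps blend source- and target-layout
# attention features; later steps use the target features alone.
import torch

def blended_attention_features(src_feats, tgt_feats, step, num_steps,
                               interp_until=0.3):
    """Linearly fade from source to target features during the first
    `interp_until` fraction of denoising steps (hypothetical schedule)."""
    if step < interp_until * num_steps:
        alpha = step / (interp_until * num_steps)   # 0 -> 1 within the window
        return (1 - alpha) * src_feats + alpha * tgt_feats
    return tgt_feats

src = torch.randn(1, 77, 320)   # stand-in attention features (source image)
tgt = torch.randn(1, 77, 320)   # stand-in attention features (target layout)
for t in range(50):
    feats = blended_attention_features(src, tgt, t, num_steps=50)
```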
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage several relatively small open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- Outline-Guided Object Inpainting with Diffusion Models [11.391452115311798]
Instance segmentation datasets play a crucial role in training accurate and robust computer vision models.
We show how the cost of annotation can be mitigated by starting with small annotated instance segmentation datasets and augmenting them to obtain a sizeable annotated dataset.
We generate new images using a diffusion-based inpainting model to fill the masked area with a desired object class by guiding the diffusion through the object outline (a sketch follows the entry).
arXiv Detail & Related papers (2024-02-26T09:21:17Z)
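One plausible reading of outline-guided inpainting, sketched with the diffusers library below: fill the object outline into a binary mask and ask an inpainting diffusion model to paint the desired class inside it. The polygon, prompt, and checkpoint choice are illustrative stand-ins, not the paper's configuration.

```python
# Rough sketch: outline polygon -> filled mask -> diffusion inpainting.
# The resulting mask doubles as the segmentation label of the new image.
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionInpaintPipeline

image = Image.new("RGB", (512, 512), "gray")        # stand-in for a real photo
outline = [(150, 150), (360, 140), (400, 380), (120, 400)]  # object outline

mask = Image.new("L", image.size, 0)
ImageDraw.Draw(mask).polygon(outline, fill=255)     # white = region to inpaint

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")
result = pipe(prompt="a photo of a dog", image=image, mask_image=mask).images[0]
result.save("augmented_sample.png")
```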
- Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation [6.82236459614491]
We propose a novel method for generating pixel-level semantic segmentation labels using the text-to-image generative model Stable Diffusion (SD).
By utilizing the text prompts, cross-attention, and self-attention of SD, we introduce three new techniques: class-prompt appending, class-prompt cross-attention, and self-attention exponentiation.
These techniques enable us to generate segmentation maps corresponding to the synthetic images; self-attention exponentiation is sketched after the entry.
arXiv Detail & Related papers (2023-09-25T17:19:26Z)
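Self-attention exponentiation can be pictured as repeated propagation of a class's cross-attention map through the (row-normalized) self-attention matrix, so the map spreads along visually coherent regions. The shapes, exponent, and binarization rule below are assumptions for a self-contained sketch.

```python
# Sketch of self-attention exponentiation: refine a class's per-pixel
# cross-attention scores by repeatedly applying the self-attention matrix.
import torch

def self_attention_exponentiation(cross_attn, self_attn, power=4):
    """cross_attn: (N,) per-pixel score for one class token,
    self_attn: (N, N) pixel-to-pixel self-attention (rows sum to 1)."""
    refined = cross_attn
    for _ in range(power):
        refined = self_attn @ refined   # one propagation step
    return refined

N = 64 * 64
self_attn = torch.softmax(torch.randn(N, N), dim=-1)  # stand-in self-attention
cross_attn = torch.rand(N)                            # stand-in class map
seg_scores = self_attention_exponentiation(cross_attn, self_attn)
mask = (seg_scores > seg_scores.mean()).reshape(64, 64)  # crude binarization
```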
- Microscopy Image Segmentation via Point and Shape Regularized Data Synthesis [9.47802391546853]
We develop a unified pipeline for microscopy image segmentation using synthetically generated training data.
Our framework achieves comparable results to models trained on authentic microscopy images with dense labels.
arXiv Detail & Related papers (2023-08-18T22:00:53Z)
- DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models [68.21154597227165]
We show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by an off-the-shelf Stable Diffusion model.
Our approach, called DiffuMask, exploits the potential of the cross-attention map between text and image (a simplified sketch follows the entry).
arXiv Detail & Related papers (2023-03-21T08:43:15Z)
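A loose, self-contained illustration of the cross-attention idea: per-pixel argmax over class-token attention maps, with a background threshold. DiffuMask additionally refines such maps (e.g., with affinity-based post-processing), which this sketch omits; the tensors are random stand-ins.

```python
# Sketch: class-token attention maps -> semantic mask for a synthetic image.
import torch

def masks_from_class_attention(class_maps, bg_threshold=0.4):
    """class_maps: (num_classes, H, W) attention for each class token."""
    maps = class_maps / (class_maps.amax(dim=(1, 2), keepdim=True) + 1e-8)
    labels = maps.argmax(dim=0) + 1                  # 1..C, reserve 0 for background
    labels[maps.max(dim=0).values < bg_threshold] = 0
    return labels

class_maps = torch.rand(3, 64, 64)   # stand-in maps for 3 classes
semantic_mask = masks_from_class_attention(class_maps)
print(torch.unique(semantic_mask))   # background (0) plus class ids
```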
- A Generalist Framework for Panoptic Segmentation of Images and Videos [61.61453194912186]
We formulate panoptic segmentation as a discrete data generation problem, without relying on task-specific inductive biases.
A diffusion model is proposed to model panoptic masks, with a simple architecture and a generic loss function.
Our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically; one way to encode such discrete masks for a continuous diffusion model is sketched after the entry.
arXiv Detail & Related papers (2022-10-12T16:18:25Z)
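One established way to hand discrete masks to a continuous diffusion model is the "analog bits" encoding from the Bit Diffusion line of work: encode each integer id as its binary digits shifted to {-1, +1}, let the model regress them, and decode by thresholding. The sketch below shows only the encode/decode pair, with an arbitrary bit width; how exactly a given method applies this is not asserted here.

```python
# Sketch: analog-bits encoding of integer panoptic ids for diffusion.
import torch

def to_analog_bits(ids, num_bits=8):
    """ids: integer tensor of panoptic ids -> (..., num_bits) in {-1, +1}."""
    shifts = torch.arange(num_bits, device=ids.device)
    bits = (ids.unsqueeze(-1) >> shifts) & 1
    return bits.float() * 2 - 1

def from_analog_bits(bits):
    """Invert the encoding by thresholding at zero."""
    shifts = torch.arange(bits.shape[-1], device=bits.device)
    return ((bits > 0).long() << shifts).sum(dim=-1)

mask = torch.randint(0, 256, (64, 64))            # stand-in panoptic id map
assert torch.equal(from_analog_bits(to_analog_bits(mask)), mask)
```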
- Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation [75.00151934315967]
MaskDistill is a novel framework for unsupervised semantic segmentation.
Our framework does not latch onto low-level image cues and is not limited to object-centric datasets.
arXiv Detail & Related papers (2022-06-13T17:59:43Z)
- CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation [22.89327564484357]
We propose a framework that bootstraps object representations via joint learning of representations and segmentation.
By iterating between these two components, we ground the contrastive updates in segmentation information and simultaneously improve segmentation throughout pretraining (the alternation is sketched after the entry).
arXiv Detail & Related papers (2022-03-17T14:20:05Z)
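The alternation can be written as a small loop. In the sketch below both inner functions are trivial stand-ins (a real contrastive update would sample positive pairs within mask regions), so it only shows the control flow.

```python
# Skeleton of the alternating scheme: contrastive updates grounded in the
# current segmentation, then re-segmentation with the improved features.
import torch

def contrastive_update(encoder, images, masks):
    # Stand-in: a real implementation would apply a contrastive loss
    # whose positives are sampled from within mask regions.
    return encoder

def segment(encoder, images, k=2):
    # Stand-in: cluster encoder features into k regions per image.
    feats = encoder(images)                       # (B, C, H, W)
    return feats.argmax(dim=1) % k                # crude pseudo-masks

encoder = torch.nn.Conv2d(3, 8, 3, padding=1)
images = torch.randn(4, 3, 64, 64)
masks = segment(encoder, images)
for _ in range(3):                                # iterate the two components
    encoder = contrastive_update(encoder, images, masks)
    masks = segment(encoder, images)
```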
- Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel Few-Shot Learning framework, to automatically identify foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.
The site does not guarantee the quality of the information shown and is not responsible for any consequences arising from its use.