Free-Mask: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability
- URL: http://arxiv.org/abs/2411.01819v2
- Date: Mon, 02 Dec 2024 14:42:09 GMT
- Title: Free-Mask: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability
- Authors: Bo Gao, Fangxu Xing, Daniel Tang
- Abstract summary: We propose a framework, Free-Mask, that combines a diffusion model for segmentation with advanced image editing capabilities.
Results show that Free-Mask achieves new state-of-the-art results on previously unseen classes in the VOC 2012 benchmark.
- Score: 5.767984430681467
- Abstract: Current semantic segmentation models typically require a substantial amount of manually annotated data, a process that is both time-consuming and resource-intensive. Alternatively, leveraging advanced text-to-image models such as Midjourney and Stable Diffusion has emerged as an efficient strategy, enabling the automatic generation of synthetic data in place of manual annotations. However, previous methods have been limited to generating single-instance images, as the generation of multiple instances with Stable Diffusion has proven unstable. To address this limitation and expand the scope and diversity of synthetic datasets, we propose a framework \textbf{Free-Mask} that combines a Diffusion Model for segmentation with advanced image editing capabilities, allowing for the integration of multiple objects into images via text-to-image models. Our method facilitates the creation of highly realistic datasets that closely emulate open-world environments while generating accurate segmentation masks. It reduces the labor associated with manual annotation and also ensures precise mask generation. Experimental results demonstrate that synthetic data generated by \textbf{Free-Mask} enables segmentation models to outperform those trained on real data, especially in zero-shot settings. Notably, \textbf{Free-Mask} achieves new state-of-the-art results on previously unseen classes in the VOC 2012 benchmark.
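The core idea of fusing independently generated single-instance images into one multi-object training example with an exact segmentation map can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline or API: the function name, the tuple layout, and the direct paste-compositing are all assumptions, and the real method generates each instance (and its mask) with a diffusion model and applies far more careful editing.

```python
import numpy as np

def composite_objects(canvas, objects):
    """Paste single-instance cutouts onto a background canvas and build
    a segmentation map -- a simplified stand-in for how a Free-Mask-style
    pipeline could fuse separately generated objects into one image.

    canvas:  (H, W, 3) uint8 background image.
    objects: list of (image, mask, class_id, (top, left)) tuples, where
             image is (h, w, 3) uint8 and mask is (h, w) boolean.
    Returns the composited image and an (H, W) integer segmentation map
    (0 = background), which comes "for free" from the compositing.
    """
    out = canvas.copy()
    seg = np.zeros(canvas.shape[:2], dtype=np.int64)
    for image, mask, class_id, (top, left) in objects:
        h, w = mask.shape
        # Copy only the masked pixels of the cutout into the canvas...
        region = out[top:top + h, left:left + w]
        region[mask] = image[mask]
        # ...and record the class label at exactly those pixels.
        seg[top:top + h, left:left + w][mask] = class_id
    return out, seg
```

Because the mask is known at paste time, the annotation is pixel-exact by construction, which is the property that removes the manual labeling step.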
Related papers
- EliGen: Entity-Level Controlled Image Generation with Regional Attention [7.7120747804211405]
We present EliGen, a novel framework for entity-level controlled image generation.
We train EliGen to achieve robust and accurate entity-level manipulation, surpassing existing methods in both spatial precision and image quality.
We propose an inpainting fusion pipeline, extending its capabilities to multi-entity image inpainting tasks.
arXiv Detail & Related papers (2025-01-02T06:46:13Z) - Mask Factory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation [70.95380821618711]
Dichotomous Image Segmentation (DIS) tasks require highly precise annotations.
Current generative models and techniques struggle with the issues of scene deviations, noise-induced errors, and limited training sample variability.
We introduce a novel approach, which provides a scalable solution for generating diverse and precise datasets.
arXiv Detail & Related papers (2024-12-26T06:37:25Z) - FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z) - Outline-Guided Object Inpainting with Diffusion Models [11.391452115311798]
Instance segmentation datasets play a crucial role in training accurate and robust computer vision models.
We show how this issue can be mitigated by starting with small annotated instance segmentation datasets and augmenting them to obtain a sizeable annotated dataset.
We generate new images using a diffusion-based inpainting model to fill out the masked area with a desired object class by guiding the diffusion through the object outline.
arXiv Detail & Related papers (2024-02-26T09:21:17Z) - MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation [104.03166324080917]
We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation.
Our method is training-free and does not rely on any label supervision.
Experimental results on the challenging LVIS long-tailed and open-vocabulary benchmarks demonstrate that MosaicFusion can significantly improve the performance of existing instance segmentation models.
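The mosaic layout underlying a MosaicFusion-style augmenter can be sketched as a grid of non-overlapping canvas regions, one per object prompt. This is a toy illustration under stated assumptions: the function name and tuple layout are hypothetical, and the actual method additionally runs diffusion per region and derives masks from attention, which is omitted here.

```python
def mosaic_regions(height, width, rows, cols):
    """Split an image canvas into a rows x cols grid of non-overlapping
    regions, one per object prompt, so each object is generated in its
    own cell of the mosaic. Returns (top, left, h, w) tuples."""
    h_step, w_step = height // rows, width // cols
    return [(r * h_step, c * w_step, h_step, w_step)
            for r in range(rows)
            for c in range(cols)]
```

Keeping the regions disjoint is what lets each generated object be attributed to exactly one prompt, and hence one class label, without extra supervision.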
arXiv Detail & Related papers (2023-09-22T17:59:42Z) - DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations.
Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation.
We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
arXiv Detail & Related papers (2023-08-11T14:38:11Z) - MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation [110.09800389100599]
We propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation.
Our approach involves generating fine-grained patch-text pairs data by mixing image patches while preserving the correspondence between patches and text.
With MixReorg as a mask learner, conventional text-supervised semantic segmentation models can achieve highly generalizable pixel-semantic alignment ability.
arXiv Detail & Related papers (2023-08-09T09:35:16Z) - DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models [68.21154597227165]
We show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by an off-the-shelf Stable Diffusion model.
Our approach, called DiffuMask, exploits the potential of the cross-attention map between text and image.
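The basic step such attention-based approaches build on, turning a text token's cross-attention map into a binary mask, can be sketched as follows. This is a rough simplification, not DiffuMask's actual procedure: the function name and fixed threshold are assumptions, and real pipelines add refinement stages that are omitted here.

```python
import numpy as np

def attention_to_mask(attn, threshold=0.5):
    """Min-max normalize a token's cross-attention map to [0, 1] and
    threshold it into a binary foreground mask.

    attn: (H, W) non-negative attention scores for one text token.
    Returns an (H, W) boolean mask.
    """
    attn = attn.astype(np.float64)
    lo, hi = attn.min(), attn.max()
    # Small epsilon guards against division by zero on flat maps.
    norm = (attn - lo) / (hi - lo + 1e-8)
    return norm >= threshold
```

The intuition is that the cross-attention map for an object's token is highest over the pixels depicting that object, so normalizing and thresholding it yields a coarse segmentation without any labels.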
arXiv Detail & Related papers (2023-03-21T08:43:15Z) - Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models [6.408114351192012]
We present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions.
We show results on the task of segmenting four different objects (humans, dogs, cars, birds) and a use case scenario in medical image analysis.
arXiv Detail & Related papers (2022-12-29T13:51:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.