SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
- URL: http://arxiv.org/abs/2311.03355v2
- Date: Thu, 4 Jul 2024 18:59:18 GMT
- Title: SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
- Authors: Hanrong Ye, Jason Kuen, Qing Liu, Zhe Lin, Brian Price, Dan Xu
- Abstract summary: SegGen is a highly effective training data generation method for image segmentation.
MaskSyn synthesizes new mask-image pairs via a proposed text-to-mask generation model and a mask-to-image generation model.
ImgSyn synthesizes new images based on existing masks using the mask-to-image generation model.
- Score: 36.76548097887539
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose SegGen, a highly effective training data generation method for image segmentation, which pushes the performance limits of state-of-the-art segmentation models to a significant extent. SegGen designs and integrates two data generation strategies: MaskSyn and ImgSyn. (i) MaskSyn synthesizes new mask-image pairs via our proposed text-to-mask generation model and mask-to-image generation model, greatly improving the diversity in segmentation masks for model supervision; (ii) ImgSyn synthesizes new images based on existing masks using the mask-to-image generation model, strongly improving image diversity for model inputs. On the highly competitive ADE20K and COCO benchmarks, our data generation method markedly improves the performance of state-of-the-art segmentation models in semantic segmentation, panoptic segmentation, and instance segmentation. Notably, in terms of the ADE20K mIoU, Mask2Former R50 is largely boosted from 47.2 to 49.9 (+2.7); Mask2Former Swin-L is also significantly increased from 56.1 to 57.4 (+1.3). These promising results strongly suggest the effectiveness of our SegGen even when abundant human-annotated training data is utilized. Moreover, training with our synthetic data makes the segmentation models more robust towards unseen domains. Project website: https://seggenerator.github.io
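To make the two strategies concrete, here is a minimal sketch of how MaskSyn and ImgSyn could be composed. The `text_to_mask` and `mask_to_image` callables stand in for the paper's trained generation models; their interfaces and the caption-driven loop are illustrative assumptions, not SegGen's released code.

```python
# Hypothetical sketch of SegGen's two data-generation strategies.
# `text_to_mask` and `mask_to_image` stand in for the paper's trained
# generative models; their interfaces here are illustrative assumptions.

def mask_syn(caption, text_to_mask, mask_to_image):
    """MaskSyn: synthesize a brand-new (image, mask) training pair from text."""
    mask = text_to_mask(caption)          # new segmentation layout
    image = mask_to_image(mask, caption)  # image consistent with that layout
    return image, mask

def img_syn(real_mask, caption, mask_to_image):
    """ImgSyn: keep a human-annotated mask, synthesize a fresh image for it."""
    image = mask_to_image(real_mask, caption)  # new appearance, same layout
    return image, real_mask

def build_synthetic_dataset(real_pairs, captions, text_to_mask, mask_to_image):
    """Mix both strategies: MaskSyn diversifies masks, ImgSyn diversifies images."""
    synthetic = []
    for caption in captions:
        synthetic.append(mask_syn(caption, text_to_mask, mask_to_image))
    for (image, mask), caption in zip(real_pairs, captions):
        synthetic.append(img_syn(mask, caption, mask_to_image))
    return synthetic
```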
Related papers
- SimGen: A Diffusion-Based Framework for Simultaneous Surgical Image and Segmentation Mask Generation [1.9393128408121891]
While generative AI models such as text-to-image generators can alleviate data scarcity, incorporating spatial annotations such as segmentation masks is crucial for precision-driven surgical applications, simulation, and education.
This study introduces both a novel task and method, SimGen, for Simultaneous Image and Mask Generation.
SimGen is a diffusion model based on the DDPM framework and Residual U-Net, designed to jointly generate high-fidelity surgical images and their corresponding segmentation masks.
arXiv Detail & Related papers (2025-01-15T18:48:38Z)
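As a rough illustration of SimGen's idea of jointly denoising an image and its mask, the sketch below samples a concatenated image-plus-mask tensor from a DDPM-style model. The channel layout, model interface, and the deterministic update rule are simplifying assumptions for exposition, not SimGen's actual implementation.

```python
import torch

@torch.no_grad()
def sample_image_and_mask(eps_model, timesteps, alphas_cumprod, shape=(1, 4, 256, 256)):
    """Jointly sample an RGB image (3 channels) and a mask (1 channel)
    by running a standard reverse diffusion process on the stacked tensor.
    `eps_model(x_t, t)` predicts noise; it is a stand-in for SimGen's
    Residual U-Net, whose exact interface is an assumption here."""
    x = torch.randn(shape)  # start from pure noise over image+mask channels
    for t in reversed(range(timesteps)):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps = eps_model(x, torch.full((shape[0],), t, dtype=torch.long))
        # DDIM-style deterministic update (eta = 0) for brevity
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
    image, mask = x[:, :3], x[:, 3:]   # split the joint sample
    return image, (mask > 0).float()   # binarize the mask channel
```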
- Free-Mask: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability [5.767984430681467]
We propose a framework, Free-Mask, that combines a diffusion model for segmentation with advanced image-editing capabilities.
Results show that Free-Mask achieves new state-of-the-art results on previously unseen classes in the VOC 2012 benchmark.
arXiv Detail & Related papers (2024-11-04T05:39:01Z)
- SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process [102.18226145874007]
We propose a model-agnostic solution called SegRefiner to enhance the quality of object masks produced by different segmentation models.
SegRefiner takes coarse masks as inputs and refines them using a discrete diffusion process.
It consistently improves both the segmentation metrics and boundary metrics across different types of coarse masks.
arXiv Detail & Related papers (2023-12-19T18:53:47Z)
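A minimal sketch of the kind of iterative, model-agnostic refinement SegRefiner describes: a coarse binary mask is treated as the starting state of a discrete diffusion process and progressively corrected. The `refiner` interface and the per-step flip-doubtful-pixels update are loose illustrative assumptions, not the paper's exact transition kernel.

```python
import torch

@torch.no_grad()
def refine_mask(image, coarse_mask, refiner, num_steps=6):
    """Iteratively refine a coarse mask; works with masks produced by any
    upstream segmentation model (the model-agnostic part).
    `refiner(image, mask, t)` returns per-pixel probabilities that the
    current mask label is correct -- an assumed, simplified interface."""
    mask = coarse_mask.float()
    for t in reversed(range(num_steps)):
        p_correct = refiner(image, mask, t)         # confidence in current labels
        flip = torch.rand_like(p_correct) > p_correct
        mask = torch.where(flip, 1.0 - mask, mask)  # resample doubtful pixels
    return mask
```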
- UniGS: Unified Representation for Image Generation and Segmentation [105.08152635402858]
We use a colormap to represent entity-level masks, addressing the challenge of varying entity numbers.
Two novel modules, a location-aware color palette and a progressive dichotomy module, are proposed to support this mask representation.
arXiv Detail & Related papers (2023-12-04T15:59:27Z)
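UniGS's colormap idea can be illustrated with a simple encode/decode round trip: each entity ID maps to a distinct color, so any number of entities fits in one RGB image. The fixed palette below is a toy assumption; the paper's location-aware palette is a learned module.

```python
import numpy as np

def encode_entities(entity_map, palette):
    """Pack an integer entity-ID map (H, W) into an RGB image (H, W, 3)
    using a fixed palette -- a toy stand-in for UniGS's learned,
    location-aware palette."""
    return palette[entity_map]  # advanced indexing: ID -> color

def decode_entities(rgb, palette):
    """Recover entity IDs by nearest-palette-color lookup."""
    # Distance from every pixel to every palette color: (H, W, K)
    d = np.linalg.norm(rgb[..., None, :] - palette[None, None], axis=-1)
    return d.argmin(axis=-1)

# Toy usage: 4 entities, random layout, lossless round trip.
palette = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.float32)
entity_map = np.random.randint(0, 4, size=(64, 64))
rgb = encode_entities(entity_map, palette)
assert (decode_entities(rgb, palette) == entity_map).all()
```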
- FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models [62.009002395326384]
FreeMask resorts to synthetic images from generative models to ease the burden of data collection and annotation procedures.
We first synthesize abundant training images conditioned on the semantic masks provided by realistic datasets.
We investigate the role of synthetic images either through joint training with real images or as pre-training for models later fine-tuned on real images.
arXiv Detail & Related papers (2023-10-23T17:57:27Z)
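The joint-training setup FreeMask investigates amounts to drawing batches from a mixture of real and synthetic pairs. The sketch below expresses that mixture with a sampling ratio; the ratio and the flat list-of-pairs representation are assumptions, not the paper's published recipe.

```python
import random

def mixed_batches(real_pairs, synthetic_pairs, batch_size=8, synth_ratio=0.5):
    """Yield training batches mixing real and synthetic (image, mask) pairs.
    `synth_ratio` controls the expected fraction of synthetic samples per
    batch -- a tunable assumption, not FreeMask's published schedule."""
    while True:
        batch = []
        for _ in range(batch_size):
            pool = synthetic_pairs if random.random() < synth_ratio else real_pairs
            batch.append(random.choice(pool))
        yield batch
```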
- DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models [68.21154597227165]
We show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by an off-the-shelf Stable Diffusion model.
Our approach, called DiffuMask, exploits the potential of the cross-attention map between text and image.
arXiv Detail & Related papers (2023-03-21T08:43:15Z)
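DiffuMask's core observation, that cross-attention between a prompt token and the image carries localization signal, can be sketched as averaging that token's attention maps and thresholding the result. The tensor layout and the crude adaptive cut below are simplifying assumptions; DiffuMask itself uses more careful aggregation and refinement.

```python
import numpy as np

def mask_from_cross_attention(attn_maps, token_index, threshold=None):
    """Turn diffusion cross-attention into a rough binary mask.
    attn_maps: (num_layers, H, W, num_tokens) attention weights collected
    during generation -- an assumed layout for illustration.
    token_index: position of the class word (e.g., "dog") in the prompt."""
    # Average the chosen token's attention over layers, then normalize.
    heat = attn_maps[..., token_index].mean(axis=0)
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    if threshold is None:
        threshold = heat.mean()  # crude adaptive cut, not DiffuMask's method
    return heat > threshold
```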
- One-Shot Synthesis of Images and Segmentation Masks [28.119303696418882]
Joint synthesis of images and segmentation masks with generative adversarial networks (GANs) promises to reduce the effort needed to collect image data with pixel-wise annotations.
To learn high-fidelity image-mask synthesis, existing GAN approaches first need a pre-training phase requiring large amounts of image data.
We introduce OSMIS, a model that enables the synthesis of segmentation masks precisely aligned with the generated images in the one-shot regime.
arXiv Detail & Related papers (2022-09-15T18:00:55Z)
- Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation [15.826822450977271]
Mask DINO is a unified object detection and segmentation framework.
Mask DINO is simple, efficient, scalable, and benefits from joint large-scale detection and segmentation datasets.
arXiv Detail & Related papers (2022-06-06T17:57:25Z)
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
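As a rough picture of how a ViT backbone extends to dense prediction in Segmenter's spirit: patch tokens are decoded into per-patch class scores and upsampled to pixel resolution. This minimal linear-decoder variant is an illustrative assumption; Segmenter's full model uses a mask transformer decoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearPatchDecoder(nn.Module):
    """Minimal ViT-style segmentation head: classify each patch token,
    then upsample the patch grid to pixel resolution. A simplified
    stand-in for Segmenter's mask transformer decoder."""

    def __init__(self, embed_dim=768, num_classes=150, patch_size=16):
        super().__init__()
        self.patch_size = patch_size
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens, image_hw):
        # patch_tokens: (B, N, D) from a ViT encoder (class token removed).
        B, N, D = patch_tokens.shape
        h = image_hw[0] // self.patch_size
        w = image_hw[1] // self.patch_size
        logits = self.classifier(patch_tokens)      # (B, N, C)
        logits = logits.transpose(1, 2).reshape(B, -1, h, w)
        return F.interpolate(logits, size=image_hw, mode="bilinear",
                             align_corners=False)   # (B, C, H, W)
```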
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.