SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
- URL: http://arxiv.org/abs/2311.03355v2
- Date: Thu, 4 Jul 2024 18:59:18 GMT
- Title: SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
- Authors: Hanrong Ye, Jason Kuen, Qing Liu, Zhe Lin, Brian Price, Dan Xu
- Abstract summary: SegGen is a highly effective training data generation method for image segmentation.
MaskSyn synthesizes new mask-image pairs via a proposed text-to-mask generation model and a mask-to-image generation model.
ImgSyn synthesizes new images based on existing masks using the mask-to-image generation model.
- Score: 36.76548097887539
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose SegGen, a highly effective training data generation method for image segmentation, which pushes the performance limits of state-of-the-art segmentation models to a significant extent. SegGen designs and integrates two data generation strategies: MaskSyn and ImgSyn. (i) MaskSyn synthesizes new mask-image pairs via our proposed text-to-mask generation model and mask-to-image generation model, greatly improving the diversity of segmentation masks used for model supervision; (ii) ImgSyn synthesizes new images based on existing masks using the mask-to-image generation model, strongly improving the diversity of images used as model inputs. On the highly competitive ADE20K and COCO benchmarks, our data generation method markedly improves the performance of state-of-the-art segmentation models in semantic, panoptic, and instance segmentation. Notably, ADE20K mIoU rises from 47.2 to 49.9 (+2.7) for Mask2Former R50 and from 56.1 to 57.4 (+1.3) for Mask2Former Swin-L. These results strongly suggest the effectiveness of SegGen even when abundant human-annotated training data is available. Moreover, training with our synthetic data makes the segmentation models more robust to unseen domains. Project website: https://seggenerator.github.io
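The two strategies reduce to a small amount of glue code around two conditional generators. Below is a minimal sketch; `caption`, `text_to_mask`, and `mask_to_image` are hypothetical stand-ins for the paper's fine-tuned generative models, and every name and interface here is illustrative, not the authors' implementation:

```python
# Illustrative sketch of SegGen's two data-generation strategies.
# `caption`, `text_to_mask`, and `mask_to_image` are hypothetical callables
# standing in for the paper's generative models.

def mask_syn(caption, text_to_mask, mask_to_image, real_images):
    """MaskSyn: synthesize brand-new (image, mask) training pairs from text,
    increasing the diversity of masks used for supervision."""
    pairs = []
    for image in real_images:
        prompt = caption(image)          # describe a real scene in text
        mask = text_to_mask(prompt)      # sample a novel segmentation layout
        pairs.append((mask_to_image(mask, prompt), mask))
    return pairs

def img_syn(caption, mask_to_image, real_pairs, variants_per_mask=2):
    """ImgSyn: keep the human-annotated masks and resample only the images,
    increasing input diversity without touching the labels."""
    pairs = []
    for image, mask in real_pairs:
        prompt = caption(image)
        for _ in range(variants_per_mask):
            pairs.append((mask_to_image(mask, prompt), mask))
    return pairs
```

In either case, the resulting synthetic pairs are mixed into the real training set when training the segmentation model.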
Related papers
- DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability [5.767984430681467]
This paper introduces DiffuMask-Editor, which combines a segmentation diffusion model for annotated datasets with image editing.
By integrating multiple objects into images using Text2Image models, our method facilitates the creation of more realistic datasets.
Results demonstrate that synthetic data generated by DiffuMask-Editor enables segmentation methods to achieve superior performance compared to real data.
arXiv Detail & Related papers (2024-11-04T05:39:01Z)
- SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process [102.18226145874007]
We propose a model-agnostic solution called SegRefiner to enhance the quality of object masks produced by different segmentation models.
SegRefiner takes coarse masks as inputs and refines them using a discrete diffusion process.
It consistently improves both the segmentation metrics and boundary metrics across different types of coarse masks.
arXiv Detail & Related papers (2023-12-19T18:53:47Z)
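SegRefiner's idea of iteratively correcting unreliable pixels can be illustrated with a toy loop. This is not the paper's architecture; `predict` below is a hypothetical learned denoising model, and the annealed threshold is an assumption for illustration:

```python
import numpy as np

def refine(coarse_mask, predict, num_steps=8):
    """Toy discrete refinement: at each step a learned model returns a
    per-pixel confidence that the current label is correct plus a proposed
    relabeling; low-confidence pixels are updated, and the acceptance
    threshold is annealed so fewer pixels change at later steps."""
    mask = coarse_mask.astype(bool).copy()
    for t in range(num_steps):
        confidence, proposal = predict(mask, t)   # hypothetical model call
        unreliable = confidence < (num_steps - t) / num_steps
        mask[unreliable] = proposal[unreliable]
    return mask
```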
- UniGS: Unified Representation for Image Generation and Segmentation [105.08152635402858]
We use a colormap to represent entity-level masks, addressing the challenge of varying entity numbers.
Two novel modules, a location-aware color palette and a progressive dichotomy module, are proposed to support our mask representation.
arXiv Detail & Related papers (2023-12-04T15:59:27Z)
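UniGS's colormap representation is easy to illustrate: each entity is painted with its own color, so a variable number of instances fits in a single three-channel image. A minimal sketch with a fixed palette; the paper's location-aware palette and progressive dichotomy decoding are more involved:

```python
import numpy as np

def encode(instance_masks, palette):
    """Encode N binary masks (N, H, W) as one RGB image (H, W, 3):
    each entity gets a distinct palette color; later masks overwrite
    earlier ones where they overlap."""
    n, h, w = instance_masks.shape
    canvas = np.zeros((h, w, 3), dtype=np.uint8)
    for i in range(n):
        canvas[instance_masks[i].astype(bool)] = palette[i % len(palette)]
    return canvas

def decode(canvas, palette):
    """Decode: recover one binary mask per palette color present."""
    masks = [np.all(canvas == np.asarray(color, dtype=np.uint8), axis=-1)
             for color in palette]
    return [m for m in masks if m.any()]
```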
- FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models [62.009002395326384]
FreeMask resorts to synthetic images from generative models to ease the burden of data collection and annotation procedures.
We first synthesize abundant training images conditioned on the semantic masks provided by realistic datasets.
We investigate the role of synthetic images through joint training with real images, or through pre-training followed by fine-tuning on real images.
arXiv Detail & Related papers (2023-10-23T17:57:27Z)
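FreeMask's joint-training mode amounts to mixing synthetic and real pairs in every batch. A hedged sketch; the mixing ratio and sampling scheme are illustrative, not FreeMask's exact recipe:

```python
import random

def mixed_batches(real_pairs, synth_pairs, batch_size=16, synth_ratio=0.5):
    """Yield training batches mixing real and synthetic (image, mask) pairs;
    synth_ratio controls the synthetic fraction per batch."""
    n_synth = int(batch_size * synth_ratio)
    while True:
        batch = random.sample(synth_pairs, n_synth)
        batch += random.sample(real_pairs, batch_size - n_synth)
        random.shuffle(batch)
        yield batch
```

The pre-training alternative instead trains on synthetic pairs alone before switching to the real set.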
- DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models [68.21154597227165]
We show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by the off-the-shelf Stable Diffusion model.
Our approach, called DiffuMask, exploits the potential of the cross-attention map between text and image.
arXiv Detail & Related papers (2023-03-21T08:43:15Z)
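Once DiffuMask-style cross-attention maps have been collected from the diffusion model, the remaining step is aggregation and thresholding. A simplified sketch; collecting the maps requires hooking the U-Net and is omitted here, and the array shape and averaging scheme are assumptions:

```python
import numpy as np

def attention_to_mask(cross_attn, token_index, threshold=0.5):
    """cross_attn: assumed shape (layers, H, W, tokens), holding
    cross-attention between image positions and prompt tokens averaged
    over sampling steps. Returns a binary mask for one prompt token
    (e.g., the class name) by averaging over layers, normalizing to
    [0, 1], and thresholding."""
    maps = cross_attn[..., token_index].mean(axis=0)      # (H, W)
    maps = (maps - maps.min()) / (np.ptp(maps) + 1e-8)
    return maps > threshold
```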
- One-Shot Synthesis of Images and Segmentation Masks [28.119303696418882]
Joint synthesis of images and segmentation masks with generative adversarial networks (GANs) is a promising way to reduce the effort needed for collecting image data with pixel-wise annotations.
To learn high-fidelity image-mask synthesis, existing GAN approaches first need a pre-training phase requiring large amounts of image data.
We introduce our OSMIS model, which enables the synthesis of segmentation masks that are precisely aligned to the generated images in the one-shot regime.
arXiv Detail & Related papers (2022-09-15T18:00:55Z)
- Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation [15.826822450977271]
Mask DINO is a unified object detection and segmentation framework.
Mask DINO is simple, efficient, scalable, and benefits from joint large-scale detection and segmentation datasets.
arXiv Detail & Related papers (2022-06-06T17:57:25Z)
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on par with it on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
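Segmenter's extension from ViT to segmentation is conceptually small: the encoder's patch tokens already form a low-resolution grid, so even a per-patch linear classifier yields dense predictions once upsampled. A minimal PyTorch sketch of that linear-decoder baseline; Segmenter's full mask-transformer decoder is richer than this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearPatchDecoder(nn.Module):
    """Per-patch linear classification head: classify every ViT patch
    token, reshape the logits back into a grid, and upsample to full
    image resolution."""
    def __init__(self, embed_dim, num_classes, patch_size=16):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_classes)
        self.patch_size = patch_size

    def forward(self, patch_tokens, image_hw):
        # patch_tokens: (B, N, D) encoder outputs with the CLS token removed
        b, n, _ = patch_tokens.shape
        h = image_hw[0] // self.patch_size
        w = image_hw[1] // self.patch_size
        logits = self.head(patch_tokens)              # (B, N, num_classes)
        logits = logits.transpose(1, 2).reshape(b, -1, h, w)
        return F.interpolate(logits, size=image_hw, mode="bilinear",
                             align_corners=False)
```

For example, `LinearPatchDecoder(768, 150)(tokens, (512, 512))` would map ViT-Base tokens to 150-class ADE20K-style logits at 512x512 resolution.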
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.