Mask Factory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
- URL: http://arxiv.org/abs/2412.19080v1
- Date: Thu, 26 Dec 2024 06:37:25 GMT
- Title: Mask Factory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
- Authors: Haotian Qian, YD Chen, Shengtao Lou, Fahad Shahbaz Khan, Xiaogang Jin, Deng-Ping Fan,
- Abstract summary: Dichotomous Image (DIS) tasks require highly precise annotations.
Current generative models and techniques struggle with the issues of scene deviations, noise-induced errors, and limited training sample variability.
We introduce a novel approach, which provides a scalable solution for generating diverse and precise datasets.
- Score: 70.95380821618711
- License:
- Abstract: Dichotomous Image Segmentation (DIS) tasks require highly precise annotations, and traditional dataset creation methods are labor intensive, costly, and require extensive domain expertise. Although using synthetic data for DIS is a promising solution to these challenges, current generative models and techniques struggle with the issues of scene deviations, noise-induced errors, and limited training sample variability. To address these issues, we introduce a novel approach, \textbf{\ourmodel{}}, which provides a scalable solution for generating diverse and precise datasets, markedly reducing preparation time and costs. We first introduce a general mask editing method that combines rigid and non-rigid editing techniques to generate high-quality synthetic masks. Specially, rigid editing leverages geometric priors from diffusion models to achieve precise viewpoint transformations under zero-shot conditions, while non-rigid editing employs adversarial training and self-attention mechanisms for complex, topologically consistent modifications. Then, we generate pairs of high-resolution image and accurate segmentation mask using a multi-conditional control generation method. Finally, our experiments on the widely-used DIS5K dataset benchmark demonstrate superior performance in quality and efficiency compared to existing methods. The code is available at \url{https://qian-hao-tian.github.io/MaskFactory/}.
Related papers
- Free-Mask: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability [5.767984430681467]
We propose a framework textbfFree-Mask that combines a Diffusion Model for segmentation with advanced image editing capabilities.
Results show that textbfFree-Mask achieves new state-of-the-art results on previously unseen classes in the VOC 2012 benchmark.
arXiv Detail & Related papers (2024-11-04T05:39:01Z) - Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation [54.96563068182733]
We propose Modality Adaptation with text-to-image Diffusion Models (MADM) for semantic segmentation task.
MADM utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities.
We show that MADM achieves state-of-the-art adaptation performance across various modality tasks, including images to depth, infrared, and event modalities.
arXiv Detail & Related papers (2024-10-29T03:49:40Z) - Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis [43.481539150288434]
This work introduces a new family of.
factor graph Diffusion Models (FG-DMs)
FG-DMs models the joint distribution of.
images and conditioning variables, such as semantic, sketch,.
deep or normal maps via a factor graph decomposition.
arXiv Detail & Related papers (2024-10-29T00:54:00Z) - High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models.
By leveraging the robust generalization capabilities and rich, versatile image representation prior to the SD models, we significantly reduce the inference time while preserving high-fidelity, detailed generation.
Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z) - Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z) - GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds [72.60362979456035]
Masked Autoencoders (MAE) are challenging to explore in large-scale 3D point clouds.
We propose a textbfGenerative textbfDecoder for MAE (GD-MAE) to automatically merges the surrounding context.
We demonstrate the efficacy of the proposed method on several large-scale benchmarks: KITTI, and ONCE.
arXiv Detail & Related papers (2022-12-06T14:32:55Z) - One-Shot Synthesis of Images and Segmentation Masks [28.119303696418882]
Joint synthesis of images and segmentation masks with generative adversarial networks (GANs) is promising to reduce the effort needed for collecting image data with pixel-wise annotations.
To learn high-fidelity image-mask synthesis, existing GAN approaches first need a pre-training phase requiring large amounts of image data.
We introduce our OSMIS model which enables the synthesis of segmentation masks that are precisely aligned to the generated images in the one-shot regime.
arXiv Detail & Related papers (2022-09-15T18:00:55Z) - Calibrated Hyperspectral Image Reconstruction via Graph-based
Self-Tuning Network [40.71031760929464]
Hyperspectral imaging (HSI) has attracted increasing research attention, especially for the ones based on a coded snapshot spectral imaging (CASSI) system.
Existing deep HSI reconstruction models are generally trained on paired data to retrieve original signals upon 2D compressed measurements given by a particular optical hardware mask in CASSI.
This mask-specific training style will lead to a hardware miscalibration issue, which sets up barriers to deploying deep HSI models among different hardware and noisy environments.
We propose a novel Graph-based Self-Tuning ( GST) network to reason uncertainties adapting to varying spatial structures of masks among
arXiv Detail & Related papers (2021-12-31T09:39:13Z) - Meta Mask Correction for Nuclei Segmentation in Histopathological Image [5.36728433027615]
We propose a novel meta-learning-based nuclei segmentation method to leverage data with noisy masks.
Specifically, we design a fully conventional meta-model that can correct noisy masks using a small amount of clean meta-data.
We show that our method achieves the state-of-the-art result.
arXiv Detail & Related papers (2021-11-24T13:53:35Z) - LevelSet R-CNN: A Deep Variational Method for Instance Segmentation [79.20048372891935]
Currently, many state of the art models are based on the Mask R-CNN framework.
We propose LevelSet R-CNN, which combines the best of both worlds by obtaining powerful feature representations.
We demonstrate the effectiveness of our approach on COCO and Cityscapes datasets.
arXiv Detail & Related papers (2020-07-30T17:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.