DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
- URL: http://arxiv.org/abs/2308.06160v2
- Date: Tue, 10 Oct 2023 03:59:41 GMT
- Title: DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
- Authors: Weijia Wu, Yuzhong Zhao, Hao Chen, Yuchao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen
- Abstract summary: We present a generic dataset generation model that can produce diverse synthetic images and perception annotations.
Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation.
We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
- Score: 61.906934570771256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current deep networks are very data-hungry and benefit from training on
large-scale datasets, which are often time-consuming to collect and annotate. By
contrast, synthetic data can be generated infinitely using generative models
such as DALL-E and diffusion models, with minimal effort and cost. In this
paper, we present DatasetDM, a generic dataset generation model that can
produce diverse synthetic images and the corresponding high-quality perception
annotations (e.g., segmentation masks and depth). Our method builds upon the
pre-trained diffusion model and extends text-guided image synthesis to
perception data generation. We show that the rich latent code of the diffusion
model can be effectively decoded as accurate perception annotations using a
decoder module. Training the decoder requires less than 1% of the data (around
100 manually labeled images), enabling the generation of an infinitely large
annotated dataset. These synthetic data can then be used to train various
perception models for downstream tasks. To showcase the power of the proposed
approach, we generate datasets with rich dense pixel-wise labels for a wide
range of downstream tasks, including semantic segmentation, instance
segmentation, and depth estimation. Notably, it achieves 1) state-of-the-art
results on semantic segmentation and instance segmentation; 2) significantly
better robustness in domain generalization than using real data alone, as well
as state-of-the-art results in the zero-shot segmentation setting; and 3)
flexibility for efficient application and novel task composition (e.g., image
editing). The project website and code can be found at
https://weijiawu.github.io/DatasetDM_page/ and
https://github.com/showlab/DatasetDM, respectively.
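The pipeline the abstract describes (freeze the diffusion model, tap its intermediate UNet features during denoising, and train a small decoder on roughly 100 labeled images) can be pictured with a minimal PyTorch sketch. The decoder layout, feature shapes, and class count below are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch of DatasetDM's core idea: decode multi-scale diffusion (UNet)
# features into dense perception labels with a small trainable decoder while
# the diffusion model stays frozen. Shapes and class count are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerceptionDecoder(nn.Module):
    """Fuses multi-scale UNet features into per-pixel class logits."""
    def __init__(self, in_channels=(320, 640, 1280), num_classes=21, hidden=128):
        super().__init__()
        # 1x1 convs project each feature scale to a common width.
        self.proj = nn.ModuleList(nn.Conv2d(c, hidden, 1) for c in in_channels)
        self.head = nn.Sequential(
            nn.Conv2d(hidden * len(in_channels), hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, num_classes, 1),
        )

    def forward(self, feats, out_size):
        # Upsample every scale to the target resolution, concatenate, predict.
        fused = torch.cat(
            [F.interpolate(p(f), size=out_size, mode="bilinear", align_corners=False)
             for p, f in zip(self.proj, feats)],
            dim=1,
        )
        return self.head(fused)

# Stand-ins for intermediate UNet activations captured during one denoising step.
feats = [torch.randn(1, 320, 64, 64),
         torch.randn(1, 640, 32, 32),
         torch.randn(1, 1280, 16, 16)]
decoder = PerceptionDecoder()
logits = decoder(feats, out_size=(512, 512))  # (1, 21, 512, 512) per-pixel logits
mask = logits.argmax(dim=1)                   # synthetic segmentation annotation
```

Since only this decoder is trained, the handful of labeled images cited in the abstract suffices; every subsequent text-to-image sample then comes with an annotation for free.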
Related papers
- DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
We propose a framework for synthesizing classification datasets that more faithfully represent the real data distribution.
DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model.
We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
arXiv Detail & Related papers (2024-07-15T17:10:31Z)
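DataDream's few-shot adaptation rests on LoRA: the pretrained weights stay frozen and only a low-rank residual is learned. A generic sketch of a LoRA-wrapped linear layer follows (illustrative rank and scaling, not DataDream's code):

```python
# Generic LoRA layer in the spirit of DataDream's adaptation step: freeze the
# pretrained weight and learn a low-rank update B @ A on top of it.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=4, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # pretrained weight stays frozen
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the rank-4 adapters (6,144 params here) see the few real images
```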
- LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation [2.0901574458380403]
We propose a new lightweight but efficient model, namely LiteNeXt, for medical image segmentation.
LiteNeXt is trained from scratch with a small number of parameters (0.71M) and a low computational cost (0.42 GFLOPs).
arXiv Detail & Related papers (2024-04-04T01:59:19Z)
- DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control [68.14798033899955]
Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content.
However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation?
We investigate this question in the context of autonomous driving, and answer it with a resounding "yes".
arXiv Detail & Related papers (2023-12-05T18:34:12Z)
- ScribbleGen: Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation [10.225021032417589]
We propose ScribbleGen, a generative data augmentation method for scribble-supervised semantic segmentation.
We leverage a ControlNet diffusion model conditioned on semantic scribbles to produce high-quality training data.
We show that our framework significantly improves segmentation performance on small datasets, even surpassing fully-supervised segmentation.
arXiv Detail & Related papers (2023-11-28T13:44:33Z)
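ScribbleGen's conditioning step can be approximated with the publicly available scribble ControlNet in Hugging Face diffusers. The model IDs, prompt, and input path below are stand-ins, not the paper's actual configuration:

```python
# Rough stand-in for ScribbleGen-style generation: a ControlNet conditioned on
# a scribble map steers Stable Diffusion toward images that match the scribble.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

scribble = Image.open("scribble.png")  # placeholder path: white-on-black scribble map
image = pipe(
    "a photo of a dog in a park",      # class prompt paired with the scribble
    image=scribble,
    num_inference_steps=30,
).images[0]
image.save("synthetic_sample.png")     # (image, scribble) pair for the segmenter
```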
- Diffusion-based Data Augmentation for Nuclei Image Segmentation [68.28350341833526]
We introduce the first diffusion-based augmentation method for nuclei segmentation.
The idea is to synthesize a large number of labeled images to facilitate training the segmentation model.
The experimental results show that by augmenting a 10% labeled real dataset with synthetic samples, one can achieve segmentation results comparable to the fully supervised baseline.
arXiv Detail & Related papers (2023-10-22T06:16:16Z)
- RADiff: Controllable Diffusion Models for Radio Astronomical Maps Generation [6.128112213696457]
RADiff is a generative approach based on conditional diffusion models trained over an annotated radio dataset.
We show that it is possible to generate fully-synthetic image-annotation pairs to automatically augment any annotated dataset.
arXiv Detail & Related papers (2023-07-05T16:04:44Z)
- Multi-dataset Pretraining: A Unified Model for Semantic Segmentation [97.61605021985062]
We propose a unified framework, termed Multi-Dataset Pretraining, to take full advantage of the fragmented annotations of different datasets.
This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets.
To better model the relationships among images and classes from different datasets, we extend the pixel-level embeddings via cross-dataset mixing.
arXiv Detail & Related papers (2021-06-08T06:13:11Z)
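The pixel-to-prototype contrastive loss mentioned above can be sketched in a few lines: build one prototype per class from the batch, then score every pixel embedding against all prototypes. The temperature and prototype construction here are illustrative guesses, not the paper's exact recipe:

```python
# Hedged sketch of a pixel-to-prototype contrastive loss: each pixel embedding
# is pulled toward its class prototype and pushed away from the others.
import torch
import torch.nn.functional as F

def pixel_to_prototype_loss(embed, labels, num_classes, tau=0.1):
    # embed: (N, D) pixel embeddings; labels: (N,) class ids.
    # Assumes every class appears at least once in the batch.
    embed = F.normalize(embed, dim=1)
    protos = torch.stack([
        F.normalize(embed[labels == c].mean(dim=0), dim=0)
        for c in range(num_classes)
    ])                                  # (C, D) per-class prototypes
    logits = embed @ protos.t() / tau   # (N, C) pixel-prototype similarities
    return F.cross_entropy(logits, labels)

emb = torch.randn(1024, 64)             # pixels sampled across several datasets
lab = torch.randint(0, 5, (1024,))
loss = pixel_to_prototype_loss(emb, lab, num_classes=5)
```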
- DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort [117.41383937100751]
Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets.
We show how the GAN latent code can be decoded to produce a semantic segmentation of the image.
These generated datasets can then be used for training any computer vision architecture just as real datasets are.
arXiv Detail & Related papers (2021-04-13T20:08:29Z)
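DatasetGAN applies the same recipe as DatasetDM with a GAN backbone: decode the generator's internal activations into labels, then loop. The sketch below is hypothetical; `return_features` is an assumed hook rather than a real StyleGAN interface, and the dummy stand-ins exist only so the snippet runs:

```python
# Hypothetical "labeled data factory" loop in the DatasetGAN spirit: sample a
# latent, generate an image, decode the generator's features into a mask.
import torch

@torch.no_grad()
def synthesize_dataset(generator, label_decoder, num_samples, z_dim=512):
    pairs = []
    for _ in range(num_samples):
        z = torch.randn(1, z_dim)
        image, feats = generator(z, return_features=True)  # assumed feature hook
        mask = label_decoder(feats).argmax(dim=1)          # per-pixel class ids
        pairs.append((image, mask))                        # annotation comes for free
    return pairs

# Dummy stand-ins so the sketch runs end to end.
gen = lambda z, return_features=True: (torch.randn(1, 3, 64, 64), torch.randn(1, 32, 64, 64))
dec = lambda f: torch.randn(1, 5, 64, 64)
data = synthesize_dataset(gen, dec, num_samples=4)  # four (image, mask) pairs
```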
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.