Image Augmentation with Controlled Diffusion for Weakly-Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2310.09760v3
- Date: Tue, 26 Nov 2024 02:11:53 GMT
- Title: Image Augmentation with Controlled Diffusion for Weakly-Supervised Semantic Segmentation
- Authors: Wangyu Wu, Tianhong Dai, Xiaowei Huang, Fei Ma, Jimin Xiao
- Abstract summary: We introduce a novel approach called Image Augmentation with Controlled Diffusion (IACD).
IACD effectively augments existing labeled datasets by generating diverse images through controlled diffusion.
We also propose a high-quality image selection strategy to mitigate the potential noise introduced by the randomness of diffusion models.
- Score: 23.888222298960542
- Abstract: Weakly-supervised semantic segmentation (WSSS), which aims to train segmentation models using only image-level labels, has attracted significant attention. Existing methods primarily focus on generating high-quality pseudo labels from the available images and their image-level labels. However, the quality of pseudo labels degrades significantly when the size of the available dataset is limited. Thus, in this paper, we tackle this problem from a different view by introducing a novel approach called Image Augmentation with Controlled Diffusion (IACD). This framework effectively augments existing labeled datasets by generating diverse images through controlled diffusion, where the available images and image-level labels serve as the controlling information. Moreover, we also propose a high-quality image selection strategy to mitigate the potential noise introduced by the randomness of diffusion models. In the experiments, our proposed IACD approach clearly surpasses existing state-of-the-art methods. This effect is more obvious when the amount of available data is small, demonstrating the effectiveness of our method.
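The abstract describes filtering diffusion-generated candidates before adding them to the training set. A minimal sketch of such a selection step, assuming each candidate has already been scored by some classifier for the controlling image-level label (the function name, arguments, and ranking rule here are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def select_high_quality(candidates, label_probs, label_idx, keep_ratio=0.5):
    """Rank generated candidates by the classifier's confidence on the
    target image-level label and keep only the top fraction.

    candidates  : list of generated images (any objects)
    label_probs : (N, C) array of class probabilities per candidate
    label_idx   : index of the image-level label used as control
    keep_ratio  : fraction of candidates to retain
    """
    scores = label_probs[:, label_idx]          # confidence on the target class
    k = max(1, int(len(candidates) * keep_ratio))
    top = np.argsort(scores)[::-1][:k]          # indices of the k best candidates
    return [candidates[i] for i in top], scores[top]

# Toy example: 4 candidates, 3 classes, controlling label is class 2.
probs = np.array([[0.1, 0.2, 0.7],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.3, 0.4, 0.3]])
kept, kept_scores = select_high_quality(["a", "b", "c", "d"], probs, label_idx=2)
# With keep_ratio=0.5, the two most confident candidates ("a", "c") survive.
```

The idea is simply that low-confidence generations are treated as noise from the diffusion model's randomness and discarded.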
Related papers
- Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation [54.96563068182733]
We propose Modality Adaptation with text-to-image Diffusion Models (MADM) for semantic segmentation task.
MADM utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities.
We show that MADM achieves state-of-the-art adaptation performance across various modality tasks, including images to depth, infrared, and event modalities.
arXiv Detail & Related papers (2024-10-29T03:49:40Z) - HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation [47.271784693700845]
We propose a novel hybrid pseudo-labeling framework for unsupervised event-based semantic segmentation, HPL-ESS, to alleviate the influence of noisy pseudo labels.
Our proposed method outperforms existing state-of-the-art methods by a large margin on the DSEC-Semantic dataset.
arXiv Detail & Related papers (2024-03-25T14:02:33Z) - CLIP-Guided Source-Free Object Detection in Aerial Images [17.26407623526735]
High-resolution aerial images often require substantial storage space and may not be readily accessible to the public.
We propose a novel Source-Free Object Detection (SFOD) method to address these challenges.
To alleviate the noisy labels in self-training, we utilize Contrastive Language-Image Pre-training (CLIP) to guide the generation of pseudo-labels.
By leveraging CLIP's zero-shot classification capability, we aggregate its scores with the original predicted bounding boxes, enabling us to obtain refined scores for the pseudo-labels.
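The score aggregation described above could be sketched as a convex combination of detector confidence and CLIP zero-shot probability per box; the weighting scheme and function name below are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def refine_pseudo_label_scores(det_scores, clip_probs, alpha=0.5):
    """Aggregate detector confidences with CLIP zero-shot class
    probabilities for the same predicted boxes.

    det_scores : (N,) detector confidence per predicted box
    clip_probs : (N,) CLIP zero-shot probability of each box's class
    alpha      : weight placed on the detector score
    """
    return alpha * np.asarray(det_scores) + (1 - alpha) * np.asarray(clip_probs)

refined = refine_pseudo_label_scores([0.9, 0.4], [0.7, 0.8])  # ≈ [0.8, 0.6]
# Boxes whose refined score clears a threshold become pseudo-labels.
keep = refined >= 0.6
```

A CLIP-agreeing box with a weak detector score can thus still be kept, while a confident detection that CLIP contradicts is down-weighted.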
arXiv Detail & Related papers (2024-01-10T14:03:05Z) - Self-Guided Diffusion Models [53.825634944114285]
We propose a framework for self-guided diffusion models.
Our method provides guidance signals at various image granularities.
Our experiments on single-label and multi-label image datasets demonstrate that self-labeled guidance always outperforms diffusion models without guidance.
arXiv Detail & Related papers (2022-10-12T17:57:58Z) - Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto standard, Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - Semi-weakly Supervised Contrastive Representation Learning for Retinal Fundus Images [0.2538209532048867]
We propose a semi-weakly supervised contrastive learning framework for representation learning using semi-weakly annotated images.
We empirically validate the transfer learning performance of SWCL on seven public retinal fundus datasets.
arXiv Detail & Related papers (2021-08-04T15:50:09Z) - Pseudo Pixel-level Labeling for Images with Evolving Content [5.573543601558405]
We propose a pseudo-pixel-level label generation technique to reduce the amount of effort for manual annotation of images.
We train two semantic segmentation models with VGG and ResNet backbones on images labeled using our pseudo labeling method and those of a state-of-the-art method.
The results indicate that using our pseudo-labels instead of those generated by the state-of-the-art method during training improves the mean IoU and the frequency-weighted IoU of the VGG- and ResNet-based semantic segmentation models by 3.36%, 2.58%, 10
arXiv Detail & Related papers (2021-05-20T18:14:19Z) - Semi-Supervised Domain Adaptation with Prototypical Alignment and Consistency Learning [86.6929930921905]
This paper studies how much labeling a few target samples can further help address domain shifts.
To explore the full potential of these labeled target samples (landmarks), we incorporate a prototypical alignment (PA) module that calculates a target prototype for each class from the landmarks.
Specifically, we severely perturb the labeled images, making PA non-trivial to achieve and thus promoting model generalizability.
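The prototype computation mentioned above can be sketched as a per-class mean over the landmark features; this is a minimal illustration under that assumption, not the paper's full PA module:

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Compute one prototype per class as the mean feature vector of the
    labeled target samples ('landmarks') belonging to that class."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():                          # skip classes with no landmarks
            protos[c] = features[mask].mean(axis=0)
    return protos

# Toy example: three 2-D landmark features, two classes.
feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
protos = class_prototypes(feats, [0, 0, 1], num_classes=2)
# protos[0] is the mean of the two class-0 landmarks: [2.0, 0.0]
```

Target features can then be aligned toward their class prototype, which perturbing the labeled images makes deliberately harder to achieve.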
arXiv Detail & Related papers (2021-04-19T08:46:08Z) - SSKD: Self-Supervised Knowledge Distillation for Cross Domain Adaptive Person Re-Identification [25.96221714337815]
Domain adaptive person re-identification (re-ID) is a challenging task due to the large discrepancy between the source domain and the target domain.
Existing methods mainly attempt to generate pseudo labels for unlabeled target images by clustering algorithms.
We propose a Self-Supervised Knowledge Distillation (SSKD) technique containing two modules, the identity learning and the soft label learning.
arXiv Detail & Related papers (2020-09-13T10:12:02Z) - Instance-Aware Graph Convolutional Network for Multi-Label Classification [55.131166957803345]
Graph convolutional neural network (GCN) has effectively boosted the multi-label image recognition task.
We propose an instance-aware graph convolutional neural network (IA-GCN) framework for multi-label classification.
arXiv Detail & Related papers (2020-08-19T12:49:28Z) - Data-driven Meta-set Based Fine-Grained Visual Classification [61.083706396575295]
We propose a data-driven meta-set based approach to deal with noisy web images for fine-grained recognition.
Specifically, guided by a small amount of clean meta-set, we train a selection net in a meta-learning manner to distinguish in- and out-of-distribution noisy images.
arXiv Detail & Related papers (2020-08-06T03:04:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of this content (including all information) and is not responsible for any consequences of its use.