Image Augmentation Agent for Weakly Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2412.20439v1
- Date: Sun, 29 Dec 2024 11:32:55 GMT
- Title: Image Augmentation Agent for Weakly Supervised Semantic Segmentation
- Authors: Wangyu Wu, Xianglin Qiu, Siqi Song, Zhenhong Chen, Xiaowei Huang, Fei Ma, Jimin Xiao
- Abstract summary: Weakly-supervised semantic segmentation (WSSS) has achieved remarkable progress using only image-level labels.
We introduce a novel approach called Image Augmentation Agent (IAA), which shows that it is possible to enhance WSSS from a data-generation perspective.
IAA designs an augmentation agent that leverages large language models (LLMs) and diffusion models to automatically generate additional images for WSSS.
- Score: 19.654959889052638
- Abstract: Weakly-supervised semantic segmentation (WSSS) has achieved remarkable progress using only image-level labels. However, most existing WSSS methods focus on designing new network structures and loss functions to generate more accurate dense labels, overlooking the limitations imposed by fixed datasets, which can constrain performance improvements. We argue that more diverse trainable images provide WSSS with richer information and help the model understand more comprehensive semantic patterns. Therefore, in this paper, we introduce a novel approach called Image Augmentation Agent (IAA), which shows that it is possible to enhance WSSS from a data-generation perspective. IAA designs an augmentation agent that leverages large language models (LLMs) and diffusion models to automatically generate additional images for WSSS. In practice, to address the instability of prompt generation by LLMs, we develop a prompt self-refinement mechanism that allows LLMs to re-evaluate the rationality of generated prompts and produce more coherent ones. Additionally, we insert an online filter into the diffusion generation process to dynamically ensure the quality and balance of generated images. Experimental results show that our method significantly surpasses state-of-the-art WSSS approaches on the PASCAL VOC 2012 and MS COCO 2014 datasets.
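Based only on the abstract, a minimal sketch of such an agent loop might look like the following. The `llm`, `diffuse`, and `score` callables, the balancing rule, and all thresholds are illustrative assumptions, not the paper's actual interface:

```python
from typing import Callable, Dict, List

def refine_prompt(llm: Callable[[str], str], prompt: str, rounds: int = 2) -> str:
    """Prompt self-refinement: the LLM re-evaluates its own prompt for coherence."""
    for _ in range(rounds):
        prompt = llm("Re-evaluate this image-generation prompt for rationality and "
                     "rewrite it more coherently if needed: " + prompt)
    return prompt

def augment_for_wsss(classes: List[str],
                     llm: Callable[[str], str],
                     diffuse: Callable[[str], object],
                     score: Callable[[object, str], float],
                     per_class: int = 100,
                     threshold: float = 0.5,
                     max_attempts: int = 10_000) -> Dict[str, list]:
    """Generate extra class-labelled images with an online quality/balance filter."""
    bank: Dict[str, list] = {c: [] for c in classes}
    for _ in range(max_attempts):
        if all(len(v) >= per_class for v in bank.values()):
            break
        cls = min(classes, key=lambda c: len(bank[c]))   # keep classes balanced
        prompt = refine_prompt(llm, llm(f"Write a photo prompt featuring a {cls}."))
        image = diffuse(prompt)
        if score(image, cls) >= threshold:               # online quality filter
            bank[cls].append(image)
    return bank
```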
Related papers
- Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models [50.98559225639266]
Sub-images with higher semantic relevance to the entire image encapsulate richer visual information for preserving the model's visual understanding ability.
Global Semantic-guided Weight Allocator (GSWA) module allocates weights to sub-images based on their relative information density.
SleighVL, a lightweight yet high-performing model, outperforms models with comparable parameters and remains competitive with larger models.
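The summary suggests a relevance-proportional weighting scheme. A minimal sketch, assuming a softmax over per-sub-image relevance scores (the softmax form and temperature are assumptions, not details from the paper):

```python
import numpy as np

def allocate_subimage_weights(relevance: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Softmax over per-sub-image relevance scores so the weights sum to 1."""
    z = relevance / temperature
    z = z - z.max()            # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

# e.g. cosine similarities between sub-image embeddings and the global embedding
weights = allocate_subimage_weights(np.array([0.71, 0.32, 0.55, 0.18]))
```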
arXiv Detail & Related papers (2025-01-24T06:42:06Z)
- Nested Diffusion Models Using Hierarchical Latent Priors [23.605302440082994]
We introduce nested diffusion models, an efficient and powerful hierarchical generative framework.
Our approach employs a series of diffusion models to progressively generate latent variables at different semantic levels.
To construct these latent variables, we leverage a pre-trained visual encoder, which learns strong semantic visual representations.
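A minimal sketch of such a cascade, assuming each stage is a sampler conditioned on the previous, coarser latent (the callable interface is hypothetical):

```python
from typing import Callable, Optional, Sequence

def nested_sample(stages: Sequence[Callable], seed: Optional[object] = None):
    """Cascade of diffusion samplers ordered coarse-to-fine: each stage denoises
    a latent at its semantic level, conditioned on the previous stage's output."""
    latent = seed
    for sample_stage in stages:
        latent = sample_stage(latent)  # None for the first (unconditional) stage
    return latent                      # output of the final, pixel-level stage
```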
arXiv Detail & Related papers (2024-12-08T16:13:39Z)
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
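A minimal sketch of LLM-based label verification in this spirit, with a hypothetical `verify` callable standing in for the multimodal LLM:

```python
from typing import Callable, Iterable, Tuple

def curate(pairs: Iterable[Tuple[object, str]],
           verify: Callable[[object, str], Tuple[bool, str]]) -> list:
    """Keep only (image, label) pairs the multimodal LLM verifies, storing the
    model's rationale alongside each kept example."""
    kept = []
    for image, label in pairs:
        ok, rationale = verify(image, label)  # LLM answers yes/no plus a short reason
        if ok:
            kept.append({"image": image, "label": label, "rationale": rationale})
    return kept
```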
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis [1.1633929083694388]
We propose a framework for enhancing few-shot detection beyond state-of-the-art generative augmentation approaches.
We introduce our novel layout-aware CLIP score for sample ranking, enabling tight coupling between generated layouts and images.
With our approach, a YOLOX-S baseline is boosted by more than 140%, 50%, and 35% in mAP on the COCO 5-, 10-, and 30-shot settings, respectively.
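A minimal sketch of a layout-aware CLIP score, assuming PIL-style images and a `clip_sim` callable for image-text similarity; averaging over boxes is an assumption, not the paper's exact formulation:

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)

def layout_clip_score(image, layout: List[Tuple[str, Box]],
                      clip_sim: Callable[[object, str], float]) -> float:
    """Mean CLIP similarity between each layout box crop and its label text;
    higher scores indicate tighter coupling between layout and image."""
    scores = [clip_sim(image.crop(box), f"a photo of a {label}")
              for label, box in layout]
    return sum(scores) / len(scores)

# Rank generated samples by how faithfully they realize their layouts:
# ranked = sorted(samples, key=lambda s: layout_clip_score(s.image, s.layout, clip_sim),
#                 reverse=True)
```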
arXiv Detail & Related papers (2024-10-09T12:57:45Z)
- DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling [6.7206291284535125]
We present an effective data augmentation framework leveraging the Large Language Model (LLM) and Diffusion Model (DM).
Our approach addresses the issue of increasing the diversity of synthetic images.
Our method produces synthetic images with enhanced diversity while maintaining adherence to the target distribution.
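A minimal sketch of adaptive guidance scaling in this spirit; the linear update rule and all constants are illustrative assumptions, not DALDA's actual formula:

```python
def adaptive_guidance(base: float, adherence: float, target: float = 0.3,
                      gain: float = 10.0, lo: float = 2.0, hi: float = 15.0) -> float:
    """Per-sample guidance scale: push the scale up when the sample drifts from
    the target distribution (low adherence), down when adherence is already
    high, leaving headroom for diversity."""
    return max(lo, min(hi, base + gain * (target - adherence)))
```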
arXiv Detail & Related papers (2024-09-25T14:02:43Z)
- Can Generative Models Improve Self-Supervised Representation Learning? [0.7999703756441756]
We introduce a framework that enriches the self-supervised learning (SSL) paradigm by utilizing generative models to produce semantically consistent image augmentations.
Our results show that our framework significantly enhances the quality of learned visual representations by up to 10% Top-1 accuracy in downstream tasks.
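A minimal sketch of mixing generated views into an SSL augmentation pipeline; the probability split and callables are assumptions:

```python
import random
from typing import Callable

def generative_augment(image, standard_aug: Callable, generate_variant: Callable,
                       p_gen: float = 0.5):
    """With probability p_gen, use a generated, semantically consistent variant
    of the image as the SSL view instead of a hand-crafted augmentation."""
    if random.random() < p_gen:
        return generate_variant(image)  # e.g. an image-conditioned generative sample
    return standard_aug(image)

# Two stochastic views for a contrastive/SSL objective:
# v1 = generative_augment(x, standard_aug, generate_variant)
# v2 = generative_augment(x, standard_aug, generate_variant)
```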
arXiv Detail & Related papers (2024-03-09T17:17:07Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
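A minimal sketch of one training step for an SSL-embedding-conditioned denoiser, in PyTorch; the DDPM-style noising and noise-prediction objective are standard, but the conditioning interface is an assumption:

```python
import torch
import torch.nn.functional as F

def training_step(denoiser, ssl_encoder, x0: torch.Tensor,
                  alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """One DDPM-style step with the denoiser conditioned on a frozen SSL embedding."""
    with torch.no_grad():
        cond = ssl_encoder(x0)                        # frozen SSL representation
    t = torch.randint(0, alphas_cumprod.numel(), (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    xt = a.sqrt() * x0 + (1.0 - a).sqrt() * noise     # forward process q(x_t | x_0)
    return F.mse_loss(denoiser(xt, t, cond), noise)   # predict the added noise
```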
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation [68.42476385214785]
We propose a novel Spatial-Semantic Map Guided (SSMG) diffusion model that adopts the feature map, derived from the layout, as guidance.
SSMG achieves superior generation quality with sufficient spatial and semantic controllability compared to previous works.
We also propose the Relation-Sensitive Attention (RSA) and Location-Sensitive Attention (LSA) mechanisms.
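A minimal sketch of rasterizing a layout into a spatial-semantic feature map that could guide a diffusion model; the overwrite rule for overlapping boxes is an assumption:

```python
import numpy as np

def spatial_semantic_map(layout, class_embed: dict, h: int, w: int, dim: int) -> np.ndarray:
    """Rasterize a layout into a (dim, h, w) guidance map: each box region is
    filled with its class embedding; later boxes overwrite earlier ones."""
    fmap = np.zeros((dim, h, w), dtype=np.float32)
    for label, (x0, y0, x1, y1) in layout:
        fmap[:, y0:y1, x0:x1] = class_embed[label][:, None, None]
    return fmap
```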
arXiv Detail & Related papers (2023-08-20T04:09:12Z)
- Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration [48.54002313329872]
We propose a new paradigm of annotated data expansion named ChatGenImage.
The core idea behind it is to leverage the complementary strengths of diverse models to establish a highly effective and user-friendly pipeline for interactive data augmentation.
We present fascinating results obtained from our ChatGenImage framework and demonstrate the powerful potential of our synthetic data for systematic vision adaptation.
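A minimal sketch of such an interactive loop, with hypothetical `llm`, `generate`, and `critic` callables:

```python
from typing import Callable

def chat_gen_image(task: str, llm: Callable, generate: Callable,
                   critic: Callable, max_rounds: int = 3):
    """Interactive loop: the LLM proposes a prompt plus annotations, a generator
    renders the image, and critic feedback drives iterative refinement."""
    prompt, labels = llm(f"Propose an image prompt and annotations for: {task}")
    image = generate(prompt)
    for _ in range(max_rounds):
        ok, feedback = critic(image, labels)
        if ok:
            break
        prompt, labels = llm(f"Refine the prompt given this feedback: {feedback}")
        image = generate(prompt)
    return image, labels
```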
arXiv Detail & Related papers (2023-05-22T07:53:36Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs)-based approach.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)