Related papers: InstaDA: Augmenting Instance Segmentation Data with Dual-Agent System

InstaDA: Augmenting Instance Segmentation Data with Dual-Agent System

URL: http://arxiv.org/abs/2509.02973v1
Date: Wed, 03 Sep 2025 03:21:10 GMT
Title: InstaDA: Augmenting Instance Segmentation Data with Dual-Agent System
Authors: Xianbao Hou, Yonghao He, Zeyd Boukhers, John See, Hu Su, Wei Sui, Cong Yang,
Abstract summary: InstaDA is a training-free Dual-Agent system designed to augment instance segmentation datasets.<n>A Text-Agent (T-Agent) enhances data diversity through collaboration between large language models (LLMs) and diffusion models.<n>An Image-Agent (I-Agent) augments the training set by generating new conditioned instances on the training images.
Score: 15.916510585915406
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Acquiring high-quality instance segmentation data is challenging due to the labor-intensive nature of the annotation process and significant class imbalances within datasets. Recent studies have utilized the integration of Copy-Paste and diffusion models to create more diverse datasets. However, these studies often lack deep collaboration between large language models (LLMs) and diffusion models, and underutilize the rich information within the existing training data. To address these limitations, we propose InstaDA, a novel, training-free Dual-Agent system designed to augment instance segmentation datasets. First, we introduce a Text-Agent (T-Agent) that enhances data diversity through collaboration between LLMs and diffusion models. This agent features a novel Prompt Rethink mechanism, which iteratively refines prompts based on the generated images. This process not only fosters collaboration but also increases image utilization and optimizes the prompts themselves. Additionally, we present an Image-Agent (I-Agent) aimed at enriching the overall data distribution. This agent augments the training set by generating new instances conditioned on the training images. To ensure practicality and efficiency, both agents operate as independent and automated workflows, enhancing usability. Experiments conducted on the LVIS 1.0 validation set indicate that InstaDA achieves significant improvements, with an increase of +4.0 in box average precision (AP) and +3.3 in mask AP compared to the baseline. Furthermore, it outperforms the leading model, DiverGen, by +0.3 in box AP and +0.1 in mask AP, with a notable +0.7 gain in box AP on common categories and mask AP gains of +0.2 on common categories and +0.5 on frequent categories.

Related papers

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents [85.02904078131682]
We introduce the agent data protocol (ADP), a light-weight representation language that serves as an "interlingua" between agent datasets.<n> ADP is expressive enough to capture a large variety of tasks, including API/tool use, browsing, coding, software engineering, and general agentic.<n>All code and data are released publicly, in the hope that ADP could help lower the barrier to standardized, scalable, and reproducible agent training.
arXiv Detail & Related papers (2025-10-28T17:53:13Z)
Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout [62.73150122809138]
Federated Learning (FL) is a promising distributed machine learning approach that enables collaborative training of a global model using multiple edge devices.<n>We propose the FedDHAD FL framework, which comes with two novel methods: Dynamic Heterogeneous model aggregation (FedDH) and Adaptive Dropout (FedAD)<n>The combination of these two methods makes FedDHAD significantly outperform state-of-the-art solutions in terms of accuracy (up to 6.7% higher), efficiency (up to 2.02 times faster), and cost (up to 15.0% smaller)
arXiv Detail & Related papers (2025-07-14T16:19:00Z)
Ali-AUG: Innovative Approaches to Labeled Data Augmentation using One-Step Diffusion Model [0.14999444543328289]
Ali-AUG is a novel single-step diffusion model for efficient labeled data augmentation in industrial applications. Our method addresses the challenge of limited labeled data by generating synthetic, labeled images with precise feature insertion.
arXiv Detail & Related papers (2024-10-24T12:12:46Z)
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data [48.31817189858086]
We argue that generative data can expand the data distribution that the model can learn, thus mitigating overfitting. We find that DiverGen significantly outperforms the strong model X-Paste, achieving +1.1 box AP and +1.1 mask AP across all categories, and +1.9 box AP and +2.5 mask AP for rare categories.
arXiv Detail & Related papers (2024-05-16T15:30:18Z)
An Extensible Framework for Open Heterogeneous Collaborative Perception [58.70875361688463]
Collaborative perception aims to mitigate the limitations of single-agent perception. In this paper, we introduce a new open heterogeneous problem: how to accommodate continually emerging new heterogeneous agent types into collaborative perception. We propose HEterogeneous ALliance (HEAL), a novel collaborative perception framework.
arXiv Detail & Related papers (2024-01-25T05:55:03Z)
MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
MADiff is a diffusion-based multi-agent learning framework.<n>It works as both a decentralized policy and a centralized controller.<n>Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
EPiDA: An Easy Plug-in Data Augmentation Framework for High Performance Text Classification [34.15923302216751]
We present an easy and plug-in data augmentation framework EPiDA to support effective text classification. EPiDA employs two mechanisms: relative entropy (REM) and conditional minimization entropy (CEM) to control data generation. EPiDA can support efficient and continuous data generation for effective classification training.
arXiv Detail & Related papers (2022-04-24T06:53:48Z)
Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance. Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models. In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.