DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images
- URL: http://arxiv.org/abs/2507.08648v1
- Date: Fri, 11 Jul 2025 14:51:33 GMT
- Title: DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images
- Authors: Haoran Sun, Haoyu Bian, Shaoning Zeng, Yunbo Rao, Xu Xu, Lin Mei, Jianping Gou
- Abstract summary: We propose a novel method for auto-constructing datasets from real-world images via a multi-agent collaborative system. By coordinating four different agents equipped with Multi-modal Large Language Models (MLLMs), we are able to construct high-quality image datasets. In particular, two types of experiments are conducted, including expanding existing datasets and creating new ones from scratch.
- Score: 21.22466658711056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Constructing image datasets typically relies on the time-intensive and inefficient process of manual collection and annotation. Large models offer a solution via data generation; nonetheless, real-world data remain clearly more valuable than artificially generated data, particularly for constructing image datasets. For this reason, we propose a novel method for auto-constructing datasets from real-world images with a multi-agent collaborative system, named DatasetAgent. By coordinating four different agents equipped with Multi-modal Large Language Models (MLLMs), as well as a tool package for image optimization, DatasetAgent is able to construct high-quality image datasets according to user-specified requirements. In particular, two types of experiments are conducted, including expanding existing datasets and creating new ones from scratch, on a variety of open-source datasets. In both cases, multiple image datasets constructed by DatasetAgent are used to train various vision models for image classification, object detection, and image segmentation.
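The abstract does not spell out the agents' roles or interfaces. The following is a minimal Python sketch of what such a four-agent pipeline might look like; the planner/collector/quality/annotator roles, the `call_mllm` stub, and the `optimize_image` tool are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class DatasetSpec:
    """User-specified requirements for the target dataset (illustrative only)."""
    task: str                 # e.g. "image classification"
    categories: list[str]     # target class labels
    images_per_category: int

def call_mllm(prompt: str, image: bytes | None = None) -> str:
    """Placeholder for a Multi-modal LLM query; a real system would call an MLLM API here."""
    return "stub response"

def optimize_image(img: bytes) -> bytes:
    """Stand-in for the image-optimization tool package (e.g. resizing, denoising)."""
    return img

class PlannerAgent:
    """Turns user requirements into a concrete collection plan."""
    def plan(self, spec: DatasetSpec) -> list[str]:
        call_mllm(f"Plan collection of {spec.images_per_category} images per class for {spec.categories}")
        return [f"collect:{c}" for c in spec.categories]

class CollectorAgent:
    """Gathers candidate real-world images for each planned step."""
    def collect(self, step: str) -> list[bytes]:
        return [b"raw-image-bytes"]  # a real agent would crawl or query image sources here

class QualityAgent:
    """Filters unusable images and applies the optimization tools to the rest."""
    def refine(self, images: list[bytes]) -> list[bytes]:
        return [optimize_image(img) for img in images if call_mllm("Is this image usable?", img)]

class AnnotatorAgent:
    """Produces labels for the retained images (class, boxes, or masks depending on the task)."""
    def annotate(self, images: list[bytes], category: str) -> list[dict]:
        return [{"image": img, "label": category} for img in images]

def build_dataset(spec: DatasetSpec) -> list[dict]:
    """Coordinate the four illustrative agents into a simple sequential pipeline."""
    planner, collector, quality, annotator = PlannerAgent(), CollectorAgent(), QualityAgent(), AnnotatorAgent()
    dataset: list[dict] = []
    for step in planner.plan(spec):
        category = step.split(":", 1)[1]
        images = quality.refine(collector.collect(step))
        dataset.extend(annotator.annotate(images, category))
    return dataset

if __name__ == "__main__":
    spec = DatasetSpec(task="image classification", categories=["cat", "dog"], images_per_category=10)
    print(f"Constructed {len(build_dataset(spec))} labeled samples")
```

In the actual system, coordination, tool use, and quality criteria are driven by the MLLM-equipped agents themselves rather than the fixed loop shown here.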
Related papers
- Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents [62.616106562146776]
We propose a Visual-Centric Selection approach via Agents Collaboration (ViSA). Our approach consists of 1) an image information quantification method via visual agents collaboration to select images with rich visual information, and 2) a visual-centric instruction quality assessment method to select high-quality instruction data related to high-quality images.
arXiv Detail & Related papers (2025-02-27T09:37:30Z) - Community Forensics: Using Thousands of Generators to Train Fake Image Detectors [15.166026536032142]
One of the key challenges of detecting AI-generated images is spotting images that have been created by previously unseen generative models. We propose a new dataset that is significantly larger and more diverse than prior work. The resulting dataset contains 2.7M images sampled from 4803 different models.
arXiv Detail & Related papers (2024-11-06T18:59:41Z) - ARMADA: Attribute-Based Multimodal Data Augmentation [93.05614922383822]
Attribute-based Multimodal Data Augmentation (ARMADA) is a novel multimodal data augmentation method based on knowledge-guided manipulation of visual attributes.
ARMADA extracts knowledge-grounded attributes from symbolic KBs to generate semantically consistent yet distinctive image-text pairs.
This also highlights the need to leverage external knowledge proxies for enhanced interpretability and real-world grounding.
arXiv Detail & Related papers (2024-08-19T15:27:25Z) - Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models [49.439311430360284]
We introduce a novel data synthesis method inspired by contrastive learning and image difference captioning. Our key idea involves challenging the model to discern both matching and distinct elements. We leverage this generated dataset to fine-tune state-of-the-art (SOTA) MLLMs.
arXiv Detail & Related papers (2024-08-08T17:10:16Z) - AEye: A Visualization Tool for Image Datasets [18.95453617434051]
AEye is a semantically meaningful visualization tool tailored to image datasets.
AEye embeds images into semantically meaningful high-dimensional representations, facilitating data clustering and organization.
AEye facilitates semantic search functionalities for both text and image queries, enabling users to search for content.
arXiv Detail & Related papers (2024-08-07T20:19:20Z) - Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation [58.09421301921607]
We construct the first large-scale dataset for subject-driven image editing and generation.
Our dataset is 5 times the size of the previous largest dataset, yet our cost is tens of thousands of GPU hours lower.
arXiv Detail & Related papers (2024-06-13T16:40:39Z) - Mixed-Query Transformer: A Unified Image Segmentation Architecture [57.32212654642384]
Existing unified image segmentation models either employ a unified architecture across multiple tasks but use separate weights tailored to each dataset, or apply a single set of weights to multiple datasets but are limited to a single task.
We introduce the Mixed-Query Transformer (MQ-Former), a unified architecture for multi-task and multi-dataset image segmentation using a single set of weights.
arXiv Detail & Related papers (2024-04-06T01:54:17Z) - A Multimodal Approach for Cross-Domain Image Retrieval [5.5547914920738]
Cross-Domain Image Retrieval (CDIR) is a challenging task in computer vision.
This paper introduces a novel unsupervised approach to CDIR that incorporates textual context by leveraging pre-trained vision-language models.
Our method, dubbed Caption-Matching (CM), uses generated image captions as a domain-agnostic intermediate representation; a minimal illustrative sketch of this caption-matching idea appears after this list.
arXiv Detail & Related papers (2024-03-22T12:08:16Z) - JourneyDB: A Benchmark for Generative Image Understanding [89.02046606392382]
We introduce a comprehensive dataset, referred to as JourneyDB, that caters to the domain of generative images.
Our meticulously curated dataset comprises 4 million distinct and high-quality generated images.
On our dataset, we have devised four benchmarks to assess the performance of generated image comprehension.
arXiv Detail & Related papers (2023-07-03T02:39:08Z)
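For the Caption-Matching idea above, a minimal sketch under stated assumptions might look like the following; `caption_image` and `embed_text` are placeholders standing in for whatever pre-trained vision-language and text-embedding models one chooses, and nothing here reflects the paper's actual implementation.

```python
import numpy as np

def caption_image(image_path: str) -> str:
    """Placeholder: a pre-trained vision-language model would generate the caption here."""
    return f"a generated caption describing {image_path}"

def embed_text(text: str) -> np.ndarray:
    """Placeholder text embedder; hash-seeded random unit vectors stand in for a real sentence encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(384)
    return vec / np.linalg.norm(vec)

def retrieve(query: str, gallery: list[str], top_k: int = 5) -> list[tuple[str, float]]:
    """Rank gallery images by cosine similarity between the query text and their generated captions."""
    q = embed_text(query)
    scored = [(path, float(embed_text(caption_image(path)) @ q)) for path in gallery]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]

if __name__ == "__main__":
    print(retrieve("a red backpack", ["sketch_001.png", "photo_042.jpg", "painting_007.png"], top_k=2))
```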
This list is automatically generated from the titles and abstracts of the papers on this site.