Automated Image Recognition Framework
- URL: http://arxiv.org/abs/2506.19261v1
- Date: Tue, 24 Jun 2025 02:42:34 GMT
- Title: Automated Image Recognition Framework
- Authors: Quang-Binh Nguyen, Trong-Vu Hoang, Ngoc-Do Tran, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le,
- Abstract summary: We propose a novel Automated Image Recognition framework that harnesses the power of generative AI.<n>AIR empowers end-users to synthesize high-quality, pre-annotated datasets.<n>It also automatically trains deep learning models on the generated datasets with robust image recognition performance.
- Score: 14.338537127280402
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While the efficacy of deep learning models heavily relies on data, gathering and annotating data for specific tasks, particularly when addressing novel or sensitive subjects lacking relevant datasets, poses significant time and resource challenges. In response to this, we propose a novel Automated Image Recognition (AIR) framework that harnesses the power of generative AI. AIR empowers end-users to synthesize high-quality, pre-annotated datasets, eliminating the necessity for manual labeling. It also automatically trains deep learning models on the generated datasets with robust image recognition performance. Our framework includes two main data synthesis processes, AIR-Gen and AIR-Aug. The AIR-Gen enables end-users to seamlessly generate datasets tailored to their specifications. To improve image quality, we introduce a novel automated prompt engineering module that leverages the capabilities of large language models. We also introduce a distribution adjustment algorithm to eliminate duplicates and outliers, enhancing the robustness and reliability of generated datasets. On the other hand, the AIR-Aug enhances a given dataset, thereby improving the performance of deep classifier models. AIR-Aug is particularly beneficial when users have limited data for specific tasks. Through comprehensive experiments, we demonstrated the efficacy of our generated data in training deep learning models and showcased the system's potential to provide image recognition models for a wide range of objects. We also conducted a user study that achieved an impressive score of 4.4 out of 5.0, underscoring the AI community's positive perception of AIR.
Related papers
- Enhancing Training Data Attribution with Representational Optimization [57.61977909113113]
Training data attribution methods aim to measure how training data impacts a model's predictions.<n>We propose AirRep, a representation-based approach that closes this gap by learning task-specific and model-aligned representations explicitly for TDA.<n>AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence.
arXiv Detail & Related papers (2025-05-24T05:17:53Z) - ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution [77.86222359025011]
We propose ToolACE-DEV, a self-improving framework for tool learning.<n>First, we decompose the tool-learning objective into sub-tasks that enhance basic tool-making and tool-using abilities.<n>We then introduce a self-evolving paradigm that allows lightweight models to self-improve, reducing reliance on advanced LLMs.
arXiv Detail & Related papers (2025-05-12T12:48:30Z) - OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis [55.390060529534644]
We propose OS-Genesis, a novel data synthesis pipeline for Graphical User Interface (GUI) agents.<n>Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise interactions.<n>We demonstrate that training GUI agents with OS-Genesis significantly improves their performance on highly challenging online benchmarks.
arXiv Detail & Related papers (2024-12-27T16:21:58Z) - Adaptive Masking Enhances Visual Grounding [12.793586888511978]
We propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, to enhance vocabulary grounding in low-shot learning scenarios.
We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks.
arXiv Detail & Related papers (2024-10-04T05:48:02Z) - A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z) - Generative Expansion of Small Datasets: An Expansive Graph Approach [13.053285552524052]
We introduce an Expansive Synthesis model generating large-scale, information-rich datasets from minimal samples.
An autoencoder with self-attention layers and optimal transport refines distributional consistency.
Results show comparable performance, demonstrating the model's potential to augment training data effectively.
arXiv Detail & Related papers (2024-06-25T02:59:02Z) - AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z) - DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z) - GISTEmbed: Guided In-sample Selection of Training Negatives for Text
Embedding Fine-tuning [0.0]
GISTEmbed is a novel strategy that enhances in-batch negative selection during contrastive training through a guide model.
Benchmarked against the Massive Text Embedding Benchmark (MTEB), GISTEmbed showcases consistent performance improvements across various model sizes.
arXiv Detail & Related papers (2024-02-26T18:55:15Z) - Improving the Performance of Fine-Grain Image Classifiers via Generative
Data Augmentation [0.5161531917413706]
We develop Data Augmentation from Proficient Pre-Training of Robust Generative Adrial Networks (DAPPER GAN)
DAPPER GAN is an ML analytics support tool that automatically generates novel views of training images.
We experimentally evaluate this technique on the Stanford Cars dataset, demonstrating improved vehicle make and model classification accuracy.
arXiv Detail & Related papers (2020-08-12T15:29:11Z) - DQI: Measuring Data Quality in NLP [22.54066527822898]
We introduce a generic formula for Data Quality Index (DQI) to help dataset creators create datasets free of unwanted biases.
We show that models trained on the renovated SNLI dataset generalize better to out of distribution tasks.
arXiv Detail & Related papers (2020-05-02T12:34:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.