Data augmentation with automated machine learning: approaches and
performance comparison with classical data augmentation methods
- URL: http://arxiv.org/abs/2403.08352v1
- Date: Wed, 13 Mar 2024 09:00:38 GMT
- Title: Data augmentation with automated machine learning: approaches and
performance comparison with classical data augmentation methods
- Authors: Alhassan Mumuni and Fuseini Mumuni
- Abstract summary: State-of-the-art approaches typically rely on automated machine learning (AutoML) principles.
This work presents a comprehensive survey of AutoML-based data augmentation techniques.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data augmentation is arguably the most important regularization technique
commonly used to improve generalization performance of machine learning models.
It primarily involves the application of appropriate data transformation
operations to create new data samples with desired properties. Despite its
effectiveness, the process is often challenging because of the time-consuming
trial and error procedures for creating and testing different candidate
augmentations and their hyperparameters manually. Automated data augmentation
methods aim to automate the process. State-of-the-art approaches typically rely
on automated machine learning (AutoML) principles. This work presents a
comprehensive survey of AutoML-based data augmentation techniques. We discuss
various approaches for accomplishing data augmentation with AutoML, including
data manipulation, data integration and data synthesis techniques. We present
an extensive discussion of techniques for realizing each of the major subtasks of
the data augmentation process: search space design, hyperparameter optimization
and model evaluation. Finally, we carry out an extensive comparison and
analysis of the performance of automated data augmentation techniques and
state-of-the-art methods based on classical augmentation approaches. The
results show that AutoML methods for data augmentation currently outperform
state-of-the-art techniques based on conventional approaches.
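The three subtasks named in the abstract (search space design, hyperparameter optimization and model evaluation) can be illustrated with a minimal sketch. This is not the survey's method: the toy 1-D dataset, the nearest-centroid classifier, the operation set `OPS`, the `MAGNITUDES` grid and every function name are assumptions made for illustration, and plain random search stands in for the more elaborate optimizers that AutoML methods typically use.

```python
import random

# Search space (hypothetical): operation name -> function(sample, magnitude, rng).
OPS = {
    "identity": lambda x, m, rng: x,
    "jitter": lambda x, m, rng: x + rng.gauss(0.0, m),
    "scale": lambda x, m, rng: x * (1.0 + rng.uniform(-m, m)),
}
MAGNITUDES = [0.05, 0.1, 0.2, 0.4]


def make_data(n, rng):
    """Toy 1-D two-class data: class 0 around -1, class 1 around +1."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        data.append((rng.gauss(center, 0.8), label))
    return data


def augment(data, policy, rng, copies=2):
    """Apply one (op, magnitude) policy to create extra labeled samples."""
    op, mag = policy
    extra = [(OPS[op](x, mag, rng), y) for x, y in data for _ in range(copies)]
    return data + extra


def fit_centroids(data):
    """Nearest-centroid classifier: store the mean of each class."""
    centroids = {}
    for label in (0, 1):
        xs = [x for x, y in data if y == label]
        centroids[label] = sum(xs) / len(xs) if xs else 0.0
    return centroids


def accuracy(centroids, data):
    """Fraction of samples whose nearest centroid matches the label."""
    hits = sum(
        1 for x, y in data
        if min(centroids, key=lambda c: abs(x - centroids[c])) == y
    )
    return hits / len(data)


def random_search(train, val, trials=20, seed=0):
    """Sample policies at random; keep the one with the best validation accuracy."""
    rng = random.Random(seed)
    best_policy, best_score = None, -1.0
    for _ in range(trials):
        # Hyperparameter optimization step: draw one candidate from the search space.
        policy = (rng.choice(sorted(OPS)), rng.choice(MAGNITUDES))
        # Model evaluation step: train on augmented data, score on held-out data.
        model = fit_centroids(augment(train, policy, rng))
        score = accuracy(model, val)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score


data_rng = random.Random(42)
train, val = make_data(20, data_rng), make_data(200, data_rng)
best_policy, best_score = random_search(train, val)
print(best_policy, best_score)
```

The sketch scores each candidate policy on a held-out validation set rather than on the training data, since the goal of augmentation is better generalization. In a realistic setting the model evaluation step dominates the cost, because each trial requires training a model.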
Related papers
- Augmentation Policy Generation for Image Classification Using Large Language Models [3.038642416291856]
We propose a strategy that uses large language models to automatically generate efficient augmentation policies.
The proposed method was evaluated on medical imaging datasets, showing a clear improvement over state-of-the-art methods.
arXiv Detail & Related papers (2024-10-17T11:26:10Z)
- A Comprehensive Survey on Data Augmentation [55.355273602421384]
Data augmentation is a technique that generates high-quality artificial data by manipulating existing data samples.
Existing literature surveys each focus on only one specific data modality.
We propose a more enlightening taxonomy that encompasses data augmentation techniques for different common data modalities.
arXiv Detail & Related papers (2024-05-15T11:58:08Z)
- Automated data processing and feature engineering for deep learning and big data applications: a survey [0.0]
The modern approach to artificial intelligence (AI) aims to design algorithms that learn directly from data.
Not all data processing tasks in conventional deep learning pipelines have been automated.
arXiv Detail & Related papers (2024-03-18T01:07:48Z)
- AutoCure: Automated Tabular Data Curation Technique for ML Pipelines [0.0]
We present AutoCure, a novel and configuration-free data curation pipeline.
Unlike traditional data curation methods, AutoCure synthetically enhances the density of the clean data fraction.
In practice, AutoCure can be integrated with open source tools to promote the democratization of machine learning.
arXiv Detail & Related papers (2023-04-26T15:51:47Z)
- AutoEn: An AutoML method based on ensembles of predefined Machine Learning pipelines for supervised Traffic Forecasting [1.6242924916178283]
Traffic Forecasting (TF) is gaining relevance due to its ability to mitigate traffic congestion by forecasting future traffic states.
TF poses one big challenge to the Machine Learning paradigm, known as the Model Selection Problem (MSP).
We introduce AutoEn, which is a simple and efficient method for automatically generating multi-classifier ensembles from a predefined set of ML pipelines.
arXiv Detail & Related papers (2023-03-19T18:37:18Z)
- OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System [85.8338446357469]
We introduce OmniForce, a human-centered AutoML system that yields both human-assisted ML and ML-assisted human techniques.
We show how OmniForce can put an AutoML system into practice and build adaptive AI in open-environment scenarios.
arXiv Detail & Related papers (2023-03-01T13:35:22Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
- Adaptive Weighting Scheme for Automatic Time-Series Data Augmentation [79.47771259100674]
We present two sample-adaptive automatic weighting schemes for data augmentation.
We validate our proposed methods on a large, noisy financial dataset and on time-series datasets from the UCR archive.
On the financial dataset, we show that the methods in combination with a trading strategy lead to improvements in annualized returns of over 50%, and on the time-series data we outperform state-of-the-art models on over half of the datasets, and achieve similar performance in accuracy on the others.
arXiv Detail & Related papers (2021-02-16T17:50:51Z)
- Improving the Performance of Fine-Grain Image Classifiers via Generative Data Augmentation [0.5161531917413706]
We develop Data Augmentation from Proficient Pre-Training of Robust Generative Adversarial Networks (DAPPER GAN).
DAPPER GAN is an ML analytics support tool that automatically generates novel views of training images.
We experimentally evaluate this technique on the Stanford Cars dataset, demonstrating improved vehicle make and model classification accuracy.
arXiv Detail & Related papers (2020-08-12T15:29:11Z)