Comprehensive Exploration of Synthetic Data Generation: A Survey
- URL: http://arxiv.org/abs/2401.02524v2
- Date: Thu, 1 Feb 2024 22:06:51 GMT
- Title: Comprehensive Exploration of Synthetic Data Generation: A Survey
- Authors: Andr\'e Bauer, Simon Trapp, Michael Stenger, Robert Leppich, Samuel
Kounev, Mark Leznik, Kyle Chard, Ian Foster
- Abstract summary: This work surveys 417 Synthetic Data Generation models over the last decade.
The findings reveal increased model performance and complexity, with neural network-based approaches prevailing.
Computer vision dominates, with GANs as primary generative models, while diffusion models, transformers, and RNNs compete.
- Score: 4.485401662312072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have witnessed a surge in the popularity of Machine Learning
(ML), applied across diverse domains. However, progress is impeded by the
scarcity of training data due to expensive acquisition and privacy legislation.
Synthetic data emerges as a solution, but the abundance of released models and
limited overview literature pose challenges for decision-making. This work
surveys 417 Synthetic Data Generation (SDG) models over the last decade,
providing a comprehensive overview of model types, functionality, and
improvements. Common attributes are identified, leading to a classification and
trend analysis. The findings reveal increased model performance and complexity,
with neural network-based approaches prevailing, except for privacy-preserving
data generation. Computer vision dominates, with GANs as primary generative
models, while diffusion models, transformers, and RNNs compete. Implications
from our performance evaluation highlight the scarcity of common metrics and
datasets, making comparisons challenging. Additionally, the neglect of training
and computational costs in literature necessitates attention in future
research. This work serves as a guide for SDG model selection and identifies
crucial areas for future exploration.
Related papers
- Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis [0.0]
This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize Malicious Network Traffic.
Our approach transforms numerical data into text, re-framing data generation as a language modeling task.
Our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data.
arXiv Detail & Related papers (2024-11-04T09:51:10Z) - zGAN: An Outlier-focused Generative Adversarial Network For Realistic Synthetic Data Generation [0.0]
"Black swans" have posed a challenge to performance of classical machine learning models.
This article provides an overview of the zGAN model architecture developed for the purpose of generating synthetic data with outlier characteristics.
It shows promising results on realistic synthetic data generation, as well as uplift capabilities vis-a-vis model performance.
arXiv Detail & Related papers (2024-10-28T07:55:11Z) - Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations [52.11801730860999]
In recent years, the robot learning community has shown increasing interest in using deep generative models to capture the complexity of large datasets.
We present the different types of models that the community has explored, such as energy-based models, diffusion models, action value maps, or generative adversarial networks.
We also present the different types of applications in which deep generative models have been used, from grasp generation to trajectory generation or cost learning.
arXiv Detail & Related papers (2024-08-08T11:34:31Z) - From CNNs to Transformers in Multimodal Human Action Recognition: A Survey [23.674123304219822]
Human action recognition is one of the most widely studied research problems in Computer Vision.
Recent studies have shown that addressing it using multimodal data leads to superior performance.
Recent rise of Transformers in visual modelling is now also causing a paradigm shift for the action recognition task.
arXiv Detail & Related papers (2024-05-22T02:11:18Z) - DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z) - Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks.
It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection.
Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
arXiv Detail & Related papers (2024-03-13T16:05:18Z) - Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z) - Fantastic Gains and Where to Find Them: On the Existence and Prospect of
General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - From Identifiable Causal Representations to Controllable Counterfactual Generation: A Survey on Causal Generative Modeling [17.074858228123706]
We focus on fundamental theory, methodology, drawbacks, datasets, and metrics.
We cover applications of causal generative models in fairness, privacy, out-of-distribution generalization, precision medicine, and biological sciences.
arXiv Detail & Related papers (2023-10-17T05:45:32Z) - On the Efficacy of Adversarial Data Collection for Question Answering:
Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z) - Differentially Private Synthetic Medical Data Generation using
Convolutional GANs [7.2372051099165065]
We develop a differentially private framework for synthetic data generation using R'enyi differential privacy.
Our approach builds on convolutional autoencoders and convolutional generative adversarial networks to preserve some of the critical characteristics of the generated synthetic data.
We demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget.
arXiv Detail & Related papers (2020-12-22T01:03:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.