Evolving GANs: When Contradictions Turn into Compliance
- URL: http://arxiv.org/abs/2106.09946v1
- Date: Fri, 18 Jun 2021 06:51:35 GMT
- Title: Evolving GANs: When Contradictions Turn into Compliance
- Authors: Sauptik Dhar, Javad Heydari, Samarth Tripathi, Unmesh Kurup, Mohak Shah
- Abstract summary: We propose a GAN game which provides improved discriminator accuracy under limited data settings, while generating realistic synthetic data.
This provides the added advantage that now the generated data can be used for other similar tasks.
- Score: 11.353579556329962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Limited availability of labeled data makes any supervised learning problem
challenging. Alternative learning settings like semi-supervised and universum
learning alleviate the dependency on labeled data, but still require a large
amount of unlabeled data, which may be unavailable or expensive to acquire.
GAN-based synthetic data generation methods have recently shown promise by
generating synthetic samples to improve the task at hand. However, these samples
cannot be used for other purposes. In this paper, we propose a GAN game which
provides improved discriminator accuracy under limited data settings, while
generating realistic synthetic data. This provides the added advantage that now
the generated data can be used for other similar tasks. We provide the
theoretical guarantees and empirical results in support of our approach.
Related papers
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable.
We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
- Improving Grammatical Error Correction via Contextual Data Augmentation [49.746484518527716]
We propose a synthetic data construction method based on contextual augmentation.
Specifically, we combine rule-based substitution with model-based generation.
We also propose a relabeling-based data cleaning method to mitigate the effects of noisy labels in synthetic data.
arXiv Detail & Related papers (2024-06-25T10:49:56Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- On the Usefulness of Synthetic Tabular Data Generation [3.04585143845864]
It is commonly believed that synthetic data can be used for both data exchange and boosting machine learning (ML) training.
Privacy-preserving synthetic data generation can accelerate data exchange for downstream tasks, but there is not enough evidence to show how or why synthetic data can boost ML training.
arXiv Detail & Related papers (2023-06-27T17:26:23Z)
- Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science [13.854807858791652]
We tackle a pervasive problem in synthetic data generation: its generative distribution often differs from the distribution of real-world data researchers care about.
We study three strategies to increase the faithfulness of synthetic data: grounding, filtering, and taxonomy-based generation.
We conclude this paper with some recommendations on how to generate high(er)-fidelity synthetic data for specific tasks.
arXiv Detail & Related papers (2023-05-24T11:27:59Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- FairGen: Fair Synthetic Data Generation [0.3149883354098941]
We propose a pipeline to generate fairer synthetic data independent of the GAN architecture.
We claim that most GANs amplify bias present in the training data while generating synthetic data, but that by removing these bias-inducing samples, GANs essentially focus more on real informative samples.
arXiv Detail & Related papers (2022-10-24T08:13:47Z)
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
- Foundations of Bayesian Learning from Synthetic Data [1.6249267147413522]
We use a Bayesian paradigm to characterise the updating of model parameters when learning on synthetic data.
Recent results from general Bayesian updating support a novel and robust approach to synthetic-learning founded on decision theory.
arXiv Detail & Related papers (2020-11-16T21:49:17Z)
- Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation effort by learning to count in the crowd from a limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.