Generating High Fidelity Synthetic Data via Coreset selection and
Entropic Regularization
- URL: http://arxiv.org/abs/2302.00138v1
- Date: Tue, 31 Jan 2023 22:59:41 GMT
- Authors: Omead Pooladzandi, Pasha Khosravi, Erik Nijkamp, Baharan Mirzasoleiman
- Abstract summary: We propose using a combination of coreset selection methods and ``entropic regularization'' to select the highest-fidelity samples.
In a semi-supervised learning scenario, we show that augmenting the labeled dataset with our selected subset of samples leads to a greater accuracy improvement than using all the synthetic samples.
- Score: 15.866662428675054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models have the ability to synthesize data points drawn from the
data distribution, however, not all generated samples are high quality. In this
paper, we propose using a combination of coreset selection methods and
``entropic regularization'' to select the highest-fidelity samples. We leverage
an Energy-Based Model which resembles a variational auto-encoder with an
inference and generator model for which the latent prior is complexified by an
energy-based model. In a semi-supervised learning scenario, we show that
augmenting the labeled dataset with our selected subset of samples leads to a
greater accuracy improvement than using all the synthetic samples.
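The selection idea in the abstract can be sketched as a greedy subset search that trades off sample fidelity (here approximated by low energy) against an entropy bonus that rewards diverse selections. This is a minimal illustration under assumed interfaces: `energy_fn` and the hash-based binning are hypothetical stand-ins, not the paper's actual coreset or EBM machinery.

```python
import numpy as np

def select_high_fidelity(samples, energy_fn, k, lam=0.1, n_bins=10):
    """Greedily select k samples, trading off low energy (a proxy for
    high fidelity) against an entropy regularizer over a crude binning
    of the selected set. Hypothetical sketch, not the paper's algorithm.

    energy_fn: maps a sample to a scalar energy (lower = higher fidelity).
    lam: weight of the entropy bonus encouraging diverse selections.
    """
    energies = np.array([energy_fn(x) for x in samples])
    # Discretize samples into bins so we can measure selection entropy.
    bins = np.array([hash(tuple(np.round(x, 1))) % n_bins for x in samples])
    selected = []
    for _ in range(k):
        best_i, best_score = None, -np.inf
        for i in range(len(samples)):
            if i in selected:
                continue
            counts = np.bincount([bins[j] for j in selected + [i]],
                                 minlength=n_bins)
            p = counts / counts.sum()
            entropy = -(p[p > 0] * np.log(p[p > 0])).sum()
            # High score = low energy plus diverse (high-entropy) selection.
            score = -energies[i] + lam * entropy
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return selected
```

In a semi-supervised setting, the returned indices would pick which synthetic samples to add to the labeled pool, rather than adding all of them.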
Related papers
- Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z)
- Priority Sampling of Large Language Models for Compilers [4.2266182821287135]
Priority Sampling is a simple and deterministic sampling technique that produces unique samples ordered by the model's confidence.
It supports generation constrained by a regular expression, providing a controllable and structured exploration process.
It outperforms the autotuner used for the generation of labels for the training of the original model in just 30 samples.
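A deterministic, confidence-ordered sampler of the kind this summary describes can be sketched as a best-first search over the token tree: always expand the unexpanded prefix with the highest cumulative log-probability, so completed sequences come out unique and in descending order of model confidence. The toy `next_token_logprobs` interface below is an assumption for illustration, not the paper's implementation.

```python
import heapq


def priority_samples(next_token_logprobs, max_len, n_samples):
    """Yield up to n_samples complete sequences, each unique, in
    descending order of model log-probability, with no randomness.

    next_token_logprobs(prefix) -> {token: logprob} (toy interface).
    A sequence counts as complete when it reaches max_len.
    """
    # Min-heap keyed on negative cumulative log-probability,
    # so the most confident prefix is popped first.
    heap = [(0.0, ())]
    out = []
    while heap and len(out) < n_samples:
        neg_lp, prefix = heapq.heappop(heap)
        if len(prefix) == max_len:
            out.append((prefix, -neg_lp))
            continue
        for tok, lp in next_token_logprobs(prefix).items():
            heapq.heappush(heap, (neg_lp - lp, prefix + (tok,)))
    return out
```

Because expansion order depends only on the scores, running it twice on the same model gives the same samples, which is what makes the technique useful for generating training labels.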
arXiv Detail & Related papers (2024-02-28T22:27:49Z)
- Iterated Denoising Energy Matching for Sampling from Boltzmann Densities [109.23137009609519]
Iterated Denoising Energy Matching (iDEM) alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our matching objective.
We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2$-$5\times$ faster.
arXiv Detail & Related papers (2024-02-09T01:11:23Z)
- Quality-Diversity Generative Sampling for Learning with Synthetic Data [18.642540152362237]
Generative models can serve as surrogates for some real data sources by creating synthetic training datasets.
We propose quality-diversity generative sampling (QDGS), a framework for sampling data uniformly across a user-defined measure space.
arXiv Detail & Related papers (2023-12-22T01:43:27Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Energy-Based Test Sample Adaptation for Domain Generalization [81.04943285281072]
We propose energy-based sample adaptation at test time for domain generalization.
To adapt target samples to source distributions, we iteratively update the samples by energy minimization.
Experiments on six benchmarks for classification of images and microblog threads demonstrate the effectiveness of our proposal.
arXiv Detail & Related papers (2023-02-22T08:55:09Z)
- Selectively increasing the diversity of GAN-generated samples [8.980453507536017]
We propose a novel method to selectively increase the diversity of GAN-generated samples.
We show the superiority of our method in a synthetic benchmark as well as a real-life scenario simulating data from the Zero Degree Calorimeter of the ALICE experiment at CERN.
arXiv Detail & Related papers (2022-07-04T16:27:06Z)
- A Kernelised Stein Statistic for Assessing Implicit Generative Models [10.616967871198689]
We propose a principled procedure to assess the quality of a synthetic data generator.
The sample size from the synthetic data generator can be as large as desired, while the size of the observed data, which the generator aims to emulate, is fixed.
arXiv Detail & Related papers (2022-05-31T23:40:21Z)
- Oops I Took A Gradient: Scalable Sampling for Discrete Distributions [53.3142984019796]
We show that this approach outperforms generic samplers in a number of difficult settings.
We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data.
arXiv Detail & Related papers (2021-02-08T20:08:50Z)
- One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.