ExGAN: Adversarial Generation of Extreme Samples
- URL: http://arxiv.org/abs/2009.08454v3
- Date: Mon, 15 Mar 2021 15:49:39 GMT
- Title: ExGAN: Adversarial Generation of Extreme Samples
- Authors: Siddharth Bhatia, Arjit Jain, Bryan Hooi
- Abstract summary: Mitigating the risk arising from extreme events is a fundamental goal with many applications.
Existing approaches based on Generative Adversarial Networks (GANs) excel at generating realistic samples, but target typical rather than extreme samples.
We propose ExGAN, a GAN-based approach to generate realistic and extreme samples.
- Score: 33.70161373245072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mitigating the risk arising from extreme events is a fundamental goal with
many applications, such as the modelling of natural disasters, financial
crashes, epidemics, and many others. To manage this risk, a vital step is to be
able to understand or generate a wide range of extreme scenarios. Existing
approaches based on Generative Adversarial Networks (GANs) excel at generating
realistic samples, but seek to generate typical samples, rather than extreme
samples. Hence, in this work, we propose ExGAN, a GAN-based approach to
generate realistic and extreme samples. To model the extremes of the training
distribution in a principled way, our work draws from Extreme Value Theory
(EVT), a probabilistic approach for modelling the extreme tails of
distributions. For practical utility, our framework allows the user to specify
both the desired extremeness measure, as well as the desired extremeness
probability they wish to sample at. Experiments on real US Precipitation data
show that our method generates realistic samples, based on visual inspection
and quantitative measures, in an efficient manner. Moreover, generating
increasingly extreme examples using ExGAN can be done in constant time (with
respect to the extremeness probability $\tau$), as opposed to the
$\mathcal{O}(\frac{1}{\tau})$ time required by the baseline approach.
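The constant-time claim follows from how EVT models the tail: once a Generalized Pareto Distribution (GPD) is fitted above a threshold, an extremeness value at any probability $\tau$ can be obtained by a single inverse-CDF evaluation, whereas the rejection baseline must keep drawing until a sample happens to exceed the $\tau$-quantile. The sketch below illustrates this contrast only; the function names and the parameters (threshold `u`, scale `sigma`, shape `xi`) are hypothetical placeholders, not ExGAN's actual implementation.

```python
import math
import random

def gpd_quantile(p, u, sigma, xi):
    """Inverse CDF of a Generalized Pareto exceedance distribution:
    value exceeded with probability 1 - p, above threshold u."""
    if abs(xi) < 1e-12:
        return u + sigma * (-math.log(1.0 - p))  # exponential-tail limit
    return u + (sigma / xi) * ((1.0 - p) ** (-xi) - 1.0)

def sample_extreme_evt(tau, u, sigma, xi, rng=random.random):
    """Constant time in tau: map one uniform draw into the top-tau tail."""
    p = 1.0 - tau * (1.0 - rng())  # p lies in [1 - tau, 1)
    return gpd_quantile(p, u, sigma, xi)

def sample_extreme_rejection(tau, draw_sample, extremeness, threshold):
    """Baseline: redraw until the extremeness measure clears the
    tau-quantile threshold -- expected number of draws is 1 / tau."""
    n = 0
    while True:
        n += 1
        x = draw_sample()
        if extremeness(x) >= threshold:
            return x, n
```

In ExGAN's setting the EVT step would supply an extremeness value for the generator to condition on; here it is shown as a plain scalar draw to keep the cost comparison self-contained.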
Related papers
- Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling [50.872910438715486]
Large Language Models (LLMs) are typically evaluated for safety under single-shot or low-budget adversarial prompting.
We propose a scaling-aware Best-of-N estimation of risk, SABER, for modeling jailbreak vulnerability under Best-of-N sampling.
arXiv Detail & Related papers (2026-01-30T06:54:35Z)
- Beyond Linear Diffusions: Improved Representations for Rare Conditional Generative Modeling [4.527435625329663]
We show it is possible to adapt the data representation and forward scheme so that the sample complexity of learning a score-based generative model is small in low probability regions of the conditioning space.
We show how diffusion with a data-driven choice of nonlinear drift term is best suited to model tail events under an appropriate representation of the data.
arXiv Detail & Related papers (2025-10-02T19:06:14Z)
- Controllable Generation via Locally Constrained Resampling [77.48624621592523]
We propose a tractable probabilistic approach that performs Bayesian conditioning to draw samples subject to a constraint.
Our approach considers the entire sequence, leading to a more globally optimal constrained generation than current greedy methods.
We show that our approach is able to steer the model's outputs away from toxic generations, outperforming similar approaches to detoxification.
arXiv Detail & Related papers (2024-10-17T00:49:53Z)
- Which Pretrain Samples to Rehearse when Finetuning Pretrained Models? [60.59376487151964]
Fine-tuning pretrained models on specific tasks is now the de facto approach for text and vision tasks.
A known pitfall of this approach is the forgetting of pretraining knowledge that happens during finetuning.
We propose a novel sampling scheme, mix-cd, that identifies and prioritizes samples that actually face forgetting.
arXiv Detail & Related papers (2024-02-12T22:32:12Z)
- Dropout-Based Rashomon Set Exploration for Efficient Predictive Multiplicity Estimation [15.556756363296543]
Predictive multiplicity refers to the phenomenon in which classification tasks admit multiple competing models that achieve almost-equally-optimal performance.
We propose a novel framework that utilizes dropout techniques for exploring models in the Rashomon set.
We show that our technique consistently outperforms baselines in terms of the effectiveness of predictive multiplicity metric estimation.
arXiv Detail & Related papers (2024-02-01T16:25:00Z)
- A VAE Approach to Sample Multivariate Extremes [6.548734807475054]
This paper describes a variational autoencoder (VAE) approach for sampling heavy-tailed distributions likely to have extremes of particularly large intensities.
We illustrate the relevance of our approach on a synthetic data set and on a real data set of discharge measurements along the Danube river network.
In addition to outperforming the standard VAE for the tested data sets, we also provide a comparison with a competing EVT-based generative approach.
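The abstract above does not specify the paper's architecture, but the general idea behind heavy-tailed generative sampling can be illustrated in a few lines: a light-tailed (Gaussian) latent draw can be pushed through a heavy-tailed inverse CDF so that the output has a power-law tail. The Pareto choice and the tail index `alpha` below are hypothetical, for illustration only.

```python
import math

def gaussian_to_pareto(z, alpha=2.0):
    """Push a standard-normal latent z through the Pareto inverse CDF,
    giving the output a power-law tail with index alpha."""
    u = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Phi(z), in (0, 1)
    u = min(u, 1.0 - 1e-12)                          # guard the upper tail
    return (1.0 - u) ** (-1.0 / alpha)               # Pareto(alpha) quantile
```

A decoder built this way can emit arbitrarily large values from moderate latent draws, which is the property a standard Gaussian-output VAE lacks.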
arXiv Detail & Related papers (2023-06-19T14:53:40Z)
- User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems [49.75149094527068]
We show that diffusion models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems.
We develop a probabilistic approximation scheme for the conditional score function which converges to the true distribution as the noise level decreases.
We are able to sample conditionally on nonlinear user-defined events at inference time, and the generated samples match data statistics even when drawn from the tails of the distribution.
arXiv Detail & Related papers (2023-06-13T03:42:03Z)
- Latent Imitator: Generating Natural Individual Discriminatory Instances for Black-Box Fairness Testing [45.183849487268496]
This paper proposes a framework named Latent Imitator (LIMI) to generate more natural individual discriminatory instances.
We first derive a surrogate linear boundary to approximate the decision boundary of the target model.
We then manipulate random latent vectors to the surrogate boundary with a one-step movement, and further conduct vector calculation to probe two potential discriminatory candidates.
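The geometric step described above (a one-step move of a latent vector onto a surrogate linear boundary, then probing both sides) can be sketched directly; the function names and the fixed step size `eps` are hypothetical, not LIMI's actual code.

```python
def project_to_boundary(z, w, b):
    """One-step move of latent z onto the surrogate hyperplane w.z + b = 0."""
    dist = (sum(wi * zi for wi, zi in zip(w, z)) + b) / sum(wi * wi for wi in w)
    return [zi - dist * wi for zi, wi in zip(z, w)]

def probe_candidates(z, w, b, eps=0.1):
    """Return two candidates straddling the boundary along its normal,
    as potential discriminatory instances to test against the model."""
    zb = project_to_boundary(z, w, b)
    return ([zi + eps * wi for zi, wi in zip(zb, w)],
            [zi - eps * wi for zi, wi in zip(zb, w)])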
arXiv Detail & Related papers (2023-05-19T11:29:13Z)
- A Flow-Based Generative Model for Rare-Event Simulation [0.483420384410068]
We present a method in which a Normalizing Flow generative model is trained to simulate samples directly from a conditional distribution.
We illustrate that by simulating directly from a rare-event distribution significant insight can be gained into the way rare events happen.
arXiv Detail & Related papers (2023-05-13T08:25:57Z)
- Learning from a Biased Sample [3.546358664345473]
We propose a method for learning a decision rule that minimizes the worst-case risk incurred under a family of test distributions.
We empirically validate our proposed method in a case study on prediction of mental health scores from health survey data.
arXiv Detail & Related papers (2022-09-05T04:19:16Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- Learning Energy-Based Models by Diffusion Recovery Likelihood [61.069760183331745]
We present a diffusion recovery likelihood method to tractably learn and sample from a sequence of energy-based models.
After training, synthesized images can be generated by a sampling process initialized from Gaussian white noise.
On unconditional CIFAR-10 our method achieves FID 9.58 and inception score 8.30, superior to the majority of GANs.
arXiv Detail & Related papers (2020-12-15T07:09:02Z)
- Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation [51.091890311312085]
We propose a new training scheme for auto-regressive sequence generative models, which is effective and stable when operating at large sample space encountered in text generation.
Our method stably outperforms Maximum Likelihood Estimation and other state-of-the-art sequence generative models in terms of both quality and diversity.
arXiv Detail & Related papers (2020-07-12T15:31:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.