Probabilistic Matching of Real and Generated Data Statistics in
Generative Adversarial Networks
- URL: http://arxiv.org/abs/2306.10943v2
- Date: Thu, 8 Feb 2024 21:17:12 GMT
- Title: Probabilistic Matching of Real and Generated Data Statistics in
Generative Adversarial Networks
- Authors: Philipp Pilar, Niklas Wahlström
- Abstract summary: We propose a method to ensure that the distributions of certain generated data statistics coincide with the respective distributions of the real data.
We evaluate the method on a synthetic dataset and a real-world dataset and demonstrate improved performance of our approach.
- Score: 0.10878040851637999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative adversarial networks constitute a powerful approach to generative
modeling. While generated samples often are indistinguishable from real data,
mode-collapse may occur and there is no guarantee that they will follow the
true data distribution. For scientific applications in particular, it is
essential that the true distribution is well captured by the generated
distribution. In this work, we propose a method to ensure that the
distributions of certain generated data statistics coincide with the respective
distributions of the real data. In order to achieve this, we add a new loss
term to the generator loss function, which quantifies the difference between
these distributions via suitable f-divergences. Kernel density estimation is
employed to obtain representations of the true distributions, and to estimate
the corresponding generated distributions from minibatch values at each
iteration. When compared to other methods, our approach has the advantage that
the complete shapes of the distributions are taken into account. We evaluate
the method on a synthetic dataset and a real-world dataset and demonstrate
improved performance of our approach.
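The loss term described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single scalar statistic, a Gaussian kernel with a hand-picked bandwidth, and the Kullback-Leibler divergence as the f-divergence, none of which the abstract specifies. The function names `gaussian_kde`, `kl_divergence`, and `statistic_matching_loss` are hypothetical. In actual GAN training the same computation would have to be written in an automatic-differentiation framework so that gradients reach the generator.

```python
import numpy as np


def gaussian_kde(samples, grid, bandwidth):
    """Gaussian kernel density estimate of 1-D samples, evaluated on grid."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)


def kl_divergence(p, q, grid, eps=1e-12):
    """Riemann-sum approximation of KL(p || q) on a uniform grid."""
    dx = grid[1] - grid[0]
    return float(np.sum(p * np.log((p + eps) / (q + eps))) * dx)


def statistic_matching_loss(real_stats, fake_stats, grid, bandwidth):
    """Divergence between KDEs of a statistic on real and generated data.

    Assumption: KL(real || generated) stands in for the f-divergence.
    """
    p_real = gaussian_kde(real_stats, grid, bandwidth)
    p_fake = gaussian_kde(fake_stats, grid, bandwidth)
    return kl_divergence(p_real, p_fake, grid)


rng = np.random.default_rng(0)
grid = np.linspace(-6.0, 6.0, 400)

# Statistic values computed once from the full real dataset (synthetic here).
real_stats = rng.normal(0.0, 1.0, size=2000)

# Statistic values from two hypothetical generator minibatches of size 64.
matched = rng.normal(0.0, 1.0, size=64)    # follows the real distribution
collapsed = rng.normal(0.0, 0.2, size=64)  # too narrow, as in mode collapse

loss_matched = statistic_matching_loss(real_stats, matched, grid, 0.3)
loss_collapsed = statistic_matching_loss(real_stats, collapsed, grid, 0.3)
print(f"matched: {loss_matched:.4f}  collapsed: {loss_collapsed:.4f}")
```

Because the divergence compares whole density curves rather than only a mean or variance, the collapsed minibatch is penalized for missing the tails of the real distribution.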
Related papers
- Generative Assignment Flows for Representing and Learning Joint Distributions of Discrete Data [2.6499018693213316]
We introduce a novel generative model for the representation of joint probability distributions of a possibly large number of discrete random variables.
The embedding of the flow via the Segre map in the meta-simplex of all discrete joint distributions ensures that any target distribution can be represented in principle.
Our approach has strong motivation from first principles of modeling coupled discrete variables.
arXiv Detail & Related papers (2024-06-06T21:58:33Z)
- Deep Generative Sampling in the Dual Divergence Space: A Data-efficient & Interpretative Approach for Generative AI [29.13807697733638]
We build on the remarkable achievements in generative sampling of natural images.
We propose an innovative challenge, potentially overly ambitious, which involves generating samples that resemble images.
The statistical challenge lies in the small sample size, sometimes consisting of a few hundred subjects.
arXiv Detail & Related papers (2024-04-10T22:35:06Z)
- Uncertainty Quantification via Stable Distribution Propagation [60.065272548502]
We propose a new approach for propagating stable probability distributions through neural networks.
Our method is based on local linearization, which we show to be an optimal approximation in terms of total variation distance for the ReLU non-linearity.
arXiv Detail & Related papers (2024-02-13T09:40:19Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- R-divergence for Estimating Model-oriented Distribution Discrepancy [37.939239477868796]
We introduce R-divergence, designed to assess model-oriented distribution discrepancies.
R-divergence learns a minimum hypothesis on the mixed data and then gauges the empirical risk difference between them.
We evaluate the test power across various unsupervised and supervised tasks and find that R-divergence achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-02T11:30:49Z)
- Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
arXiv Detail & Related papers (2023-02-14T17:02:35Z)
- Unsupervised Learning of Sampling Distributions for Particle Filters [80.6716888175925]
We put forward four methods for learning sampling distributions from observed measurements.
Experiments demonstrate that learned sampling distributions exhibit better performance than designed, minimum-degeneracy sampling distributions.
arXiv Detail & Related papers (2023-02-02T15:50:21Z)
- Data-Driven Approximations of Chance Constrained Programs in Nonstationary Environments [3.126118485851773]
We study sample average approximations (SAA) of chance constrained programs.
We consider a nonstationary variant of this problem, where the random samples are assumed to be independently drawn in a sequential fashion.
We propose a novel robust SAA method exploiting information about the Wasserstein distance between the sequence of data-generating distributions and the actual chance constraint distribution.
arXiv Detail & Related papers (2022-05-08T01:01:57Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- GENs: Generative Encoding Networks [4.269725092203672]
We propose and analyze the use of nonparametric density methods to estimate the Jensen-Shannon divergence for matching unknown data distributions to known target distributions.
This analytical method has several advantages: better behavior when training sample quantity is low, provable convergence properties, and relatively few parameters, which can be derived analytically.
arXiv Detail & Related papers (2020-10-28T23:40:03Z)
- Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks [82.61546580149427]
Generative Adversarial Networks (GANs) have achieved a great success in unsupervised learning.
This paper provides approximation and statistical guarantees of GANs for the estimation of data distributions with densities in a Hölder space.
arXiv Detail & Related papers (2020-02-10T16:47:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.