Related papers: Probabilistic Matching of Real and Generated Data Statistics in Generative Adversarial Networks

URL: http://arxiv.org/abs/2306.10943v2
Date: Thu, 8 Feb 2024 21:17:12 GMT
Title: Probabilistic Matching of Real and Generated Data Statistics in Generative Adversarial Networks
Authors: Philipp Pilar, Niklas Wahlström
Abstract summary: We propose a method to ensure that the distributions of certain generated data statistics coincide with the respective distributions of the real data. We evaluate the method on a synthetic dataset and a real-world dataset and demonstrate improved performance of our approach.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative adversarial networks constitute a powerful approach to generative modeling. While generated samples often are indistinguishable from real data, mode-collapse may occur and there is no guarantee that they will follow the true data distribution. For scientific applications in particular, it is essential that the true distribution is well captured by the generated distribution. In this work, we propose a method to ensure that the distributions of certain generated data statistics coincide with the respective distributions of the real data. In order to achieve this, we add a new loss term to the generator loss function, which quantifies the difference between these distributions via suitable f-divergences. Kernel density estimation is employed to obtain representations of the true distributions, and to estimate the corresponding generated distributions from minibatch values at each iteration. When compared to other methods, our approach has the advantage that the complete shapes of the distributions are taken into account. We evaluate the method on a synthetic dataset and a real-world dataset and demonstrate improved performance of our approach.
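As a rough illustration of the extra generator-loss term described in the abstract, the sketch below fits Gaussian kernel density estimates to a scalar statistic computed on real data and on a generated minibatch, then compares the two densities with an f-divergence. It is a minimal sketch under our own assumptions, not the authors' implementation: the 1-D Gaussian kernel, the fixed bandwidth and evaluation grid, the choice of the KL divergence (the f-divergence with f(t) = t log t), and all function names are hypothetical. Being plain Python, it only computes the loss value; in actual GAN training the same computation would be written in an automatic-differentiation framework so that gradients reach the generator.

```python
import math
from typing import Callable, List, Sequence


def gaussian_kde(samples: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return a 1-D Gaussian kernel density estimate fitted to `samples`."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def kl_on_grid(p: Callable[[float], float], q: Callable[[float], float],
               grid: List[float], eps: float = 1e-12) -> float:
    """Approximate KL(p || q), the integral of p * log(p / q), with the trapezoidal rule.

    KL is the f-divergence with f(t) = t * log(t); `eps` guards against log(0).
    The grid is assumed to be uniform and to cover the support of both densities.
    """
    dx = grid[1] - grid[0]
    vals = []
    for x in grid:
        px, qx = p(x), q(x)
        vals.append(px * math.log((px + eps) / (qx + eps)))
    return dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def statistic_matching_loss(real_stats: Sequence[float], generated_stats: Sequence[float],
                            bandwidth: float = 0.2, lo: float = -5.0, hi: float = 5.0,
                            num_points: int = 401) -> float:
    """Hypothetical extra loss term: divergence between KDEs of a real and a generated statistic."""
    grid = [lo + i * (hi - lo) / (num_points - 1) for i in range(num_points)]
    p_real = gaussian_kde(real_stats, bandwidth)      # could be fitted once on the full real dataset
    p_gen = gaussian_kde(generated_stats, bandwidth)  # re-estimated from each generated minibatch
    return kl_on_grid(p_real, p_gen, grid)
```

During training, one would add this term to the adversarial loss, for example `generator_loss = adversarial_loss + lam * statistic_matching_loss(real_stats, batch_stats)`, where the weight `lam` is likewise a hypothetical hyperparameter.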

Related papers

Designing a Conditional Prior Distribution for Flow-Based Generative Models
Flow-based generative models have recently shown impressive performance on conditional generation tasks. In this work, we exploit a previously unused property of conditional flow-based models: the ability to design a non-trivial prior distribution. We use the flow matching formulation to map samples from a parametric distribution centered around a condition-specific point to the conditional target distribution.
arXiv (2025-02-13T18:58:15Z)
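The flow-matching setup summarized above can be sketched as follows, assuming the common straight-line (linear interpolation) probability path and, as a hypothetical stand-in for the paper's construction, a Gaussian prior centered on the mean of each class. The helper names and the class-mean choice are illustrative assumptions; only the general idea of a condition-dependent prior comes from the summary.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def class_center(points: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical prior center: the coordinate-wise mean of one class's training points."""
    n = len(points)
    return [sum(coord) / n for coord in zip(*points)]


def sample_conditional_prior(center: Sequence[float], sigma: float,
                             rng: random.Random) -> Vector:
    """Draw x0 ~ N(center, sigma^2 I), a Gaussian prior centered on the condition's point."""
    return [c + sigma * rng.gauss(0.0, 1.0) for c in center]


def linear_path_target(x0: Sequence[float], x1: Sequence[float],
                       t: float) -> Tuple[Vector, Vector]:
    """Point x_t on the straight path from x0 to x1 and the path's constant velocity x1 - x0."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


def flow_matching_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a model's predicted velocity v(x_t, t, c) and the target."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)
```

A training step would draw x0 from the class's prior, pair it with a real sample x1 of that class, draw t uniformly from [0, 1], and regress the network's velocity prediction at (x_t, t, c) onto x1 - x0. A prior centered near the class data plausibly shortens the paths the flow must learn, which is one motivation for a non-trivial prior.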
Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers
We present the first performance guarantee with explicit dimensional dependence for general score-mismatched diffusion samplers. We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions. This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv (2024-10-17T16:42:12Z)
Generative Assignment Flows for Representing and Learning Joint Distributions of Discrete Data
We introduce a novel generative model for the representation of joint probability distributions of a possibly large number of discrete random variables. The embedding of the flow via the Segre map in the meta-simplex of all discrete joint distributions ensures that any target distribution can be represented in principle. Our approach has strong motivation from first principles of modeling coupled discrete variables.
arXiv (2024-06-06T21:58:33Z)
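The Segre map mentioned in the summary sends one probability vector per discrete variable to their outer product, which is a point in the simplex of all joint distributions; its image is exactly the set of fully factorized (independent) joints. The sketch below illustrates only this map, with a hypothetical function name, and does not reproduce the assignment flow built on top of it.

```python
import itertools
from typing import Dict, Sequence, Tuple


def segre_map(marginals: Sequence[Sequence[float]]) -> Dict[Tuple[int, ...], float]:
    """Outer product of per-variable distributions: joint[(s1, ..., sn)] = prod_i p_i[s_i]."""
    joint: Dict[Tuple[int, ...], float] = {}
    # Enumerate every joint state (one state index per variable).
    for states in itertools.product(*(range(len(m)) for m in marginals)):
        prob = 1.0
        for var, state in enumerate(states):
            prob *= marginals[var][state]
        joint[states] = prob
    return joint
```

For two binary variables with marginals (0.3, 0.7) and (0.5, 0.5), the result has four entries that sum to one, e.g. probability 0.15 for the joint state (0, 0).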
Deep Generative Sampling in the Dual Divergence Space: A Data-efficient & Interpretative Approach for Generative AI
We build on the remarkable achievements in generative sampling of natural images. We propose an innovative challenge, potentially overly ambitious, that involves generating samples that resemble images. The statistical challenge lies in the small sample size, sometimes only a few hundred subjects.
arXiv (2024-04-10T22:35:06Z)
Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers. We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state of the art. In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv (2024-01-29T02:08:40Z)
Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data
This paper studies score approximation, estimation, and distribution recovery of diffusion models when data are supported on an unknown low-dimensional linear subspace. We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated. The generated distribution based on the estimated score function captures the geometric structure of the data and converges to a close neighborhood of the data distribution.
arXiv (2023-02-14T17:02:35Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples for which the model's confidence exceeds that threshold.
arXiv (2022-01-11T23:01:12Z)
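ATC, as described above, can be sketched in a few lines: on labeled source (validation) data, choose the threshold so that the fraction of examples scoring above it equals the source accuracy, then report the fraction of unlabeled target examples scoring above that threshold. This sketch assumes the score is a per-example confidence such as the maximum softmax probability and ignores ties between scores; the function names are hypothetical.

```python
import math
from typing import Sequence


def fit_atc_threshold(source_scores: Sequence[float],
                      source_correct: Sequence[bool]) -> float:
    """Pick t so that the fraction of labeled source examples with score > t equals source accuracy."""
    n = len(source_scores)
    accuracy = sum(source_correct) / n
    num_below = n - round(accuracy * n)  # examples that should fall at or below t
    ordered = sorted(source_scores)
    # With distinct scores, exactly round(accuracy * n) examples lie strictly above this value.
    return ordered[num_below - 1] if num_below > 0 else -math.inf


def predict_target_accuracy(target_scores: Sequence[float], threshold: float) -> float:
    """ATC estimate: fraction of unlabeled target examples whose score exceeds the threshold."""
    return sum(1 for s in target_scores if s > threshold) / len(target_scores)
```

For example, if 7 of 10 source examples are classified correctly, the threshold lands on the third-lowest source score, and the prediction for a target set is the share of target scores exceeding it.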
Investigating Shifts in GAN Output-Distributions
We introduce a loop-training scheme for the systematic investigation of observable shifts between the distributions of real training data and GAN-generated data. Overall, the combination of these methods allows an exploratory investigation of the innate limitations of current GAN algorithms.
arXiv (2021-12-28T09:16:55Z)
Predicting with Confidence on Unseen Distributions
We connect the domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions. We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts. We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference.
arXiv (2021-07-07T15:50:18Z)
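In its simplest reading, the difference of confidences can be turned into an accuracy estimate by subtracting it from the accuracy measured on the source distribution. The sketch below assumes that form and uses per-example confidences such as maximum softmax probabilities; the function name is hypothetical, and the paper may combine DoC with further calibration or regression steps.

```python
from statistics import mean
from typing import Sequence


def doc_predicted_accuracy(source_accuracy: float,
                           source_confidences: Sequence[float],
                           target_confidences: Sequence[float]) -> float:
    """Estimate target accuracy as source accuracy minus the difference of confidences (DoC)."""
    # DoC: how much the average confidence drops when moving from source to target data.
    doc = mean(source_confidences) - mean(target_confidences)
    return source_accuracy - doc
```

With a source accuracy of 0.9, an average source confidence of 0.95, and an average target confidence of 0.80, the estimate is 0.9 - 0.15 = 0.75.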
GENs: Generative Encoding Networks
We propose and analyze the use of nonparametric density methods to estimate the Jensen-Shannon divergence for matching unknown data distributions to known target distributions. This analytical method has several advantages: better behavior when few training samples are available, provable convergence properties, and relatively few parameters, which can be derived analytically.
arXiv (2020-10-28T23:40:03Z)
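One generic way to estimate the Jensen-Shannon divergence nonparametrically, in the spirit of the GENs summary, is a plug-in estimate: fit Gaussian kernel density estimates to both samples and average the log-ratio against the mixture m = (p + q) / 2 over each sample. This is an illustrative sketch, not the GENs estimator; the Gaussian kernel, the default bandwidth of 0.3, the evaluation at the samples themselves (which biases the estimate), and the function names are all assumptions.

```python
import math
from typing import Sequence


def gaussian_kde_at(x: float, samples: Sequence[float], bandwidth: float) -> float:
    """Evaluate a 1-D Gaussian kernel density estimate fitted to `samples` at point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def js_divergence_kde(p_samples: Sequence[float], q_samples: Sequence[float],
                      bandwidth: float = 0.3) -> float:
    """Plug-in Jensen-Shannon estimate in nats, which lies between 0 and log 2."""

    def mean_log_ratio(points: Sequence[float], own: Sequence[float],
                       other: Sequence[float]) -> float:
        # Average of log(own_density / mixture) over `points`, which are drawn from `own`.
        total = 0.0
        for x in points:
            own_density = gaussian_kde_at(x, own, bandwidth)  # > 0 because x is one of `own`'s samples
            mixture = 0.5 * (own_density + gaussian_kde_at(x, other, bandwidth))
            total += math.log(own_density / mixture)
        return total / len(points)

    return 0.5 * (mean_log_ratio(p_samples, p_samples, q_samples)
                  + mean_log_ratio(q_samples, q_samples, p_samples))
```

Identical samples give an estimate of zero, while well-separated samples approach the upper bound log 2 ≈ 0.693.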
Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks
Generative Adversarial Networks (GANs) have achieved great success in unsupervised learning. This paper provides approximation and statistical guarantees of GANs for the estimation of data distributions with densities in a Hölder space.
arXiv (2020-02-10T16:47:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.