Rates of convergence for density estimation with generative adversarial
networks
- URL: http://arxiv.org/abs/2102.00199v4
- Date: Thu, 25 Jan 2024 10:04:05 GMT
- Title: Rates of convergence for density estimation with generative adversarial
networks
- Authors: Nikita Puchkin, Sergey Samsonov, Denis Belomestny, Eric Moulines, and
Alexey Naumov
- Abstract summary: We prove an oracle inequality for the Jensen-Shannon (JS) divergence between the underlying density $mathsfp*$ and the GAN estimate.
We show that the JS-divergence between the GAN estimate and $mathsfp*$ decays as fast as $(logn/n)2beta/ (2beta + d)$.
- Score: 19.71040653379663
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we undertake a thorough study of the non-asymptotic properties
of the vanilla generative adversarial networks (GANs). We prove an oracle
inequality for the Jensen-Shannon (JS) divergence between the underlying
density $\mathsf{p}^*$ and the GAN estimate with a significantly better
statistical error term compared to the previously known results. The advantage
of our bound becomes clear in application to nonparametric density estimation.
We show that the JS-divergence between the GAN estimate and $\mathsf{p}^*$
decays as fast as $(\log{n}/n)^{2\beta/(2\beta + d)}$, where $n$ is the sample
size and $\beta$ determines the smoothness of $\mathsf{p}^*$. This rate of
convergence coincides (up to logarithmic factors) with minimax optimal for the
considered class of densities.
Related papers
- A Unified Analysis for Finite Weight Averaging [50.75116992029417]
Averaging iterations of Gradient Descent (SGD) have achieved empirical success in training deep learning models, such as Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA)
In this paper, we generalize LAWA as Finite Weight Averaging (FWA) and explain their advantages compared to SGD from the perspective of optimization and generalization.
arXiv Detail & Related papers (2024-11-20T10:08:22Z) - Convergence Analysis of Probability Flow ODE for Score-based Generative Models [5.939858158928473]
We study the convergence properties of deterministic samplers based on probability flow ODEs from both theoretical and numerical perspectives.
We prove the total variation between the target and the generated data distributions can be bounded above by $mathcalO(d3/4delta1/2)$ in the continuous time level.
arXiv Detail & Related papers (2024-04-15T12:29:28Z) - Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications [2.0584253077707477]
We study the convergence properties of the Gradient Descent (SGD) method for finding a stationary point of an objective function $J(cdot)$.
Our results apply to a class of invex'' functions, which have the property that every stationary point is also a global minimizer.
arXiv Detail & Related papers (2023-12-05T15:22:39Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Estimating the Density Ratio between Distributions with High Discrepancy
using Multinomial Logistic Regression [21.758330613138778]
We show that the state-of-the-art density ratio estimators perform poorly on well-separated cases.
We present an alternative method that leverages multi-class classification for density ratio estimation.
arXiv Detail & Related papers (2023-05-01T15:10:56Z) - High-Dimensional Inference over Networks: Linear Convergence and
Statistical Guarantees [20.701475313495884]
We study a sparse linear regression over a network of agents, modeled as an undirected graph and no server node.
We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm.
arXiv Detail & Related papers (2022-01-21T01:26:08Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $varepsilon$-solution is achieved in $tildemathcalrhoObig(sqrtfracbeta/mu (1-)log1/varepsilonbig)$ number of communications steps.
This rate matches (up to poly-log factors) for the first time lower complexity communication bounds of distributed gossip-algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - Limit Distribution Theory for the Smooth 1-Wasserstein Distance with
Applications [18.618590805279187]
smooth 1-Wasserstein distance (SWD) $W_1sigma$ was recently proposed as a means to mitigate the curse of dimensionality in empirical approximation.
This work conducts a thorough statistical study of the SWD, including a high-dimensional limit distribution result.
arXiv Detail & Related papers (2021-07-28T17:02:24Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-epsilon) X + epsilon B$, where $X$ is a zero-mean and unknown covariance Gaussian $mathcalN(0, Sigma)$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $Omega(d2)$ for $epsilon$ an arbitrarily small constant and $gamma
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Analysis of KNN Density Estimation [56.29748742084386]
kNN density estimation is minimax optimal under both $ell_infty$ and $ell_infty$ criteria, if the support set is known.
The $ell_infty$ error does not reach the minimax lower bound, but is better than kernel density estimation.
arXiv Detail & Related papers (2020-09-30T03:33:17Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.