$\alpha$-GAN: Convergence and Estimation Guarantees
- URL: http://arxiv.org/abs/2205.06393v1
- Date: Thu, 12 May 2022 23:26:51 GMT
- Title: $\alpha$-GAN: Convergence and Estimation Guarantees
- Authors: Gowtham R. Kurri, Monica Welfert, Tyler Sypherd, Lalitha Sankar
- Abstract summary: We prove a correspondence between the min-max optimization of general CPE loss function GANs and the minimization of associated $f$-divergences.
We then focus on $\alpha$-GAN, defined via the $\alpha$-loss, which interpolates several GANs and corresponds to the minimization of the Arimoto divergence.
- Score: 7.493779672689531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove a two-way correspondence between the min-max optimization of general
CPE loss function GANs and the minimization of associated $f$-divergences. We
then focus on $\alpha$-GAN, defined via the $\alpha$-loss, which interpolates
several GANs (Hellinger, vanilla, Total Variation) and corresponds to the
minimization of the Arimoto divergence. We show that the Arimoto divergences
induced by $\alpha$-GAN equivalently converge, for all $\alpha\in
\mathbb{R}_{>0}\cup\{\infty\}$. However, under restricted learning models and
finite samples, we provide estimation bounds which indicate diverse GAN
behavior as a function of $\alpha$. Finally, we present empirical results on a
toy dataset that highlight the practical utility of tuning the $\alpha$
hyperparameter.
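For concreteness, here is a brief sketch of the tunable loss behind $\alpha$-GAN, written in notation assumed here rather than quoted from the paper ($\hat{P}(y)$ for the probability a model assigns to the true label $y$, $D_\omega$ for the discriminator, $G_\theta$ for the generator, $P_r$ for the data distribution); the normalization follows the authors' earlier work "Realizing GANs via a Tunable Loss Function" listed below. For $\alpha\in(0,\infty]$, the $\alpha$-loss is
$$\ell_\alpha\big(y,\hat{P}\big)=\frac{\alpha}{\alpha-1}\Big(1-\hat{P}(y)^{\frac{\alpha-1}{\alpha}}\Big),\qquad \lim_{\alpha\to 1}\ell_\alpha\big(y,\hat{P}\big)=-\log\hat{P}(y),\qquad \ell_\infty\big(y,\hat{P}\big)=1-\hat{P}(y),$$
so that $\alpha\to 1$ recovers the log-loss (and hence the vanilla GAN) while $\alpha=\infty$ gives a soft 0-1 loss. The induced CPE-loss value function, which the discriminator maximizes and the generator minimizes, takes the form
$$V_\alpha(\theta,\omega)=\frac{\alpha}{\alpha-1}\Big(\mathbb{E}_{X\sim P_r}\big[D_\omega(X)^{\frac{\alpha-1}{\alpha}}\big]+\mathbb{E}_{X\sim P_{G_\theta}}\big[\big(1-D_\omega(X)\big)^{\frac{\alpha-1}{\alpha}}\big]-2\Big),$$
and, per the two-way correspondence above, minimizing over the generator at the optimal discriminator amounts to minimizing the Arimoto divergence between $P_r$ and $P_{G_\theta}$. A short numerical sketch of the same loss appears after the related-papers list.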
Related papers
- Statistical Error Bounds for GANs with Nonlinear Objective Functionals [5.022028859839544]
We derive statistical error bounds for $(f,\Gamma)$-GANs for general classes of $f$ and $\Gamma$ in the form of finite-sample concentration inequalities.
The results prove the statistical consistency of $(f,\Gamma)$-GANs and reduce to the known results for IPM-GANs in the appropriate limit.
arXiv Detail & Related papers (2024-06-24T17:42:03Z)
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle)$.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
- Addressing GAN Training Instabilities via Tunable Classification Losses [8.151943266391493]
Generative adversarial networks (GANs) allow generating synthetic data with formal guarantees.
We show that all symmetric $f$-divergences are equivalent in convergence.
We also highlight the value of tuning $(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic 2D Gaussian mixture ring.
arXiv Detail & Related papers (2023-10-27T17:29:07Z)
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- $(\alpha_D,\alpha_G)$-GANs: Addressing GAN Training Instabilities via Dual Objectives [7.493779672689531]
We introduce a class of dual-objective GANs with different value functions (objectives) for the generator (G) and discriminator (D).
We show that the resulting non-zero-sum game simplifies to minimizing an $f$-divergence under appropriate conditions on $(\alpha_D,\alpha_G)$.
We highlight the value of tuning $(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic 2D Gaussian mixture ring and the Stacked MNIST datasets.
arXiv Detail & Related papers (2023-02-28T05:22:54Z)
- A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound of $\Omega(\sqrt{n/p})$ for the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z)
- Realizing GANs via a Tunable Loss Function [7.455546102930911]
We introduce a tunable GAN, called $\alpha$-GAN, parameterized by $\alpha \in (0,\infty]$.
We show that $\alpha$-GAN is intimately related to the Arimoto divergence (a numerical sketch of the underlying $\alpha$-loss follows this list).
arXiv Detail & Related papers (2021-06-09T17:18:21Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
- Least $k$th-Order and R\'{e}nyi Generative Adversarial Networks [12.13405065406781]
Experimental results indicate that the proposed loss functions, applied to the MNIST and CelebA datasets, confer performance benefits by virtue of the extra degrees of freedom provided by the parameters $k$ and $\alpha$, respectively.
While it was applied to GANs in this study, the proposed approach is generic and can be used in other applications of information theory to deep learning, e.g., the issues of fairness or privacy in artificial intelligence.
arXiv Detail & Related papers (2020-06-03T18:44:05Z)
- Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
- A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We give a simple proof of convergence covering both the Adam and Adagrad algorithms.
Adam achieves the same $O(d\ln(N)/\sqrt{N})$ rate of convergence as Adagrad for an appropriate choice of its parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z)
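As a minimal runnable illustration of the tunable $\alpha$-loss shared by this paper and the "Realizing GANs via a Tunable Loss Function" entry above, the sketch below evaluates the loss for a few values of $\alpha$ and checks the $\alpha\to 1$ (log-loss) and $\alpha=\infty$ limits numerically; the function name and sample probabilities are illustrative, not taken from either paper.

```python
import numpy as np

def alpha_loss(p_true, alpha):
    """Tunable alpha-loss of the probability p_true assigned to the true label.

    alpha in (0, inf]: alpha -> 1 recovers the log-loss and alpha = inf gives
    the soft 0-1 loss 1 - p_true. Sketch only; normalization as described
    after the abstract above.
    """
    p = np.asarray(p_true, dtype=float)
    if np.isinf(alpha):
        return 1.0 - p
    if np.isclose(alpha, 1.0):
        return -np.log(p)  # limiting case alpha -> 1
    return (alpha / (alpha - 1.0)) * (1.0 - p ** ((alpha - 1.0) / alpha))

# Quick check: near alpha = 1 the loss matches -log(p), at alpha = inf it is 1 - p.
p = np.linspace(0.05, 0.95, 5)
print(alpha_loss(p, 1.0001))  # approximately -log(p)
print(-np.log(p))
print(alpha_loss(p, 0.5))     # a more sensitive member of the family (alpha < 1)
print(alpha_loss(p, np.inf))  # 1 - p
```

Interpolating between these regimes by tuning $\alpha$ is the single-knob behavior that the estimation bounds and the toy-dataset experiments in the main paper investigate.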
This list is automatically generated from the titles and abstracts of the papers on this site.