Mean-Field Analysis for Learning Subspace-Sparse Polynomials with Gaussian Input
- URL: http://arxiv.org/abs/2402.08948v3
- Date: Thu, 09 Jan 2025 05:24:57 GMT
- Title: Mean-Field Analysis for Learning Subspace-Sparse Polynomials with Gaussian Input
- Authors: Ziang Chen, Rong Ge
- Abstract summary: We study the mean-field flow for learning subspace-sparse polynomials using stochastic gradient descent and two-layer neural networks. We establish a necessary condition for SGD-learnability, involving both the characteristics of the target function and the expressiveness of the activation function.
- Score: 10.594689377150356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study the mean-field flow for learning subspace-sparse polynomials using stochastic gradient descent and two-layer neural networks, where the input distribution is standard Gaussian and the output only depends on the projection of the input onto a low-dimensional subspace. We establish a necessary condition for SGD-learnability, involving both the characteristics of the target function and the expressiveness of the activation function. In addition, we prove that the condition is almost sufficient, in the sense that a condition slightly stronger than the necessary condition can guarantee the exponential decay of the loss functional to zero.
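To make the setting concrete, here is a minimal sketch (not the authors' code, and not the mean-field limit itself) of the learning problem described in the abstract: a finite-width two-layer network trained with online SGD on standard Gaussian inputs, where the target depends only on a one-dimensional projection of the input. The specific target He_2(u^T x), the tanh activation, the width, and the step size are illustrative assumptions.
```python
# Illustrative sketch only: online SGD for a finite-width two-layer network on a
# subspace-sparse polynomial target with standard Gaussian input. The specific
# target, activation, width, and learning rate are assumptions, not the paper's.
import torch

d, m = 20, 2048                       # input dimension, hidden width (finite-width stand-in for the mean-field regime)
u = torch.randn(d); u = u / u.norm()  # direction spanning the (here one-dimensional) relevant subspace

def f_star(x):
    z = x @ u                         # output depends only on the projection u^T x
    return z ** 2 - 1.0               # second Hermite polynomial He_2(z)

# two-layer network f(x) = mean_i a_i * tanh(w_i^T x + b_i)
W = torch.randn(m, d, requires_grad=True)
b = torch.zeros(m, requires_grad=True)
a = torch.randn(m, requires_grad=True)

def model(x):
    return (a * torch.tanh(x @ W.T + b)).mean(dim=1)

opt = torch.optim.SGD([W, b, a], lr=0.05)
for step in range(5000):
    x = torch.randn(512, d)           # fresh standard Gaussian batch (online SGD)
    loss = ((model(x) - f_star(x)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```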
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - High Probability Convergence of Adam Under Unbounded Gradients and
Affine Variance Noise [4.9495085874952895]
We show that Adam could converge to the stationary point in high probability with a rate of $\mathcal{O}\left(\mathrm{poly}(\log T)/\sqrt{T}\right)$ under coordinate-wise "affine" noise variance.
It is also revealed that Adam's bounds, within an order of $\mathcal{O}\left(\mathrm{poly}(\log T)\right)$, are adaptive to the noise level.
arXiv Detail & Related papers (2023-11-03T15:55:53Z) - Mean-field Analysis of Generalization Errors [1.1344265020822928]
We consider the KL-regularized empirical risk minimization problem and establish generic conditions under which the generalization error convergence rate, when training on a sample of size $n$, is $\mathcal{O}(1/n)$.
In the context of supervised learning with a one-hidden layer neural network in the mean-field regime, these conditions are reflected in suitable integrability and regularity assumptions on the loss and activation functions.
arXiv Detail & Related papers (2023-06-20T15:49:09Z) - On the Identification and Optimization of Nonsmooth Superposition
Operators in Semilinear Elliptic PDEs [3.045851438458641]
We study an infinite-dimensional optimization problem that aims to identify the Nemytskii operator in the nonlinear part of a prototypical semilinear elliptic partial differential equation (PDE)
In contrast to previous works, we consider this identification problem in a low-regularity regime in which the function inducing the Nemytskii operator is a priori only known to be an element of $H^1_{loc}(\mathbb{R})$.
arXiv Detail & Related papers (2023-06-08T13:33:20Z) - Understanding the Generalization Ability of Deep Learning Algorithms: A
Kernelized Renyi's Entropy Perspective [11.255943520955764]
We propose a novel information theoretical measure: kernelized Renyi's entropy.
We establish the generalization error bounds for gradient/Langevin descent (SGD/SGLD) learning algorithms under kernelized Renyi's entropy.
We show that our information-theoretical bounds depend on the statistics of the gradients, and are rigorously tighter than the current state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-05-02T01:17:15Z) - Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $\epsilon$-stationary points with $O(\epsilon^{-4})$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Score-based Diffusion Models in Function Space [137.70916238028306]
Diffusion models have recently emerged as a powerful framework for generative modeling.
This work introduces a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - Origins of Low-dimensional Adversarial Perturbations [17.17170592140042]
We study the phenomenon of low-dimensional adversarial perturbations in classification.
The goal is to fool the classifier into flipping its decision on a nonzero fraction of inputs from a designated class.
We compute lower bounds for the fooling rate of any subspace.
arXiv Detail & Related papers (2022-03-25T17:02:49Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefits of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Robust Estimation for Nonparametric Families via Generative Adversarial
Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second-moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - On the existence of global minima and convergence analyses for gradient
descent methods in the training of deep neural networks [3.198144010381572]
We study feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers.
We prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs.
We also study solutions of gradient flow differential equations.
arXiv Detail & Related papers (2021-12-17T18:55:40Z) - Universal scaling laws in the gradient descent training of neural
networks [10.508187462682308]
We show that the learning trajectory can be characterized by explicit bounds at large training times.
Our results are based on spectral analysis of the evolution of a large network trained on the expected loss.
arXiv Detail & Related papers (2021-05-02T16:46:38Z) - Agnostic Proper Learning of Halfspaces under Gaussian Marginals [56.01192577666607]
We study the problem of agnostically learning halfspaces under the Gaussian distribution.
Our main result is the first proper learning algorithm for this problem.
arXiv Detail & Related papers (2021-02-10T18:40:44Z) - Exponentially Weighted l_2 Regularization Strategy in Constructing
Reinforced Second-order Fuzzy Rule-based Model [72.57056258027336]
In the conventional Takagi-Sugeno-Kang (TSK)-type fuzzy models, constant or linear functions are usually utilized as the consequent parts of the fuzzy rules.
We introduce an exponential weight approach inspired by the weight function theory encountered in harmonic analysis.
arXiv Detail & Related papers (2020-07-02T15:42:15Z) - Learning Likelihoods with Conditional Normalizing Flows [54.60456010771409]
Conditional normalizing flows (CNFs) are efficient in sampling and inference.
We present a study of CNFs in which the mapping from the base density to the output space is conditioned on an input x, in order to model conditional densities p(y|x); a minimal illustrative sketch follows this entry.
arXiv Detail & Related papers (2019-11-29T19:17:58Z)
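The following is a minimal sketch of the conditional-flow idea summarized above: a single conditional affine transform whose parameters are predicted from x. The network sizes and the affine form are assumptions for illustration, not the paper's implementation.
```python
# Minimal sketch of a conditional normalizing flow: a one-step affine flow whose
# shift and log-scale are predicted from the conditioning input x, giving a
# tractable conditional density p(y|x). Illustrative assumptions throughout.
import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    def __init__(self, x_dim, y_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * y_dim))   # predicts shift and log-scale from x

    def log_prob(self, x, y):
        mu, log_s = self.net(x).chunk(2, dim=-1)
        z = (y - mu) * torch.exp(-log_s)                 # invert y = mu(x) + exp(log_s(x)) * z
        base = torch.distributions.Normal(0.0, 1.0)
        # change of variables: log p(y|x) = log N(z; 0, 1) - sum_j log_s_j(x)
        return (base.log_prob(z) - log_s).sum(dim=-1)

    def sample(self, x):
        mu, log_s = self.net(x).chunk(2, dim=-1)
        return mu + torch.exp(log_s) * torch.randn_like(mu)

flow = ConditionalAffineFlow(x_dim=3, y_dim=2)
x, y = torch.randn(8, 3), torch.randn(8, 2)
loss = -flow.log_prob(x, y).mean()                       # train by maximizing conditional log-likelihood
```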