Mean-field Analysis of Generalization Errors
- URL: http://arxiv.org/abs/2306.11623v1
- Date: Tue, 20 Jun 2023 15:49:09 GMT
- Title: Mean-field Analysis of Generalization Errors
- Authors: Gholamali Aminian, Samuel N. Cohen, Łukasz Szpruch
- Abstract summary: We consider the KL-regularized empirical risk minimization problem and establish generic conditions under which the generalization error convergence rate, when training on a sample of size $n$, is $\mathcal{O}(1/n)$.
In the context of supervised learning with a one-hidden layer neural network in the mean-field regime, these conditions are reflected in suitable integrability and regularity assumptions on the loss and activation functions.
- Score: 1.1344265020822928
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We propose a novel framework for exploring weak and $L_2$ generalization
errors of algorithms through the lens of differential calculus on the space of
probability measures. Specifically, we consider the KL-regularized empirical
risk minimization problem and establish generic conditions under which the
generalization error convergence rate, when training on a sample of size $n$,
is $\mathcal{O}(1/n)$. In the context of supervised learning with a one-hidden
layer neural network in the mean-field regime, these conditions are reflected
in suitable integrability and regularity assumptions on the loss and activation
functions.
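A minimal sketch of the objective described in the abstract, in generic notation (the regularization weight $\lambda$, prior $\pi$, loss $\ell$, and model $f_\theta$ are placeholder symbols of ours; the paper's exact scaling of the regularizer may differ):

```latex
% KL-regularized empirical risk minimization over probability measures m,
% with prior \pi, regularization weight \lambda > 0, and empirical risk R_n.
\min_{m \in \mathcal{P}(\mathbb{R}^d)}
  \int_{\mathbb{R}^d} R_n(\theta)\, m(\mathrm{d}\theta)
  + \lambda\, \mathrm{KL}\!\left(m \,\middle\|\, \pi\right),
\qquad
R_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(x_i), y_i\big).
```

In the mean-field regime, $m$ plays the role of the distribution of the hidden-layer parameters of the one-hidden-layer network.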
Related papers
- Error Bounds of Supervised Classification from Information-Theoretic Perspective [5.281849820329249]
We show that the errors are bounded by a complexity term, influenced by the smoothness of the distribution and the sample size, and constitute an upper bound on the expected risk.
Our empirical verification confirms a significant positive correlation between the derived theoretical bounds and the practical expected risk.
arXiv Detail & Related papers (2024-06-07T01:07:35Z) - Equivalence of the Empirical Risk Minimization to Regularization on the Family of f-Divergences [49.853843995972085]
The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is presented.
Examples of the solution for particular choices of the function $f$ are presented.
arXiv Detail & Related papers (2024-02-01T11:12:00Z) - A Robustness Analysis of Blind Source Separation [91.3755431537592]
Blind source separation (BSS) aims to recover an unobserved signal from its mixture $X=f(S)$ under the condition that the transformation $f$ is invertible but unknown.
We present a general framework for analysing such violations and quantifying their impact on the blind recovery of $S$ from $X$.
We show that a generic BSS solution's response to general deviations from its defining structural assumptions can be profitably analysed in the form of explicit continuity guarantees.
arXiv Detail & Related papers (2023-03-17T16:30:51Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - The Geometry of Adversarial Training in Binary Classification [1.2891210250935143]
We establish an equivalence between a family of adversarial training problems for non-parametric binary classification and a family of regularized risk minimization problems.
The resulting regularized risk minimization problems admit exact convex relaxations of the type $L^1+$ (nonlocal) $\operatorname{TV}$.
arXiv Detail & Related papers (2021-11-26T17:19:50Z) - Decentralized Feature-Distributed Optimization for Generalized Linear Models [19.800898945436384]
We consider the "all-for-one" decentralized learning problem for generalized linear models.
The features of each sample are partitioned among several collaborating agents in a connected network, but only one agent observes the response variables.
We apply the Chambolle--Pock primal--dual algorithm to an equivalent saddle-point formulation of the problem.
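The Chambolle--Pock iteration itself can be illustrated on a plain, centralized least-squares problem written in saddle-point form (this toy setup is ours; the paper applies the same primal--dual template to the decentralized, feature-partitioned generalized linear model):

```python
import numpy as np

# Chambolle-Pock for the saddle-point form of least squares:
#   min_x max_y <Ax - b, y> - ||y||^2 / 2,
# which is equivalent to min_x ||Ax - b||^2 / 2.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 5))
b = rng.normal(size=30)

L = np.linalg.norm(A, 2)       # operator norm of A
tau = sigma = 0.9 / L          # step sizes satisfying tau * sigma * L**2 < 1

x = np.zeros(5)
y = np.zeros(30)
x_bar = x.copy()
for _ in range(5000):
    # dual step: prox of y -> ||y||^2/2 + <b, y> has a closed form
    y = (y + sigma * (A @ x_bar - b)) / (1.0 + sigma)
    x_new = x - tau * (A.T @ y)  # primal step (g = 0, so the prox is the identity)
    x_bar = 2.0 * x_new - x      # extrapolation
    x = x_new

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x - x_star))
```

The extrapolated point `x_bar` is what makes this a primal--dual method rather than alternating gradient steps.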
arXiv Detail & Related papers (2021-10-28T16:42:47Z) - Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
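A toy sketch of the recentering idea behind variance-reduced Q-learning, on a known 2-state, 2-action MDP (the epoch structure, step size, and the use of the exact Bellman operator as the reference estimate are illustrative simplifications of ours, not the paper's instance-optimal construction):

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 0.9
P = np.array([[[0.7, 0.3], [0.2, 0.8]],   # P[a, s, s']: transition kernels
              [[0.4, 0.6], [0.9, 0.1]]])
r = np.array([[1.0, 0.0], [0.5, 2.0]])    # r[s, a]: rewards

def bellman(Q):
    """Exact Bellman optimality operator T(Q)[s, a]."""
    V = Q.max(axis=1)
    return r + gamma * (P @ V).T          # (P @ V)[a, s], transposed to [s, a]

def sampled_diff(Q, Q_ref):
    """One-sample estimate of T(Q) - T(Q_ref), using a SHARED sampled
    next state for both terms; this sharing is the variance reduction."""
    V, V_ref = Q.max(axis=1), Q_ref.max(axis=1)
    D = np.empty_like(Q)
    for s in range(2):
        for a in range(2):
            s2 = rng.choice(2, p=P[a, s])
            D[s, a] = gamma * (V[s2] - V_ref[s2])  # rewards cancel in the difference
    return D

Q = np.zeros((2, 2))
eta = 0.1
for epoch in range(50):
    Q_ref = Q.copy()
    T_ref = bellman(Q_ref)                # stands in for a large-batch estimate
    for _ in range(200):
        Q = (1 - eta) * Q + eta * (T_ref + sampled_diff(Q, Q_ref))

# Ground truth via exact value iteration.
Q_star = np.zeros((2, 2))
for _ in range(500):
    Q_star = bellman(Q_star)
print(np.max(np.abs(Q - Q_star)))
```

Because the stochastic term only estimates the *difference* `T(Q) - T(Q_ref)`, its variance shrinks as the iterate approaches the reference point.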
arXiv Detail & Related papers (2021-06-28T00:38:54Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
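The decomposition in the last sentence can be checked numerically on a tiny ReLU network (the architecture and data here are arbitrary illustrations, not the paper's construction): grouping samples by hidden activation pattern and averaging within each group recovers the full empirical error exactly.

```python
import numpy as np

# A one-hidden-layer ReLU network is piecewise linear; each activation
# pattern of the hidden layer defines one subfunction and its domain.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2)); b1 = rng.normal(size=3)
w2 = rng.normal(size=3)

X = rng.normal(size=(200, 2))
y = rng.normal(size=200)

pre = X @ W1.T + b1              # hidden pre-activations
pred = np.maximum(pre, 0) @ w2   # network output
err = (pred - y) ** 2            # per-sample squared error
total = err.mean()               # empirical error of the full network

# Empirical error = sum over activation patterns of
# (fraction of samples in that region) * (mean error within the region).
keys = [tuple(p) for p in (pre > 0)]
decomp = 0.0
for k in set(keys):
    mask = np.array([kk == k for kk in keys])
    decomp += mask.mean() * err[mask].mean()
print(abs(total - decomp))
```

The agreement is exact up to floating-point error, since the weighted group means telescope back into the overall mean.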
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Robust Unsupervised Learning via L-Statistic Minimization [38.49191945141759]
We present a general approach to this problem focusing on unsupervised learning.
The key assumption is that the perturbing distribution is characterized by larger losses relative to a given class of admissible models.
We prove uniform convergence bounds with respect to the proposed criterion for several popular models in unsupervised learning.
arXiv Detail & Related papers (2020-12-14T10:36:06Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - The Efficacy of $L_1$ Regularization in Two-Layer Neural Networks [36.753907384994704]
A crucial problem in neural networks is to select the most appropriate number of hidden neurons and obtain tight statistical risk bounds.
We show that $L_1$ regularization can control the generalization error and sparsify the input dimension.
An excessively large number of neurons does not necessarily inflate generalization errors under a suitable regularization.
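The sparsifying mechanism can be sketched with a proximal-gradient (ISTA) loop on the output weights of a two-layer network with an over-large hidden layer (the random-feature setup, $\lambda$, and step size are illustrative choices of ours, not the paper's theoretical ones):

```python
import numpy as np

# L1 penalty on the output-layer weights, handled by soft-thresholding:
# weights of superfluous hidden neurons are driven exactly to zero.
rng = np.random.default_rng(0)
n, d, m = 100, 3, 50                  # samples, input dim, (overly many) neurons
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)            # a simple linear target

W = rng.normal(size=(m, d))           # fixed random hidden layer (ReLU features)
H = np.maximum(X @ W.T, 0)

a = np.zeros(m)                       # output weights, trained with ISTA
lam = 0.5
step = n / np.linalg.norm(H, 2) ** 2  # 1 / Lipschitz constant of the gradient
for _ in range(500):
    grad = H.T @ (H @ a - y) / n      # gradient of (1/2n) * ||Ha - y||^2
    a = a - step * grad
    a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)  # soft-threshold

print(int((a != 0).sum()), "of", m, "neurons active")
```

Soft-thresholding is the proximal operator of the L1 norm, which is why exact zeros appear; the effective network uses only the surviving neurons.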
arXiv Detail & Related papers (2020-10-02T15:23:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.