Sharp concentration of uniform generalization errors in binary linear classification
- URL: http://arxiv.org/abs/2505.16713v2
- Date: Thu, 26 Jun 2025 06:57:11 GMT
- Title: Sharp concentration of uniform generalization errors in binary linear classification
- Authors: Shogo Nakakita
- Abstract summary: We show that almost sure convergence of uniform generalization errors to their expectation occurs in very broad settings. Using this convergence, we establish uniform laws of large numbers under dimension-free conditions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We examine the concentration of uniform generalization errors around their expectation in binary linear classification problems via an isoperimetric argument. In particular, we establish Poincaré and log-Sobolev inequalities for the joint distribution of the output labels and the label-weighted input vectors, which we apply to derive concentration bounds. The derived concentration bounds are sharp, up to moderate multiplicative constants, by comparison with those under well-balanced labels. In asymptotic analysis, we also show that almost sure convergence of uniform generalization errors to their expectation occurs in very broad settings, such as proportionally high-dimensional regimes. Using this convergence, we establish uniform laws of large numbers under dimension-free conditions.
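For orientation, here is a minimal LaTeX sketch, in assumed notation that may differ from the paper's exact definitions, of the central quantity and the shape of bound the abstract describes: the uniform generalization error over a class of weight vectors, and a schematic concentration inequality around its expectation obtained from functional inequalities for the law of the label and the label-weighted input.

```latex
% Assumed notation; illustrative only, not the paper's exact statement.
% Data: i.i.d. pairs (x_i, y_i) with x_i \in \mathbb{R}^d and y_i \in \{-1,+1\};
% a loss \ell evaluated on the margin y w^\top x; a class \mathcal{W} of weights.
\[
  \widehat{\Delta}_n
  = \sup_{w \in \mathcal{W}}
    \left| \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(y_i\, w^{\top} x_i\bigr)
           - \mathbb{E}\bigl[\ell\bigl(y\, w^{\top} x\bigr)\bigr] \right| .
\]
% Schematically, a Poincar\'e inequality for the law of (y, yx) gives
% sub-exponential concentration and a log-Sobolev inequality gives
% sub-Gaussian concentration of \widehat{\Delta}_n around its mean:
\[
  \mathbb{P}\Bigl( \bigl|\widehat{\Delta}_n - \mathbb{E}\,\widehat{\Delta}_n\bigr| \ge t \Bigr)
  \le C \exp\bigl( -c\, n \min\{t,\, t^{2}\} \bigr), \qquad t > 0,
\]
% where the placeholder constants C, c depend on the functional-inequality
% constants, the Lipschitz constant of \ell, and the size of \mathcal{W}.
```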
Related papers
- High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization [83.06112052443233]
This paper studies kernel ridge regression in high dimensions under covariate shifts.
By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy decreases the variance.
For the bias, we analyze regularization at an arbitrary or a well-chosen scale, showing that the bias can behave very differently under different regularization scales.
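As an illustrative sketch of the re-weighting strategy discussed above (not the paper's estimator or assumptions): importance-weighted kernel ridge regression, where each training loss is weighted by the density ratio p_test(x)/p_train(x). The Gaussian kernel, the synthetic Gaussian mean shift, and all names below are assumptions chosen to make the example runnable.

```python
# Illustrative importance-weighted kernel ridge regression under covariate shift.
# The Gaussian kernel, synthetic densities, and hyperparameters are assumptions.
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * bandwidth**2))

def weighted_krr_fit(X, y, weights, lam=1e-2, bandwidth=1.0):
    """Minimize sum_i w_i * (y_i - f(x_i))^2 + lam * ||f||_H^2.
    Returns dual coefficients alpha so that f(x) = k(x, X) @ alpha."""
    K = gaussian_kernel(X, X, bandwidth)
    W = np.diag(weights)
    return np.linalg.solve(W @ K + lam * np.eye(len(y)), W @ y)

rng = np.random.default_rng(0)
n, d = 200, 5
X_train = rng.normal(0.0, 1.0, (n, d))        # training covariates ~ N(0, I)
X_test = rng.normal(0.5, 1.0, (n, d))         # shifted test covariates ~ N(0.5, I)
f_star = lambda X: np.sin(X[:, 0]) + 0.5 * X[:, 1]
y_train = f_star(X_train) + 0.1 * rng.normal(size=n)

# Density-ratio weights p_test / p_train for the known Gaussian mean shift above.
mu = np.full(d, 0.5)
weights = np.exp(X_train @ mu - 0.5 * mu @ mu)

alpha = weighted_krr_fit(X_train, y_train, weights)
pred = gaussian_kernel(X_test, X_train) @ alpha
print("re-weighted KRR test MSE:", np.mean((pred - f_star(X_test))**2))
```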
arXiv Detail & Related papers (2024-06-05T12:03:27Z) - Reflection coupling for unadjusted generalized Hamiltonian Monte Carlo in the nonconvex stochastic gradient case [0.0]
Contraction in Wasserstein 1-distance with explicit rates is established for generalized Hamiltonian Monte Carlo with stochastic gradients under possibly nonconvex conditions.
The algorithms considered include splitting schemes of kinetic Langevin diffusions, commonly used in molecular dynamics simulations.
arXiv Detail & Related papers (2023-10-28T18:25:59Z) - Concentration inequalities for high-dimensional linear processes with dependent innovations [0.0]
We develop concentration inequalities for the $\ell_\infty$ norm of vector linear processes with sub-Weibull, mixingale innovations.
We apply these inequalities to sparse estimation of large-dimensional VAR(p) systems and to heteroskedasticity and autocorrelation consistent (HAC) high-dimensional covariance estimation.
arXiv Detail & Related papers (2023-07-23T18:05:53Z) - The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks [117.93273337740442]
We show that gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
We also show that batch normalization has an implicit bias towards a patch-wise uniform margin.
arXiv Detail & Related papers (2023-06-20T16:58:00Z) - Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - Concentration inequalities for leave-one-out cross validation [1.90365714903665]
We show that estimator stability is enough to guarantee that leave-one-out cross-validation is a sound procedure.
We obtain our results by relying on random variables whose distributions satisfy the logarithmic Sobolev inequality.
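A minimal sketch of the leave-one-out cross-validation estimate whose concentration is at issue; the ridge learner, the squared loss, and the synthetic data are assumptions chosen only to make the example runnable.

```python
# Minimal leave-one-out cross-validation (LOO-CV) sketch. The ridge learner and
# synthetic data are assumptions; the cited paper concerns when estimator
# stability makes this LOO risk estimate a sound proxy for the true risk.
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Ridge regression coefficients."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def loo_cv_risk(X, y, fit=ridge_fit, loss=lambda yhat, y: (yhat - y) ** 2):
    """Average held-out loss when each point is left out in turn."""
    n = len(y)
    losses = []
    for i in range(n):
        mask = np.arange(n) != i
        w = fit(X[mask], y[mask])
        losses.append(loss(X[i] @ w, y[i]))
    return float(np.mean(losses))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.5 * rng.normal(size=100)
print("LOO-CV risk estimate:", loo_cv_risk(X, y))
```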
arXiv Detail & Related papers (2022-11-04T14:08:53Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Spectral clustering under degree heterogeneity: a case for the random
walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
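As a hedged illustration of the embedding discussed above (not the paper's exact estimator or theorem conditions): a spectral embedding built from eigenvectors of the random-walk-normalized adjacency D^{-1}A on a small degree-corrected block model. The two-community construction and all parameter values are assumptions.

```python
# Illustrative random-walk spectral embedding on a degree-corrected stochastic
# block model (DCSBM). K = 2 communities and all parameters are assumptions.
import numpy as np

rng = np.random.default_rng(2)
n, K = 400, 2
labels = rng.integers(0, K, size=n)            # community assignments
theta = rng.uniform(0.3, 1.0, size=n)          # degree-correction weights
B = np.array([[0.15, 0.03],
              [0.03, 0.12]])                   # block connection rates

# Sample a symmetric adjacency matrix with P(A_ij = 1) = theta_i * theta_j * B[z_i, z_j].
P = np.outer(theta, theta) * B[labels][:, labels]
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)

# Random-walk embedding: top-K eigenvectors of D^{-1} A, computed through the
# symmetric matrix D^{-1/2} A D^{-1/2} and rescaled by D^{-1/2}.
deg = A.sum(axis=1)
deg[deg == 0] = 1.0                            # guard against isolated nodes
d_inv_sqrt = 1.0 / np.sqrt(deg)
A_sym = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
evals, evecs = np.linalg.eigh(A_sym)
embedding = d_inv_sqrt[:, None] * evecs[:, np.argsort(evals)[-K:]]

# In a DCSBM, rows of this embedding cluster near K points, one per community.
print("community 0 centroid:", embedding[labels == 0].mean(axis=0))
print("community 1 centroid:", embedding[labels == 1].mean(axis=0))
```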
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We highlight a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
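A hedged sketch of the two procedures being contrasted (not the paper's analysis or rates): one pass of constant-stepsize SGD with tail-iterate averaging versus ordinary least squares on the same Gaussian linear-regression data. The step size, averaging window, and data model are assumptions.

```python
# Illustrative comparison of constant-stepsize SGD (with tail averaging) and
# ordinary least squares on Gaussian linear regression. Step size, averaging
# window, and data model are assumptions for demonstration only.
import numpy as np

rng = np.random.default_rng(3)
n, d = 5000, 50
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.5 * rng.normal(size=n)

# One pass of SGD with a constant step size, averaging the second half of the iterates.
step = 0.01
w = np.zeros(d)
tail_sum = np.zeros(d)
for i in range(n):
    grad = (X[i] @ w - y[i]) * X[i]      # gradient of 0.5 * (x_i^T w - y_i)^2
    w -= step * grad
    if i >= n // 2:
        tail_sum += w
w_sgd = tail_sum / (n - n // 2)

# Ordinary least squares on the full sample.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

def excess_risk(w_hat):
    """Population excess risk under isotropic Gaussian covariates: ||w_hat - w_star||^2."""
    return float(np.sum((w_hat - w_star) ** 2))

print("excess risk, tail-averaged constant-step SGD:", excess_risk(w_sgd))
print("excess risk, OLS:", excess_risk(w_ols))
```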
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Concentration Inequalities for Statistical Inference [3.10770247120758]
This paper gives a review of concentration inequalities which are widely employed in non-asymptotic analyses in mathematical statistics.
We aim to illustrate the concentration inequalities with known constants and to improve existing bounds with sharper constants.
arXiv Detail & Related papers (2020-11-04T12:54:06Z) - The role of regularization in classification of high-dimensional noisy Gaussian mixture [36.279288150100875]
We consider a high-dimensional mixture of two Gaussians in the noisy regime.
We provide a rigorous analysis of the generalization error of regularized convex classifiers.
We discuss surprising effects of the regularization, which in some cases allows one to reach Bayes-optimal performance.
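An illustrative sketch of a regularized convex classifier on a noisy two-Gaussian mixture (not the paper's exact model, loss, or asymptotic analysis): a ridge-regularized least-squares classifier evaluated at a few regularization strengths in a proportionally high-dimensional regime. All parameters are assumptions.

```python
# Illustrative ridge-regularized linear classification of a noisy
# high-dimensional two-Gaussian mixture; parameters are assumptions only.
import numpy as np

rng = np.random.default_rng(4)
n, d, snr = 300, 600, 2.0                       # proportional regime: d > n
mu = snr * rng.normal(size=d) / np.sqrt(d)      # class-mean direction, ||mu|| ~ snr

def sample(m):
    """Draw m labelled points x = y * mu + Gaussian noise."""
    y = rng.choice([-1.0, 1.0], size=m)
    X = y[:, None] * mu[None, :] + rng.normal(size=(m, d))
    return X, y

X_train, y_train = sample(n)
X_test, y_test = sample(2000)

for lam in (0.01, 1.0, 100.0):
    # Ridge-regularized least-squares fit: one simple regularized convex classifier.
    w = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d), X_train.T @ y_train)
    err = np.mean(np.sign(X_test @ w) != y_test)
    print(f"lambda = {lam:7.2f}   test error = {err:.3f}")
```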
arXiv Detail & Related papers (2020-02-26T14:54:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.