Learning curves for the multi-class teacher-student perceptron
- URL: http://arxiv.org/abs/2203.12094v1
- Date: Tue, 22 Mar 2022 23:16:36 GMT
- Title: Learning curves for the multi-class teacher-student perceptron
- Authors: Elisabetta Cornacchia, Francesca Mignacco, Rodrigo Veiga, C\'edric
Gerbelot, Bruno Loureiro, Lenka Zdeborov\'a
- Abstract summary: One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification.
Both Bayes-optimal estimation and empirical risk minimisation (ERM) were extensively analysed for this setting.
Yet, an analogous analysis for the corresponding multi-class teacher-student perceptron was missing.
- Score: 5.480546613836199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the most classical results in high-dimensional learning theory
provides a closed-form expression for the generalisation error of binary
classification with the single-layer teacher-student perceptron on i.i.d.
Gaussian inputs. Both Bayes-optimal estimation and empirical risk minimisation
(ERM) were extensively analysed for this setting. At the same time, a
considerable part of modern machine learning practice concerns multi-class
classification. Yet, an analogous analysis for the corresponding multi-class
teacher-student perceptron was missing. In this manuscript we fill this gap by
deriving and evaluating asymptotic expressions for both the Bayes-optimal and
ERM generalisation errors in the high-dimensional regime. For Gaussian teacher
weights, we investigate the performance of ERM with both cross-entropy and
square losses, and explore the role of ridge regularisation in approaching
Bayes-optimality. In particular, we observe that regularised cross-entropy
minimisation yields close-to-optimal accuracy. Instead, for a binary teacher we
show that a first-order phase transition arises in the Bayes-optimal
performance.
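As a concrete illustration of the setting (a minimal sketch, not the authors' code), the snippet below draws i.i.d. Gaussian inputs, labels them with a Gaussian teacher through an argmax readout, and fits a ridge-regularised square-loss ERM estimator in closed form; the dimensions, regularisation strength, and scaling convention are illustrative assumptions.

```python
# Minimal sketch of the multi-class teacher-student setup with
# ridge-regularised square-loss ERM (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)
d, K = 200, 3                      # input dimension, number of classes (assumed)
n_train, n_test = 2000, 5000
lam = 0.1                          # ridge strength (assumed)

W_star = rng.standard_normal((K, d))          # Gaussian teacher weights

def sample(n):
    X = rng.standard_normal((n, d))                      # i.i.d. Gaussian inputs
    y = np.argmax(X @ W_star.T / np.sqrt(d), axis=1)     # teacher label = argmax channel
    return X, y

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

# Square loss on one-hot targets with a ridge penalty has the closed form
#   W_hat = (X^T X + lam * n * I)^{-1} X^T Y.
Y = np.eye(K)[y_tr]
W_hat = np.linalg.solve(X_tr.T @ X_tr + lam * n_train * np.eye(d), X_tr.T @ Y)

acc = np.mean(np.argmax(X_te @ W_hat, axis=1) == y_te)
print(f"square-loss ERM test accuracy: {acc:.3f}")
```

Replacing the closed-form solve with gradient-based minimisation of a regularised cross-entropy loss corresponds to the other ERM setting discussed above.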
Related papers
- BAPE: Learning an Explicit Bayes Classifier for Long-tailed Visual Recognition [78.70453964041718]
Current deep learning algorithms usually solve for the optimal classifier by implicitly estimating the posterior probabilities. This simple methodology has been proven effective for meticulously balanced academic benchmark datasets. However, it is not applicable to the long-tailed data distributions in the real world. This paper presents a novel approach (BAPE) that provides a more precise theoretical estimation of the data distributions.
arXiv Detail & Related papers (2025-06-29T15:12:50Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes-optimal aggregator function to combine the current model's predictions and the given labels.
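The paper derives its aggregator via approximate message passing; purely as a generic illustration (not that derivation), the sketch below combines a model's predictive probabilities with a noisy label by Bayes' rule under an assumed symmetric label-flip rate. The function name and the noise model are assumptions, not the paper's construction.

```python
import numpy as np

def bayes_aggregate(model_probs, noisy_label, eps, num_classes):
    """Posterior over the true class given a model's predictive distribution
    (used as a prior here) and a label observed through a symmetric
    label-flip channel with flip probability eps.  Illustrative only."""
    likelihood = np.full(num_classes, eps / (num_classes - 1))
    likelihood[noisy_label] = 1.0 - eps          # P(observed label | true class)
    posterior = model_probs * likelihood         # Bayes' rule (unnormalised)
    return posterior / posterior.sum()

# Example: the model is fairly sure of class 2, the noisy label says class 0.
print(bayes_aggregate(np.array([0.1, 0.1, 0.8]), noisy_label=0, eps=0.3, num_classes=3))
```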
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Bayesian Cross-Modal Alignment Learning for Few-Shot Out-of-Distribution Generalization [47.64583975469164]
We introduce a novel cross-modal image-text alignment learning method (Bayes-CAL) to address few-shot out-of-distribution generalization.
Bayes-CAL achieves state-of-the-art OoD generalization performances on two-dimensional distribution shifts.
Compared with CLIP-like models, Bayes-CAL yields more stable generalization performances on unseen classes.
arXiv Detail & Related papers (2025-04-13T06:13:37Z) - Gradient Extrapolation for Debiased Representation Learning [7.183424522250937]
Gradient Extrapolation for Debiased Representation Learning (GERNE) is designed to learn debiased representations in both known and unknown attribute training cases.
GERNE can serve as a general framework for debiasing, with methods such as ERM, reweighting, and resampling shown as special cases.
The proposed approach is validated on five vision benchmarks and one NLP benchmark, demonstrating competitive and often superior performance compared to state-of-the-art baseline methods.
arXiv Detail & Related papers (2025-03-17T14:48:57Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level Optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Learning a Gaussian Mixture for Sparsity Regularization in Inverse
Problems [2.375943263571389]
In inverse problems, the incorporation of a sparsity prior yields a regularization effect on the solution.
We propose a probabilistic sparsity prior formulated as a mixture of Gaussians, capable of modeling sparsity with respect to a generic basis.
We put forth both a supervised and an unsupervised training strategy to estimate the parameters of this network.
arXiv Detail & Related papers (2024-01-29T22:52:57Z) - Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
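As a finite-dimensional toy illustration of this effect (not the paper's Hilbert-space analysis): for gradient descent on a quadratic started at zero, eigendirection k of the Hessian is fit by a factor 1 - (1 - eta * lambda_k)^T after T steps, so the learning rate eta and the stopping time T jointly select which spectral components of the solution are recovered. The spectrum and step sizes below are assumed values.

```python
import numpy as np

# Toy quadratic: f(w) = 0.5 * (w - w_star)^T H (w - w_star), with H diagonal.
eigs = np.array([10.0, 1.0, 0.1])      # Hessian eigenvalues (assumed spectrum)
w_star = np.ones_like(eigs)

def gd_solution(eta, steps):
    """Gradient descent from w = 0; mode k is fit by 1 - (1 - eta*eig_k)**steps."""
    w = np.zeros_like(eigs)
    for _ in range(steps):
        w -= eta * eigs * (w - w_star)  # gradient of the diagonal quadratic
    return w

# With early stopping, a larger learning rate recovers more of the small-eigenvalue modes.
print(gd_solution(eta=0.05, steps=20))
print(gd_solution(eta=0.19, steps=20))
```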
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Relieving Long-tailed Instance Segmentation via Pairwise Class Balance [85.53585498649252]
Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes.
This imbalance causes severe bias of the head classes (those with the majority of samples) against the tailed ones.
We propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix which is updated during training to accumulate the ongoing prediction preferences.
arXiv Detail & Related papers (2022-01-08T07:48:36Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
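For reference, a minimal numerical sketch of the classical statement cited above (generic; unrelated to the paper's bias-constrained networks): in a linear model with known heteroscedastic noise variances, the weighted least squares estimator (X^T W X)^{-1} X^T W y with W = diag(1/sigma_i^2) is the linear MVUE. All sizes and noise levels below are assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.standard_normal((n, p))
theta = np.array([1.0, -2.0, 0.5])
noise_var = rng.uniform(0.1, 5.0, size=n)            # heteroscedastic noise (assumed)
y = X @ theta + rng.normal(0.0, np.sqrt(noise_var))

# Weighted least squares: theta_hat = (X^T W X)^{-1} X^T W y with W = diag(1/var).
W = 1.0 / noise_var
theta_wls = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)        # ordinary least squares, for contrast
print("WLS:", theta_wls, "OLS:", theta_ols)
```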
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Nonasymptotic theory for two-layer neural networks: Beyond the
bias-variance trade-off [10.182922771556742]
We present a nonasymptotic generalization theory for two-layer neural networks with ReLU activation function.
We show that overparametrized random feature models suffer from the curse of dimensionality and thus are suboptimal.
arXiv Detail & Related papers (2021-06-09T03:52:18Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise
Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator obtained via empirical risk minimisation in high dimensions.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
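A minimal sketch of the generative model described (illustrative only; it uses a simple plug-in mean-direction classifier rather than the paper's regularised ERM analysis), including constant-probability label flips on the training labels. Dimensions, mean scaling, and the flip probability are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 50, 1000
mu = rng.standard_normal(d) / np.sqrt(d)   # cluster mean direction (assumed scaling)
flip_prob = 0.1                            # constant-probability label noise (assumed)

y = rng.choice([-1, 1], size=n)
X = y[:, None] * mu + rng.standard_normal((n, d))          # Gaussian mixture data
y_noisy = y * np.where(rng.random(n) < flip_prob, -1, 1)   # noisy training labels

w = X.T @ y_noisy / n                       # simple plug-in estimate of the mean direction
test_y = rng.choice([-1, 1], size=5000)
test_X = test_y[:, None] * mu + rng.standard_normal((5000, d))
acc = np.mean(np.sign(test_X @ w) == test_y)
print(f"linear classifier accuracy on clean test data: {acc:.3f}")
```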
arXiv Detail & Related papers (2020-11-18T07:59:55Z) - Deep Speaker Vector Normalization with Maximum Gaussianality Training [13.310988353839237]
A key problem with deep speaker embedding is that the resulting deep speaker vectors tend to be irregularly distributed.
In previous research, we proposed a deep normalization approach based on a new discriminative normalization flow (DNF) model.
Despite this remarkable success, we empirically found that the latent codes produced by the DNF model are generally neither homogeneous nor Gaussian.
We propose a new Maximum Gaussianality (MG) training approach that directly maximizes the Gaussianality of the latent codes.
arXiv Detail & Related papers (2020-10-30T09:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.