Minimax Supervised Clustering in the Anisotropic Gaussian Mixture Model: A new take on Robust Interpolation
- URL: http://arxiv.org/abs/2111.07041v1
- Date: Sat, 13 Nov 2021 05:19:37 GMT
- Title: Minimax Supervised Clustering in the Anisotropic Gaussian Mixture Model: A new take on Robust Interpolation
- Authors: Stanislav Minsker, Mohamed Ndaoud and Yiqiu Shen
- Abstract summary: We study the supervised clustering problem under the two-component anisotropic Gaussian mixture model.
We show that in the high-dimensional regime, the linear discriminant analysis (LDA) classifier turns out to be sub-optimal in the minimax sense.
- Score: 5.98367009147573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the supervised clustering problem under the two-component
anisotropic Gaussian mixture model in high dimensions and in the non-asymptotic
setting. We first derive a lower and a matching upper bound for the minimax
risk of clustering in this framework. We also show that in the high-dimensional
regime, the linear discriminant analysis (LDA) classifier turns out to be
sub-optimal in the minimax sense. Next, we characterize precisely the risk of
$\ell_2$-regularized supervised least squares classifiers. We deduce the fact
that the interpolating solution may outperform the regularized classifier,
under mild assumptions on the covariance structure of the noise. Our analysis
also shows that interpolation can be robust to corruption in the covariance of
the noise when the signal is aligned with the "clean" part of the covariance,
for the properly defined notion of alignment. To the best of our knowledge,
this peculiar phenomenon has not yet been investigated in the rapidly growing
literature related to interpolation. We conclude that interpolation is not only
benign but can also be optimal, and in some cases robust.
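To make the setting concrete, here is a minimal simulation sketch (an illustration of the model, not the authors' code): labels $y_i \in \{\pm 1\}$ are drawn uniformly, features follow $x_i = y_i \mu + \varepsilon_i$ with anisotropic noise $\varepsilon_i \sim N(0, \Sigma)$, and the test error of an $\ell_2$-regularized least squares classifier is compared with that of the minimum-norm interpolating solution. The sample size, dimension, signal $\mu$, and diagonal $\Sigma$ below are all illustrative assumptions.
```python
# Minimal sketch of the two-component anisotropic Gaussian mixture setup.
# All parameter choices (n, d, mu, Sigma) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 400                       # high-dimensional regime: d >> n
mu = np.zeros(d)
mu[0] = 3.0                          # signal concentrated on one coordinate
cov_diag = np.linspace(0.5, 2.0, d)  # anisotropic (diagonal) noise covariance

def sample(m):
    """Draw m points: x = y * mu + eps, with eps ~ N(0, diag(cov_diag))."""
    y = rng.choice([-1.0, 1.0], size=m)
    eps = rng.normal(size=(m, d)) * np.sqrt(cov_diag)
    return y[:, None] * mu + eps, y

X, y = sample(n)
X_test, y_test = sample(2000)

# l2-regularized least squares classifier: w = (X'X + lam * I)^{-1} X'y
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Minimum-norm interpolator: w = pinv(X) @ y fits the training labels exactly
w_interp = np.linalg.pinv(X) @ y

for name, w in [("ridge", w_ridge), ("interpolator", w_interp)]:
    test_err = np.mean(np.sign(X_test @ w) != y_test)
    print(f"{name}: test error = {test_err:.3f}")
```
With $d \gg n$ the least squares problem is underdetermined, so the pseudoinverse solution interpolates the training labels exactly; whether it also generalizes well depends on how $\mu$ aligns with the covariance structure of the noise, which is precisely what the abstract's robustness claim concerns.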
Related papers
- Generalization error of min-norm interpolators in transfer learning [2.7309692684728617]
Min-norm interpolators emerge naturally as implicitly regularized limits of modern machine learning algorithms.
In many applications, a limited amount of test data may be available during training, yet properties of min-norm interpolators in this setting are not well understood.
We establish a novel anisotropic local law to achieve these characterizations.
arXiv Detail & Related papers (2024-06-20T02:23:28Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models [4.093769373833101]
We resolve the conjecture posed in Subramanian et al.'22, where the number of data points, features, and classes all grow together.
Our new lower bounds are akin to an information-theoretic strong converse: they establish that the misclassification rate goes to 0 or 1 asymptotically.
The key to our tight analysis is a new variant of the Hanson-Wright inequality which is broadly useful for multiclass problems with sparse labels.
arXiv Detail & Related papers (2023-06-23T00:59:15Z)
- Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z)
- Hyperspectral Image Denoising Using Non-convex Local Low-rank and Sparse Separation with Spatial-Spectral Total Variation Regularization [49.55649406434796]
We propose a novel non-convex approach to robust principal component analysis for HSI denoising.
We develop accurate approximations to both the rank and the sparse components.
Experiments on both simulated and real HSIs demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-01-08T11:48:46Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Minimum $\ell_{1}$-norm interpolators: Precise asymptotics and multiple descent [19.781475554462553]
This paper pursues theoretical understanding for an important type of interpolators: the minimum $\ell_1$-norm interpolator.
We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon.
Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two non-linear equations with two unknowns. (A toy construction of this interpolator is sketched after this list.)
arXiv Detail & Related papers (2021-10-18T17:51:14Z)
- Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding interpolation through ridge regularization can significantly improve robust generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z)
- Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator obtained via empirical risk minimisation in high dimensions.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z)
- Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures [8.556763944288116]
We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters.
We also consider the algorithmically relevant case of targeting wide flat minima of the mean squared error loss.
arXiv Detail & Related papers (2020-10-27T01:32:03Z)
- The role of regularization in classification of high-dimensional noisy Gaussian mixture [36.279288150100875]
We consider a high-dimensional mixture of two Gaussians in the noisy regime.
We provide a rigorous analysis of the generalization error of regularized convex classifiers.
We discuss surprising effects of the regularization, which in some cases allows one to reach the Bayes-optimal performance.
arXiv Detail & Related papers (2020-02-26T14:54:28Z)
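As a companion to the minimum $\ell_1$-norm interpolator entry above, here is a toy construction (a hypothetical example on random data, not code from any of the listed papers): $\arg\min \|w\|_1$ subject to $Xw = y$ is computed as a linear program using the standard split $w = u - v$ with $u, v \ge 0$.
```python
# Toy construction of the minimum l1-norm interpolator via linear programming.
# Hypothetical random data; not taken from any of the papers above.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, d = 20, 100                 # underdetermined system: many exact interpolators
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Split w = u - v with u, v >= 0, so sum(u) + sum(v) equals ||w||_1 at the optimum.
c = np.ones(2 * d)             # objective: sum(u) + sum(v)
A_eq = np.hstack([X, -X])      # equality constraint: X(u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
w = res.x[:d] - res.x[d:]

print("residual ||Xw - y|| =", np.linalg.norm(X @ w - y))  # ~ 0: exact interpolation
print("||w||_1 =", np.abs(w).sum())
```
Because $d > n$, infinitely many interpolators exist; the LP selects the one of minimum $\ell_1$ norm, which is the object whose multi-descent risk curve that paper characterizes.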