Support vector machines and linear regression coincide with very
high-dimensional features
- URL: http://arxiv.org/abs/2105.14084v1
- Date: Fri, 28 May 2021 20:06:21 GMT
- Title: Support vector machines and linear regression coincide with very
high-dimensional features
- Authors: Navid Ardeshir, Clayton Sanford, Daniel Hsu
- Abstract summary: We study the phenomenon of support vector proliferation, in which every training example used to fit an SVM becomes a support vector.
First, we prove a super-linear lower bound on the dimension (in terms of sample size) required for support vector proliferation in independent feature models.
We also identify a sharp phase transition in Gaussian feature models, bound the width of this transition, and give experimental support for its universality.
- Score: 5.878391874426428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The support vector machine (SVM) and minimum Euclidean norm least squares
regression are two fundamentally different approaches to fitting linear models,
but they have recently been connected in models for very high-dimensional data
through a phenomenon of support vector proliferation, where every training
example used to fit an SVM becomes a support vector. In this paper, we explore
the generality of this phenomenon and make the following contributions. First,
we prove a super-linear lower bound on the dimension (in terms of sample size)
required for support vector proliferation in independent feature models,
matching the upper bounds from previous works. We further identify a sharp
phase transition in Gaussian feature models, bound the width of this
transition, and give experimental support for its universality. Finally, we
hypothesize that this phase transition occurs only in much higher-dimensional
settings in the $\ell_1$ variant of the SVM, and we present a new geometric
characterization of the problem that may elucidate this phenomenon for the
general $\ell_p$ case.
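Below is a minimal numerical sketch (not code from the paper) of the support vector proliferation phenomenon the abstract describes: with independent Gaussian features and dimension $d$ much larger than the sample size $n$, the hard-margin SVM (linear model, no intercept) coincides with the minimum-norm interpolant of the $\pm 1$ labels, and every training example is a support vector. The sample size, dimensions, and random seed are arbitrary illustrative choices; the all-support-vector check follows from the KKT conditions of the hard-margin SVM without intercept.

```python
# Sketch: support vector proliferation for Gaussian features with d >> n.
# Every example is a support vector (and the hard-margin SVM equals the
# minimum-norm interpolant) iff y_i * [(X X^T)^{-1} y]_i > 0 for all i.
import numpy as np

rng = np.random.default_rng(0)
n = 50                       # number of training examples
for d in (60, 200, 5000):    # feature dimensions, from comparable to n up to d >> n
    X = rng.standard_normal((n, d))        # independent Gaussian features
    y = rng.choice([-1.0, 1.0], size=n)    # arbitrary +/-1 labels

    # Minimum-norm interpolant: w_ls = X^T (X X^T)^{-1} y, so X @ w_ls = y exactly.
    beta = np.linalg.solve(X @ X.T, y)
    w_ls = X.T @ beta

    # Dual coefficient of example i is alpha_i = y_i * beta_i; proliferation
    # means all of them are strictly positive.
    all_support = np.all(y * beta > 0)
    print(f"d={d:5d}  every point a support vector: {all_support}")
    if all_support:
        # In that case the hard-margin SVM solution is exactly w_ls:
        # every training margin y_i <w_ls, x_i> equals 1.
        margins = y * (X @ w_ls)
        print(f"        margins (min, max): {margins.min():.6f}, {margins.max():.6f}")
```

With these settings, the smallest dimension typically fails the check while the largest passes it, consistent with the super-linear dimension requirement discussed in the abstract.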
Related papers
- Linear Convergence of Diffusion Models Under the Manifold Hypothesis [5.040884755454258]
We show that the number of steps to converge in Kullback-Leibler (KL) divergence is linear (up to logarithmic terms) in the intrinsic dimension $d$.
We also show that this linear dependency is sharp.
arXiv Detail & Related papers (2024-10-11T17:58:30Z) - von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Theory of mobility edge and non-ergodic extended phase in coupled random
matrices [18.60614534900842]
The mobility edge, as a central concept in disordered models for localization-delocalization transitions, has rarely been discussed in the context of random matrix theory.
We show that the overlapped and un-overlapped spectra of the coupled matrices exhibit totally different scaling behaviors, which can be used to construct tunable mobility edges.
Our model provides a general framework to realize the mobility edges and non-ergodic phases in a controllable way in RMT.
arXiv Detail & Related papers (2023-11-15T01:43:37Z) - Multifractal dimensions for orthogonal-to-unitary crossover ensemble [1.0793830805346494]
We show that finite-size versions of the multifractal dimensions converge to unity logarithmically slowly as the system size $N$ increases.
We apply our results to analyze the multifractal dimensions in a quantum kicked rotor, a Sinai billiard system, and a correlated spin chain model in a random field.
arXiv Detail & Related papers (2023-10-05T13:22:43Z) - Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem.
We characterize the implicit bias of 1-layer transformers optimized with gradient descent.
We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z) - Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimensional modelling.
We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z) - A Differential Geometry Perspective on Orthogonal Recurrent Models [56.09491978954866]
We employ tools and insights from differential geometry to offer a novel perspective on orthogonal RNNs.
We show that orthogonal RNNs may be viewed as optimizing in the space of divergence-free vector fields.
Motivated by this observation, we study a new recurrent model, which spans the entire space of vector fields.
arXiv Detail & Related papers (2021-02-18T19:39:22Z) - Multi-fidelity data fusion for the approximation of scalar functions
with low intrinsic dimensionality through active subspaces [0.0]
We present a multi-fidelity approach involving active subspaces and we test it on two different high-dimensional benchmarks.
arXiv Detail & Related papers (2020-10-16T12:35:49Z) - Asymptotic Errors for Teacher-Student Convex Generalized Linear Models
(or : How to Prove Kabashima's Replica Formula) [23.15629681360836]
We prove an analytical formula for the reconstruction performance of convex generalized linear models.
We show that an analytical continuation may be carried out to extend the result to convex (but not strongly convex) problems.
We illustrate our claim with numerical examples on mainstream learning methods.
arXiv Detail & Related papers (2020-06-11T16:26:35Z) - A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian
Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of the feature space $N$ are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$; a toy numerical sketch of RFF regression in this regime appears after this list.
arXiv Detail & Related papers (2020-06-09T02:05:40Z)
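As a companion to the RFF entry above, here is a small illustrative sketch (not taken from that paper) of random Fourier feature ridge regression. It sweeps the number of random features $N$ past the interpolation threshold $N \approx n$, where, with negligible regularization, the test error typically peaks before decreasing again (the double-descent behavior). The data model, bandwidth, and ridge level are arbitrary assumptions made only for illustration.

```python
# Sketch: random Fourier feature (RFF) ridge regression across feature counts N.
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 200, 20, 2.0          # sample size, input dimension, kernel bandwidth
lam = 1e-6                          # small ridge regularization (near-interpolating)

w_true = rng.standard_normal(p) / np.sqrt(p)
def make_data(m):
    X = rng.standard_normal((m, p))
    y = np.cos(X @ w_true) + 0.1 * rng.standard_normal(m)
    return X, y

X_tr, y_tr = make_data(n)
X_te, y_te = make_data(1000)

for N in (20, 50, 100, 200, 400, 2000):
    # Random Fourier features approximating a Gaussian kernel of bandwidth sigma.
    W = rng.standard_normal((p, N)) / sigma
    b = rng.uniform(0, 2 * np.pi, N)
    phi = lambda X: np.sqrt(2.0 / N) * np.cos(X @ W + b)

    Z_tr, Z_te = phi(X_tr), phi(X_te)
    # Ridge regression in feature space.
    theta = np.linalg.solve(Z_tr.T @ Z_tr + lam * np.eye(N), Z_tr.T @ y_tr)
    tr_err = np.mean((Z_tr @ theta - y_tr) ** 2)
    te_err = np.mean((Z_te @ theta - y_te) ** 2)
    print(f"N={N:5d}  train MSE={tr_err:.4f}  test MSE={te_err:.4f}")
```

The printed errors let you trace how the training error decreases monotonically in $N$ while the test error tends to spike near $N \approx n$ before improving again at larger $N$.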
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.