Regularized ERM on random subspaces
- URL: http://arxiv.org/abs/2006.10016v3
- Date: Thu, 25 Feb 2021 14:48:13 GMT
- Title: Regularized ERM on random subspaces
- Authors: Andrea Della Vecchia, Jaouad Mourtada, Ernesto De Vito, Lorenzo
Rosasco
- Abstract summary: We consider possibly data dependent subspaces spanned by a random subset of the data, recovering as a special case Nyström approaches for kernel methods.
Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded.
- Score: 18.541369654442796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a natural extension of classical empirical risk minimization, where
the hypothesis space is a random subspace of a given space. In particular, we
consider possibly data dependent subspaces spanned by a random subset of the
data, recovering as a special case Nyström approaches for kernel methods.
Considering random subspaces naturally leads to computational savings, but the
question is whether the corresponding learning accuracy is degraded. These
statistical-computational tradeoffs have been recently explored for the least
squares loss and self-concordant loss functions, such as the logistic loss.
Here, we work to extend these results to convex Lipschitz loss functions, which
might not be smooth, such as the hinge loss used in support vector machines.
This extension requires developing new proofs that use different technical
tools. Our main results show the existence of different settings, depending on
how hard the learning problem is, for which computational efficiency can be
improved with no loss in performance. Theoretical results are illustrated with
simple numerical experiments.
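As a rough illustration of the setting described above (not the authors' implementation), the sketch below restricts the hypothesis space to the span of kernel functions centred at a random subset of m training points, a Nyström-type subspace, and minimizes a regularized hinge-loss objective by subgradient descent. The Gaussian kernel and the values of m, the regularization parameter, and the step size are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2), computed for all pairs of rows of A and B
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def nystrom_hinge_erm(X, y, m=50, lam=1e-3, steps=500, lr=0.5, seed=0):
    """Fit f(x) = sum_j beta_j k(x, x_{i_j}) over a random m-point subspace,
    minimizing (1/n) * sum_i hinge(y_i f(x_i)) + lam * beta^T K_mm beta
    by subgradient descent.  Labels y are assumed to be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=min(m, n), replace=False)   # random landmark points
    K_nm = gaussian_kernel(X, X[idx])                     # n x m block, no full n x n kernel matrix
    K_mm = gaussian_kernel(X[idx], X[idx])                # m x m block
    beta = np.zeros(len(idx))
    for t in range(1, steps + 1):
        margins = y * (K_nm @ beta)
        active = margins < 1.0                            # points where the hinge subgradient is nonzero
        grad = -(K_nm[active].T @ y[active]) / n + 2.0 * lam * (K_mm @ beta)
        beta -= (lr / np.sqrt(t)) * grad                  # decaying step size
    return idx, beta

def predict(X_test, X_train, idx, beta):
    # decision rule f(x) = sum_j beta_j k(x, x_{i_j}), thresholded at zero
    return np.sign(gaussian_kernel(X_test, X_train[idx]) @ beta)
```

Since only an n x m kernel block is ever formed, the cost scales with the subspace size m rather than with the full n x n kernel matrix, which is the kind of computational saving the abstract refers to.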
Related papers
- Refined Risk Bounds for Unbounded Losses via Transductive Priors [58.967816314671296]
We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression.
Our key tools are based on the exponential weights algorithm with carefully chosen transductive priors.
arXiv Detail & Related papers (2024-10-29T00:01:04Z) - On the Performance of Empirical Risk Minimization with Smoothed Data [59.3428024282545]
We show that ERM is able to achieve sublinear error whenever a class is learnable with iid data.
arXiv Detail & Related papers (2024-02-22T21:55:41Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - Random Smoothing Regularization in Kernel Gradient Descent Learning [24.383121157277007]
We present a framework for random smoothing regularization that can adaptively learn a wide range of ground truth functions belonging to the classical Sobolev spaces.
Our estimator can adapt to the structural assumptions of the underlying data and avoid the curse of dimensionality.
arXiv Detail & Related papers (2023-05-05T13:37:34Z) - Regularized ERM on random subspaces [17.927376388967144]
We consider possibly data dependent subspaces spanned by a random subset of the data, recovering as a special case Nyström approaches for kernel methods.
Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded.
arXiv Detail & Related papers (2022-12-04T16:12:11Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - More is Less: Inducing Sparsity via Overparameterization [2.885175627590247]
In deep learning it is common to overparameterize neural networks, that is, to use more parameters than training samples.
Quite surprisingly, training the neural network via (stochastic) gradient descent leads to models that generalize very well.
Our proof relies on analyzing a certain Bregman divergence of the flow.
arXiv Detail & Related papers (2021-12-21T07:55:55Z) - Learning from Non-Random Data in Hilbert Spaces: An Optimal Recovery
Perspective [12.674428374982547]
We consider the regression problem from an Optimal Recovery perspective.
We first develop a semidefinite program for calculating the worst-case error of any recovery map in finite-dimensional Hilbert spaces.
We show that Optimal Recovery provides a formula which is user-friendly from an algorithmic point of view.
arXiv Detail & Related papers (2020-06-05T21:49:07Z) - Classification vs regression in overparameterized regimes: Does the loss
function matter? [21.75115239010008]
We show that solutions obtained by least-squares minimum-norm interpolation, typically used for regression, are identical to those produced by the hard-margin support vector machine (SVM).
Our results demonstrate the very different roles and properties of loss functions used at the training phase (optimization) and the testing phase (generalization).
arXiv Detail & Related papers (2020-05-16T17:58:25Z) - Supervised Learning: No Loss No Cry [51.07683542418145]
Supervised learning requires the specification of a loss function to minimise.
This paper revisits the SLIsotron algorithm of Kakade et al. (2011) through a novel lens.
We show how it provides a principled procedure for learning the loss.
arXiv Detail & Related papers (2020-02-10T05:30:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.