For interpolating kernel machines, minimizing the norm of the ERM
solution minimizes stability
- URL: http://arxiv.org/abs/2006.15522v2
- Date: Sun, 11 Oct 2020 22:58:31 GMT
- Title: For interpolating kernel machines, minimizing the norm of the ERM
solution minimizes stability
- Authors: Akshay Rangamani, Lorenzo Rosasco, Tomaso Poggio
- Abstract summary: We show that the interpolating solution with minimum norm minimizes a bound on $\mbox{CV}_{loo}$ stability.
Under the assumption of random kernel matrices, the corresponding test error should be expected to follow a double descent curve.
- Score: 20.775719987269003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the average $\mbox{CV}_{loo}$ stability of kernel ridge-less
regression and derive corresponding risk bounds. We show that the interpolating
solution with minimum norm minimizes a bound on $\mbox{CV}_{loo}$ stability,
which in turn is controlled by the condition number of the empirical kernel
matrix. The latter can be characterized in the asymptotic regime where both the
dimension and cardinality of the data go to infinity. Under the assumption of
random kernel matrices, the corresponding test error should be expected to
follow a double descent curve.
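As a concrete illustration of these quantities, here is a minimal numpy sketch (not the authors' code; the Gaussian kernel and synthetic data are arbitrary choices): it forms the minimum-norm interpolating solution via the pseudoinverse and reports the condition number of the empirical kernel matrix that controls the stability bound.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(X, Z, sigma=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

n, d = 50, 30
X = rng.standard_normal((n, d)) / np.sqrt(d)   # synthetic inputs
y = rng.standard_normal(n)                     # synthetic labels

K = gaussian_kernel(X, X)
# Minimum-norm interpolant: c = K^+ y. The pseudoinverse picks the least-norm
# coefficient vector; it coincides with K^{-1} y when K is invertible.
c = np.linalg.pinv(K) @ y

print("max interpolation error:", np.abs(K @ c - y).max())
print("squared RKHS norm of solution:", c @ K @ c)
# Per the abstract, the CV_loo stability bound is controlled by cond(K).
print("cond(K):", np.linalg.cond(K))
```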
Related papers
- Inertial Quadratic Majorization Minimization with Application to Kernel Regularized Learning [1.0282274843007797]
We introduce the Quadratic Majorization Minimization with Extrapolation (QMME) framework and establish its sequential convergence properties.
To demonstrate practical advantages, we apply QMME to large-scale kernel regularized learning problems.
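As a rough illustration of the majorization-minimization-with-extrapolation template (a generic sketch under an L-smoothness assumption, not the QMME algorithm itself): the quadratic upper bound is minimized in closed form and iterates are extrapolated between steps.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
b = rng.standard_normal(100)

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)   # smooth loss
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A, 2) ** 2                   # smoothness constant of f

x_prev = x = np.zeros(20)
for k in range(300):
    z = x + (k / (k + 3)) * (x - x_prev)        # extrapolation point
    # Closed-form minimizer of the quadratic majorizer
    # f(z) + <grad f(z), x - z> + (L/2)||x - z||^2:
    x_prev, x = x, z - grad(z) / L
print("final loss:", f(x))
```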
arXiv Detail & Related papers (2025-07-06T05:17:28Z)
- Black-Box Uniform Stability for Non-Euclidean Empirical Risk Minimization [15.334392442475115]
We study first-order algorithms that are uniformly stable for empirical risk minimization problems.
We propose a black-box reduction method that turns an optimization algorithm for smooth convex losses into a uniformly stable learning algorithm.
arXiv Detail & Related papers (2024-12-20T14:50:47Z)
- Toward Efficient Kernel-Based Solvers for Nonlinear PDEs [19.975293084297014]
This paper introduces a novel kernel learning framework toward efficiently solving nonlinear partial differential equations (PDEs).
In contrast to the state-of-the-art kernel solver that embeds differential operators within kernels, our approach eliminates these operators from the kernel.
We model the solution using a standard kernel form and differentiate the interpolant to compute the derivatives.
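A minimal one-dimensional sketch of this idea, under assumed choices (Gaussian kernel, solution values given at collocation points): fit a standard kernel interpolant, then differentiate it analytically rather than embedding the differential operator in the kernel.

```python
import numpy as np

sigma = 0.3
k  = lambda x, z: np.exp(-(x - z) ** 2 / (2 * sigma**2))
dk = lambda x, z: -(x - z) / sigma**2 * k(x, z)     # d/dx of the kernel

xs = np.linspace(0, np.pi, 30)                      # collocation points
u  = np.sin(xs)                                     # sampled "solution" values

K = k(xs[:, None], xs[None, :])
c = np.linalg.solve(K + 1e-10 * np.eye(len(xs)), u) # interpolation coefficients

xt = np.linspace(0.1, np.pi - 0.1, 7)
du = dk(xt[:, None], xs[None, :]) @ c               # derivative of the interpolant
print("max error vs cos(x):", np.abs(du - np.cos(xt)).max())
```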
arXiv Detail & Related papers (2024-10-15T01:00:43Z)
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing a quadratic approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha-1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
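A hedged sketch of the basic clipping mechanism referenced here (a generic clipped-SGD loop on a toy quadratic, not the paper's method): clipping the stochastic gradient at a fixed level keeps the update's variance controllable even when the raw noise has infinite variance.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.ones(10) * 5.0          # iterate; optimum of f(x) = ||x||^2 / 2 is 0
lam, step = 10.0, 0.05

for k in range(2000):
    # Stochastic gradient with heavy-tailed noise: Student-t with df=1.5
    # has finite mean but infinite variance.
    g = x + rng.standard_t(df=1.5, size=x.shape)
    norm = np.linalg.norm(g)
    if norm > lam:             # clip: g <- g * min(1, lam / ||g||)
        g = g * (lam / norm)
    x -= step * g
print("distance to optimum:", np.linalg.norm(x))
```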
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
- Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems [62.83783246648714]
We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z)
- Stability and Generalization for Markov Chain Stochastic Gradient Methods [49.981789906200035]
We provide a comprehensive generalization analysis of Markov chain stochastic gradient methods (MC-SGMs) for both minimization and minimax problems.
We establish the optimal excess population risk bounds for both smooth and non-smooth cases.
We develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability.
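A toy illustration of the MC-SGM setting (assuming a lazy random walk over example indices in place of i.i.d. sampling; this is not the paper's analysis, just the sampling scheme it studies):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = A @ rng.standard_normal(10) + 0.1 * rng.standard_normal(200)

x, i, step = np.zeros(10), 0, 0.01
for t in range(5000):
    # Lazy random walk on {0, ..., n-1}: a simple ergodic Markov chain
    # over data indices, instead of drawing i uniformly at random.
    i = (i + rng.choice([-1, 0, 1])) % len(b)
    g = (A[i] @ x - b[i]) * A[i]    # stochastic gradient at example i
    x -= step * g
print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```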
arXiv Detail & Related papers (2022-09-16T15:42:51Z)
- A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization With Non-isolated Local Minima [0.0]
Non-isolated minima present a unique challenge that has remained under-explored.
In this paper, we study the local convergence of the stochastic gradient descent method to non-isolated global minima.
arXiv Detail & Related papers (2022-03-21T13:33:37Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
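A small sketch in the spirit of kernel-based temporal difference methods (linear TD(0) with kernel features over a fixed dictionary of states, an assumed simplification rather than the paper's estimator): the value estimate lives in the span of k(., s_j).

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha, sigma = 0.9, 0.05, 0.5
dict_states = np.linspace(0, 1, 15)                 # dictionary spanning the RKHS subspace
feat = lambda s: np.exp(-(s - dict_states) ** 2 / (2 * sigma**2))

c = np.zeros_like(dict_states)                      # V(s) = <c, feat(s)>
s = 0.5
for t in range(20000):
    s_next = np.clip(s + 0.1 * rng.standard_normal(), 0.0, 1.0)  # toy MRP dynamics
    r = s                                            # state-dependent reward
    delta = r + gamma * feat(s_next) @ c - feat(s) @ c  # TD error
    c += alpha * delta * feat(s)
    s = s_next
print("V(0.1), V(0.9):", feat(0.1) @ c, feat(0.9) @ c)
```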
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
- Towards a Unified Quadrature Framework for Large-Scale Kernel Machines [35.32894170512829]
We develop a quadrature framework for large-scale kernel machines via a numerical integration representation.
We leverage fully symmetric interpolatory rules to efficiently compute quadrature nodes and associated weights for kernel approximation.
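A one-dimensional sketch of the quadrature view of kernel approximation (a Gauss-Hermite rule for the Gaussian kernel; the paper's fully symmetric interpolatory rules target higher dimensions): since k(x, y) = E_{w~N(0,1)}[cos(w(x - y))], quadrature nodes and weights yield explicit trigonometric features.

```python
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(10)
w = np.sqrt(2.0) * nodes            # quadrature nodes rescaled for N(0, 1)
p = weights / np.sqrt(np.pi)        # normalized quadrature weights

def features(x):
    # Feature map phi(x) with <phi(x), phi(y)> ~= exp(-(x - y)^2 / 2).
    return np.concatenate([np.sqrt(p) * np.cos(w * x),
                           np.sqrt(p) * np.sin(w * x)])

x, y = 0.3, 1.1
approx = features(x) @ features(y)
exact = np.exp(-((x - y) ** 2) / 2)
print(approx, exact)                # the two should agree to high accuracy
```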
arXiv Detail & Related papers (2020-11-03T12:48:07Z)
- Learning Stabilizing Controllers for Unstable Linear Quadratic Regulators from a Single Trajectory [85.29718245299341]
We study linear controllers under quadratic cost models, also known as linear quadratic regulators (LQR).
We present two different semi-definite programs (SDPs), each of which results in a controller that stabilizes all systems within an ellipsoidal uncertainty set.
We propose an efficient data-dependent algorithm -- eXploration -- that with high probability quickly identifies a stabilizing controller.
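For flavor, a textbook Lyapunov-LMI version of the SDP route to a stabilizing controller (a known continuous-time system solved via cvxpy, not the paper's ellipsoidal-uncertainty programs; the system matrices are a hypothetical example): find K = Y Q^{-1} such that A + BK is stable.

```python
import cvxpy as cp
import numpy as np

A = np.array([[0.0, 1.0], [2.0, -1.0]])   # unstable open-loop system (hypothetical)
B = np.array([[0.0], [1.0]])

Q = cp.Variable((2, 2), symmetric=True)
Y = cp.Variable((1, 2))
eps = 1e-2
constraints = [Q >> eps * np.eye(2),
               # Lyapunov LMI in the variables (Q, Y), with K = Y Q^{-1}:
               A @ Q + Q @ A.T + B @ Y + Y.T @ B.T << -eps * np.eye(2)]
cp.Problem(cp.Minimize(0), constraints).solve()

K = Y.value @ np.linalg.inv(Q.value)
print("closed-loop eigenvalues:", np.linalg.eigvals(A + B @ K))  # negative real parts
```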
arXiv Detail & Related papers (2020-06-19T08:58:57Z)
- Kernel Selection for Modal Linear Regression: Optimal Kernel and IRLS Algorithm [8.571896191090744]
We first show that a Biweight kernel is optimal in the sense of minimizing the mean squared error of the resulting MLR parameter estimate.
Secondly, we provide a kernel class for which the iteratively reweighted least-squares (IRLS) algorithm is guaranteed to converge.
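A hedged sketch of such an IRLS scheme (biweight-profile weights and an arbitrary bandwidth; not necessarily the paper's exact algorithm): each iteration solves a weighted least-squares problem in which examples far from the current fit get weight zero, steering the fit toward the conditional mode.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)
y[:60] += 3.0                                   # minority component away from the mode

h = 1.0                                         # kernel bandwidth (arbitrary choice)
beta = np.linalg.lstsq(X, y, rcond=None)[0]     # ordinary least-squares start
for _ in range(50):
    u = (y - X @ beta) / h
    w = np.clip(1.0 - u**2, 0.0, None) ** 2     # biweight-profile weights; zero for |u| >= 1
    Xw = w[:, None] * X
    beta = np.linalg.solve(X.T @ Xw, X.T @ (w * y))  # weighted least-squares update
print("IRLS estimate:", beta, "true:", beta_true)
```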
arXiv Detail & Related papers (2020-01-30T03:57:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.