On the kernel learning problem
- URL: http://arxiv.org/abs/2502.11665v1
- Date: Mon, 17 Feb 2025 10:54:01 GMT
- Title: On the kernel learning problem
- Authors: Yang Li, Feng Ruan
- Abstract summary: The kernel ridge regression problem aims to find the best fit for the output $Y$ as a function of the input data $X \in \mathbb{R}^d$. We consider a generalization of the kernel ridge regression problem by introducing an extra matrix parameter $U$. This naturally leads to a nonlinear variational problem to optimize the choice of $U$.
- Score: 4.917649865600782
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The classical kernel ridge regression problem aims to find the best fit for the output $Y$ as a function of the input data $X\in \mathbb{R}^d$, with a fixed choice of regularization term imposed by a given choice of a reproducing kernel Hilbert space, such as a Sobolev space. Here we consider a generalization of the kernel ridge regression problem, by introducing an extra matrix parameter $U$, which aims to detect the scale parameters and the feature variables in the data, and thereby improve the efficiency of kernel ridge regression. This naturally leads to a nonlinear variational problem to optimize the choice of $U$. We study various foundational mathematical aspects of this variational problem, and in particular how this behaves in the presence of multiscale structures in the data.
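For concreteness, one plausible form of the resulting variational problem (a sketch inferred from the abstract; the paper's exact formulation may differ) is
$$\min_{U}\;\min_{f \in \mathcal{H}}\;\frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - f(U X_i)\bigr)^2 + \lambda\,\|f\|_{\mathcal{H}}^{2},$$
where $\mathcal{H}$ is the reproducing kernel Hilbert space of the fixed kernel, $\lambda > 0$ is the ridge parameter, and the matrix $U$ rescales and selects feature directions of the input $X \in \mathbb{R}^d$. The inner minimization is a standard kernel ridge regression; the outer minimization over $U$ is the nonlinear variational problem studied in the paper.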
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves an $n^{-2/3}$ dimension-free $L^2$ convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms, particularly for small datasets.
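Since HAR is an instance of kernel ridge regression with a particular kernel, the following minimal NumPy sketch shows the generic fit-and-predict step into which any kernel could be plugged; the RBF kernel here is only a stand-in, not HAR's data-adaptive spline kernel.
```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Stand-in kernel; HAR would use its data-adaptive spline-based kernel here."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

def kernel_ridge_fit_predict(X_train, y_train, X_test, lam=1e-2, kernel=rbf_kernel):
    """Generic kernel ridge regression: solve (K + n*lam*I) alpha = y, predict K_* alpha."""
    n = len(y_train)
    K = kernel(X_train, X_train)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return kernel(X_test, X_train) @ alpha
```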
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - A Structure-Preserving Kernel Method for Learning Hamiltonian Systems [3.594638299627404]
A structure-preserving kernel ridge regression method is presented that allows the recovery of potentially high-dimensional and nonlinear Hamiltonian functions.
The paper extends kernel regression methods to problems in which loss functions involving linear functions of gradients are required.
A full error analysis is conducted that provides convergence rates using fixed and adaptive regularization parameters.
arXiv Detail & Related papers (2024-03-15T07:20:21Z) - Accurate and Scalable Stochastic Gaussian Process Regression via Learnable Coreset-based Variational Inference [8.077736581030264]
We introduce a novel inductive variational inference method for Gaussian process ($\mathcal{GP}$) regression, by deriving a posterior over a learnable set of coresets.
Unlike previous free-form variational families for inference, our coreset-based variational $\mathcal{GP}$ (CVGP) is defined in terms of the $\mathcal{GP}$ prior and the (weighted) data likelihood.
arXiv Detail & Related papers (2023-11-02T17:22:22Z) - MOCK: an Algorithm for Learning Nonparametric Differential Equations via Multivariate Occupation Kernel Functions [0.6030884970981525]
Learning a nonparametric system of ordinary differential equations from trajectories in a $d$-dimensional state space requires learning $d$ functions of $d$ variables.
Explicit formulations often scale quadratically in $d$ unless additional knowledge about system properties, such as sparsity and symmetries, is available.
We propose a linear approach, the multivariate occupation kernel method (MOCK), using the implicit formulation provided by vector-valued reproducing kernel Hilbert spaces.
arXiv Detail & Related papers (2023-06-16T21:49:36Z) - Experimental Design for Linear Functionals in Reproducing Kernel Hilbert
Spaces [102.08678737900541]
We provide algorithms for constructing bias-aware designs for linear functionals.
We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise.
arXiv Detail & Related papers (2022-05-26T20:56:25Z) - Sobolev Acceleration and Statistical Optimality for Learning Elliptic
Equations via Gradient Descent [11.483919798541393]
We study the statistical limits, in terms of Sobolev norms, of gradient descent for solving the inverse problem from randomly sampled noisy observations.
Our class of objective functions includes Sobolev training for kernel regression, Deep Ritz Methods (DRM), and Physics-Informed Neural Networks (PINNs).
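As a minimal illustration of Sobolev training (matching derivatives as well as values in the loss), the sketch below fits a polynomial model by least squares with and without a derivative-matching term; the paper's setting (kernel regression, DRM, PINNs) is more general, and the target function here is purely illustrative.
```python
import numpy as np

# Toy target and its derivative on [0, 1] (illustrative only).
f  = lambda x: np.sin(2 * np.pi * x)
df = lambda x: 2 * np.pi * np.cos(2 * np.pi * x)

x, deg = np.linspace(0.0, 1.0, 30), 7
Phi  = np.vander(x, deg + 1, increasing=True)            # value features 1, x, ..., x^deg
dPhi = np.hstack([np.zeros((len(x), 1)),
                  Phi[:, :-1] * np.arange(1, deg + 1)])  # derivative features k * x^(k-1)

# Plain L2 training matches values only; Sobolev training also matches derivatives.
w_l2  = np.linalg.lstsq(Phi, f(x), rcond=None)[0]
w_sob = np.linalg.lstsq(np.vstack([Phi, dPhi]),
                        np.concatenate([f(x), df(x)]), rcond=None)[0]
```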
arXiv Detail & Related papers (2022-05-15T17:01:53Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we study two variants of this kind, namely Variance-Reduced Langevin Dynamics and Recursive Gradient Langevin Dynamics.
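For reference, the plain SGLD update and an SVRG-style variance-reduced gradient estimate are sketched below; here grad_i is assumed to return an unbiased estimate of the full gradient from sample i, and the exact estimators analyzed in the cited paper may differ from this sketch.
```python
import numpy as np

def sgld(grad_i, n, theta0, step=1e-3, n_iter=2000, seed=0):
    """Plain SGLD: stochastic gradient step plus injected Gaussian noise."""
    rng, theta = np.random.default_rng(seed), theta0.copy()
    for _ in range(n_iter):
        g = grad_i(theta, rng.integers(n))            # unbiased estimate of the full gradient
        theta = theta - step * g + np.sqrt(2 * step) * rng.standard_normal(theta.shape)
    return theta

def vr_sgld(grad_i, n, theta0, step=1e-3, n_iter=2000, snapshot_every=100, seed=0):
    """SVRG-style control variate: g = grad_i(theta) - grad_i(snapshot) + full_grad(snapshot)."""
    rng, theta = np.random.default_rng(seed), theta0.copy()
    for t in range(n_iter):
        if t % snapshot_every == 0:
            snap = theta.copy()
            full = np.mean([grad_i(snap, j) for j in range(n)], axis=0)
        i = rng.integers(n)
        g = grad_i(theta, i) - grad_i(snap, i) + full
        theta = theta - step * g + np.sqrt(2 * step) * rng.standard_normal(theta.shape)
    return theta
```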
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Adiabatic Quantum Feature Selection for Sparse Linear Regression [0.17499351967216337]
We formulate the feature selection task as a quadratic unconstrained binary optimization (QUBO) problem and compare the quality of its solutions on synthetic and real-world datasets.
The results demonstrate the effectiveness of the proposed adiabatic quantum computing approach in finding the optimal solution.
arXiv Detail & Related papers (2021-06-04T09:14:01Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
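To make the forward-mode idea concrete, the sketch below runs ISTA (proximal gradient descent) for the Lasso while jointly propagating the derivative of the iterate with respect to the regularization parameter; this is a minimal illustration of the technique, not the paper's implementation.
```python
import numpy as np

def ista_forward_mode(X, y, lam, n_iter=500):
    """ISTA for the Lasso 0.5*||y - Xw||^2 + lam*||w||_1, jointly iterating the
    solution w and its derivative dw/dlam (forward-mode differentiation)."""
    n, d = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant of the smooth part
    w, dw = np.zeros(d), np.zeros(d)
    for _ in range(n_iter):
        z  = w  - step * X.T @ (X @ w - y)          # gradient step on the quadratic term
        dz = dw - step * X.T @ (X @ dw)             # its derivative w.r.t. lam
        active = np.abs(z) > step * lam             # support of the soft-thresholding operator
        w  = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
        dw = active * (dz - step * np.sign(z))      # chain rule through soft-thresholding
    return w, dw
```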
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - Tight Nonparametric Convergence Rates for Stochastic Gradient Descent
under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from noiseless observations of its values at randomly sampled points.
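The following toy sketch runs single-pass, fixed step-size SGD under a noiseless linear model; the paper's nonparametric setting (estimating a function on the unit interval) is richer, so this only illustrates the algorithmic setup.
```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 5000
w_star = rng.standard_normal(d) / np.sqrt(d)       # illustrative ground-truth parameter
step = 0.5 / d                                      # fixed step size

w = np.zeros(d)
for _ in range(n):                                  # single pass: each sample used exactly once
    x = rng.standard_normal(d)
    y = x @ w_star                                  # noiseless observation
    w -= step * (x @ w - y) * x                     # SGD step on the least-squares risk
print(np.linalg.norm(w - w_star))                   # estimation error should be small
```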
arXiv Detail & Related papers (2020-06-15T08:25:50Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.