The Statistical Cost of Robust Kernel Hyperparameter Tuning
- URL: http://arxiv.org/abs/2006.08035v1
- Date: Sun, 14 Jun 2020 21:56:33 GMT
- Title: The Statistical Cost of Robust Kernel Hyperparameter Tuning
- Authors: Raphael A. Meyer, Christopher Musco
- Abstract summary: We study the statistical complexity of kernel hyperparameter tuning in the setting of active regression under adversarial noise.
We provide finite-sample guarantees for the problem, characterizing how increasing the complexity of the kernel class increases the complexity of learning kernel hyperparameters.
- Score: 20.42751031392928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the statistical complexity of kernel hyperparameter tuning in the setting of active regression under adversarial noise. We consider the problem of finding the best interpolant from a class of kernels with unknown hyperparameters, assuming only that the noise is square-integrable. We provide finite-sample guarantees for the problem, characterizing how increasing the complexity of the kernel class increases the complexity of learning kernel hyperparameters. For common kernel classes (e.g. squared-exponential kernels with unknown lengthscale), our results show that hyperparameter optimization increases sample complexity by just a logarithmic factor, in comparison to the setting where optimal parameters are known in advance. Our result is based on a subsampling guarantee for linear regression under multiple design matrices, combined with an ε-net argument for discretizing kernel parameterizations.
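The paper's algorithm is more involved (active sampling, adversarial noise), but the ε-net idea admits a minimal sketch: discretize the unknown lengthscale of a squared-exponential kernel on a log-spaced grid, fit a regularized kernel interpolant for each candidate, and keep the one with the smallest held-out error. The function names, grid range, and regularization constant below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def sq_exp_kernel(X, Z, lengthscale):
    """Squared-exponential (RBF) kernel matrix between rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def fit_and_score(X_tr, y_tr, X_val, y_val, lengthscale, reg=1e-6):
    """Kernel ridge interpolant for one candidate lengthscale; return validation MSE."""
    K = sq_exp_kernel(X_tr, X_tr, lengthscale)
    alpha = np.linalg.solve(K + reg * np.eye(len(X_tr)), y_tr)
    pred = sq_exp_kernel(X_val, X_tr, lengthscale) @ alpha
    return np.mean((pred - y_val) ** 2)

def tune_lengthscale(X_tr, y_tr, X_val, y_val, lo=0.05, hi=5.0, n_grid=40):
    """Epsilon-net style search over a log-spaced grid of lengthscales."""
    grid = np.geomspace(lo, hi, n_grid)
    errors = [fit_and_score(X_tr, y_tr, X_val, y_val, ell) for ell in grid]
    return grid[int(np.argmin(errors))]

# Toy usage: noisy samples of a smooth function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(120)
best = tune_lengthscale(X[:80], y[:80], X[80:], y[80:])
print("selected lengthscale:", best)
```

In the paper's terms, the guarantee is that searching over such a discretization costs only a logarithmic factor more samples than knowing the optimal lengthscale in advance.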
Related papers
- Optimal Kernel Choice for Score Function-based Causal Discovery [92.65034439889872]
We propose a kernel selection method within the generalized score function that automatically selects the kernel that best fits the data.
We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms existing kernel selection methods.
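The paper's selection criterion lives inside the generalized score function; as a loose proxy for the general idea of choosing kernel hyperparameters by how well a Gaussian process explains the data, the sketch below picks an RBF bandwidth by maximizing the GP log marginal likelihood over a grid. The function names, grid, and noise level are assumptions for illustration and do not reproduce the paper's method.

```python
import numpy as np

def rbf_gram(X, bandwidth):
    """Gram matrix of the RBF kernel with the given bandwidth."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def gp_log_marginal_likelihood(X, y, bandwidth, noise=1e-2):
    """Log evidence of a zero-mean GP with an RBF kernel (standard formula)."""
    n = len(y)
    K = rbf_gram(X, bandwidth) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def select_bandwidth(X, y, lo=0.1, hi=10.0, n_grid=30):
    """Pick the bandwidth whose GP evidence is largest on a log-spaced grid."""
    grid = np.geomspace(lo, hi, n_grid)
    scores = [gp_log_marginal_likelihood(X, y, b) for b in grid]
    return grid[int(np.argmax(scores))]

# Toy usage on data with an obvious characteristic lengthscale.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(100)
print("selected bandwidth:", select_bandwidth(X, y))
```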
arXiv Detail & Related papers (2024-07-14T09:32:20Z)
- Variable Hyperparameterized Gaussian Kernel using Displaced Squeezed Vacuum State [2.1408617023874443]
A multimode coherent state can generate a Gaussian kernel with a constant hyperparameter value.
This constant hyperparameter has limited the application of the Gaussian kernel to complex learning problems.
We realize a variable hyperparameterized kernel with a multimode displaced squeezed vacuum state.
arXiv Detail & Related papers (2024-03-18T08:25:56Z)
- A Unified Gaussian Process for Branching and Nested Hyperparameter Optimization [19.351804144005744]
In deep learning, tuning parameters that depend conditionally on one another are common in practice.
The new GP model accounts for this dependence structure among input variables through a new kernel function.
Higher prediction accuracy and better optimization efficiency are observed in a series of synthetic simulations and real-data applications to neural networks.
arXiv Detail & Related papers (2024-01-19T21:11:32Z)
- Gaussian Process Regression under Computational and Epistemic Misspecification [4.5656369638728656]
In large data applications, computational costs can be reduced using low-rank or sparse approximations of the kernel.
This paper investigates the effect of such kernel approximations on the interpolation error.
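As a hedged illustration of the kind of question studied (not the paper's analysis), the sketch below compares the exact GP posterior mean with one computed from a Nyström low-rank approximation of the kernel and reports the discrepancy the approximation introduces; the kernel, data, and rank are arbitrary choices for the example.

```python
import numpy as np

def rbf(X, Z, ell=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ell ** 2))

def posterior_mean(K_train, K_cross, y, noise=1e-2):
    """GP posterior mean at test points given a (possibly approximate) Gram matrix."""
    return K_cross @ np.linalg.solve(K_train + noise * np.eye(len(y)), y)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(300)
X_test = np.linspace(-3, 3, 50)[:, None]

# Exact kernel matrices.
K = rbf(X, X)
K_star = rbf(X_test, X)

# Nystroem low-rank approximation built from m inducing points.
m = 20
idx = rng.choice(len(X), m, replace=False)
C = rbf(X, X[idx])                      # n x m cross-kernel
W = rbf(X[idx], X[idx])                 # m x m inducing-point kernel
K_approx = C @ np.linalg.solve(W + 1e-8 * np.eye(m), C.T)

mean_exact = posterior_mean(K, K_star, y)
mean_approx = posterior_mean(K_approx, K_star, y)
print("max |exact - approximate| posterior mean:",
      np.abs(mean_exact - mean_approx).max())
```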
arXiv Detail & Related papers (2023-12-14T18:53:32Z)
- Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical Applications [71.23286211775084]
We introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters.
Our approach computes a confidence region in the space of hyperparameters, which enables us to obtain a probabilistic upper bound for the model error.
Experiments show that the bound performs significantly better than vanilla and fully Bayesian Gaussian processes.
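A heavily simplified sketch of this flavor of bound: evaluate the GP posterior standard deviation for each hyperparameter value in a candidate confidence set and take the pointwise worst case, scaled by a constant beta. The candidate set, the constant, and the helper names are assumptions; the paper's actual confidence-region construction and probabilistic guarantee are not reproduced here.

```python
import numpy as np

def rbf(X, Z, ell):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ell ** 2))

def posterior_std(X_train, X_test, ell, noise=1e-2):
    """Pointwise GP posterior standard deviation for one candidate lengthscale."""
    K = rbf(X_train, X_train, ell) + noise * np.eye(len(X_train))
    K_star = rbf(X_test, X_train, ell)
    var = 1.0 - np.einsum("ij,ij->i", K_star, np.linalg.solve(K, K_star.T).T)
    return np.sqrt(np.clip(var, 0.0, None))

def robust_error_bound(X_train, X_test, candidate_ells, beta=2.0):
    """Worst case of beta * posterior std over a set of plausible lengthscales."""
    stds = np.stack([posterior_std(X_train, X_test, ell) for ell in candidate_ells])
    return beta * stds.max(axis=0)

# Toy usage: the bound inflates wherever some candidate lengthscale is uncertain.
rng = np.random.default_rng(1)
X_train = rng.uniform(-3, 3, size=(40, 1))
X_test = np.linspace(-3, 3, 9)[:, None]
print(robust_error_bound(X_train, X_test, candidate_ells=[0.3, 0.6, 1.2]))
```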
arXiv Detail & Related papers (2021-09-06T17:10:01Z)
- Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional, previously unexplored benefit: it can simultaneously reduce variance at essentially negligible cost.
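The variance-reduction argument itself is not reproduced here; the sketch below only shows the basic ingredient, solving the kernel linear system (K + sigma^2 I)x = y with conjugate gradients under a low-rank Nyström-style preconditioner inverted via the Woodbury identity. The preconditioner choice, rank, and names are assumptions for illustration.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def rbf(X, Z, ell=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ell ** 2))

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
noise = 1e-2
A = rbf(X, X) + noise * np.eye(len(X))        # system matrix K + sigma^2 I

# Low-rank preconditioner P = C W^{-1} C^T + noise * I from m inducing points,
# applied in inverse form through the Woodbury identity.
m = 30
idx = rng.choice(len(X), m, replace=False)
C = rbf(X, X[idx])                            # n x m cross-kernel
W = rbf(X[idx], X[idx]) + 1e-8 * np.eye(m)    # m x m inducing-point kernel
inner = noise * W + C.T @ C                   # m x m Woodbury core

def apply_p_inv(v):
    # P^{-1} v = (v - C (noise * W + C^T C)^{-1} C^T v) / noise
    return (v - C @ np.linalg.solve(inner, C.T @ v)) / noise

M = LinearOperator(A.shape, matvec=apply_p_inv)
alpha, info = cg(A, y, M=M)
print("preconditioned CG converged:", info == 0)
```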
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
- Implicit differentiation for fast hyperparameter selection in non-smooth convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
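A minimal sketch of forward-mode differentiation of a proximal gradient method, here ISTA for the Lasso, propagating the derivative of the iterate with respect to the regularization parameter lambda alongside the iterates. The specific problem instance, step size, and iteration count are assumptions; this only illustrates the general technique the paper analyzes.

```python
import numpy as np

def soft_threshold(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def ista_forward_diff(X, y, lam, n_iter=500):
    """ISTA for the Lasso, jointly propagating dw/dlam (forward mode)."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(d)                         # iterate
    z = np.zeros(d)                         # Jacobian dw/dlam
    for _ in range(n_iter):
        u = w - X.T @ (X @ w - y) / L       # gradient step
        du = z - X.T @ (X @ z) / L          # its derivative w.r.t. lam
        active = np.abs(u) > lam / L        # support kept by the soft-threshold
        w = soft_threshold(u, lam / L)
        z = np.where(active, du - np.sign(u) / L, 0.0)
    return w, z

# Toy check against a finite-difference approximation of dw/dlam.
rng = np.random.default_rng(3)
X = rng.standard_normal((50, 20))
y = rng.standard_normal(50)
lam = 0.5
w, z = ista_forward_diff(X, y, lam)
w_eps, _ = ista_forward_diff(X, y, lam + 1e-5)
print("max |forward-mode - finite-diff|:", np.abs(z - (w_eps - w) / 1e-5).max())
```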
arXiv Detail & Related papers (2021-05-04T17:31:28Z)
- Flow-based Kernel Prior with Application to Blind Super-Resolution [143.21527713002354]
Kernel estimation is generally one of the key problems for blind image super-resolution (SR).
This paper proposes a normalizing flow-based kernel prior (FKP) for kernel modeling.
Experiments on synthetic and real-world images demonstrate that the proposed FKP can significantly improve the kernel estimation accuracy.
arXiv Detail & Related papers (2021-03-29T22:37:06Z)
- Off-the-grid: Fast and Effective Hyperparameter Search for Kernel Clustering [2.304694255100371]
We study the impact of kernel parameters on kernel $k$-means.
In particular, we derive a lower bound, tight up to constant factors, below which the bandwidth parameter of the RBF kernel renders kernel $k$-means meaningless.
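A hedged toy illustration of the phenomenon (not the paper's bound): as the RBF bandwidth shrinks, the Gram matrix approaches the identity, all points become equally dissimilar, and kernel $k$-means has nothing left to cluster. The data and bandwidth values below are arbitrary.

```python
import numpy as np

def rbf_gram(X, bandwidth):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

rng = np.random.default_rng(4)
# Two well-separated Gaussian blobs that should be trivially clusterable.
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])

for bw in [1.0, 0.3, 0.1, 0.03, 0.01]:
    K = rbf_gram(X, bw)
    off_diag = K[~np.eye(len(X), dtype=bool)]
    print(f"bandwidth={bw:.2f}: max off-diagonal similarity={off_diag.max():.3e}")
# As the bandwidth shrinks, K tends to the identity matrix and kernel k-means
# can no longer distinguish the two blobs.
```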
arXiv Detail & Related papers (2020-06-24T08:58:58Z)
- Sparse Gaussian Processes via Parametric Families of Compactly-supported Kernels [0.6091702876917279]
We propose a method for deriving parametric families of kernel functions with compact support.
The parameters of this family of kernels can be learned from data using maximum likelihood estimation.
We show that these approximations incur minimal error over the exact models when modeling data drawn directly from a target GP.
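As a hedged sketch of the general idea that compact support yields sparse Gram matrices, the code below uses the classical Wendland C2 kernel in place of whatever parametric family the paper derives, treating the support radius as the tunable parameter; the names and radii are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

def wendland_c2(X, radius):
    """Wendland C^2 compactly supported kernel (valid for inputs of dim <= 3)."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    r = d / radius
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(400, 2))

for radius in [0.5, 1.0, 2.0]:
    K = wendland_c2(X, radius)
    sparsity = 1.0 - np.count_nonzero(K) / K.size
    print(f"support radius {radius}: Gram matrix is {100 * sparsity:.1f}% zeros")

# The Gram matrix can be stored and factored as a sparse matrix,
# which is what makes compactly supported kernels attractive at scale.
K_sparse = csr_matrix(wendland_c2(X, 1.0))
print("stored nonzeros:", K_sparse.nnz)
```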
arXiv Detail & Related papers (2020-06-05T20:44:09Z)
- On the infinite width limit of neural networks with a standard parameterization [52.07828272324366]
We propose an improved extrapolation of the standard parameterization that preserves its desirable properties as width is taken to infinity.
We show experimentally that the resulting kernels typically achieve similar accuracy to those resulting from an NTK parameterization.
arXiv Detail & Related papers (2020-01-21T01:02:21Z)