Learning from Non-Random Data in Hilbert Spaces: An Optimal Recovery
Perspective
- URL: http://arxiv.org/abs/2006.03706v2
- Date: Fri, 11 Sep 2020 20:07:24 GMT
- Title: Learning from Non-Random Data in Hilbert Spaces: An Optimal Recovery
Perspective
- Authors: Simon Foucart, Chunyang Liao, Shahin Shahrampour, Yinsong Wang
- Abstract summary: We consider the regression problem from an Optimal Recovery perspective.
We first develop a semidefinite program for calculating the worst-case error of any recovery map in finite-dimensional Hilbert spaces.
We show that Optimal Recovery provides a formula which is user-friendly from an algorithmic point-of-view.
- Score: 12.674428374982547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The notion of generalization in classical Statistical Learning is often
attached to the postulate that data points are independent and identically
distributed (IID) random variables. While relevant in many applications, this
postulate may not hold in general, encouraging the development of learning
frameworks that are robust to non-IID data. In this work, we consider the
regression problem from an Optimal Recovery perspective. Relying on a model
assumption comparable to choosing a hypothesis class, a learner aims at
minimizing the worst-case error, without recourse to any probabilistic
assumption on the data. We first develop a semidefinite program for calculating
the worst-case error of any recovery map in finite-dimensional Hilbert spaces.
Then, for any Hilbert space, we show that Optimal Recovery provides a formula
which is user-friendly from an algorithmic point-of-view, as long as the
hypothesis class is linear. Interestingly, this formula coincides with kernel
ridgeless regression in some cases, proving that minimizing the average error
and worst-case error can yield the same solution. We provide numerical
experiments in support of our theoretical findings.
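To make the connection to kernel ridgeless regression concrete, the sketch below fits a minimum-RKHS-norm interpolant, i.e., the coefficients alpha solving K alpha = y for a kernel matrix K. This is only an illustration of the baseline estimator the abstract refers to, not the paper's Optimal Recovery algorithm; the Gaussian kernel, the bandwidth gamma, and the helper names are assumptions made for this example.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * sq_dists)

def ridgeless_fit(X, y, gamma=1.0):
    # Minimum-RKHS-norm interpolant: solve K alpha = y, using the pseudoinverse
    # so the coefficients stay well defined when K is singular.
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.pinv(K) @ y

def ridgeless_predict(X_train, alpha, X_new, gamma=1.0):
    # Evaluate the interpolant f(x) = sum_i alpha_i k(x, x_i) at new points.
    return gaussian_kernel(X_new, X_train, gamma) @ alpha

# Usage on a small 1-D sample; no probabilistic assumption on how X was generated.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 1))
y = np.sin(3.0 * X[:, 0])
alpha = ridgeless_fit(X, y, gamma=5.0)
X_new = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
print(ridgeless_predict(X, alpha, X_new, gamma=5.0))
```

Using the pseudoinverse keeps the interpolant well defined when the kernel matrix is rank-deficient, which is the minimum-norm convention usually meant by "ridgeless" regression.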
Related papers
- High-dimensional logistic regression with missing data: Imputation, regularization, and universality [7.167672851569787]
We study high-dimensional, ridge-regularized logistic regression.
We provide exact characterizations of both the prediction error and the estimation error.
arXiv Detail & Related papers (2024-10-01T21:41:21Z)
- Robust Capped $\ell_p$-Norm Support Vector Ordinal Regression [85.84718111830752]
Ordinal regression is a specialized supervised problem where the labels show an inherent order.
Support Vector Ordinal Regression, as an outstanding ordinal regression model, is widely used in many ordinal regression tasks.
We introduce a new model, Capped $\ell_p$-Norm Support Vector Ordinal Regression (CSVOR), that is robust to outliers.
arXiv Detail & Related papers (2024-04-25T13:56:05Z)
- Efficient and Generalizable Certified Unlearning: A Hessian-free Recollection Approach [8.875278412741695]
Machine unlearning strives to uphold the data owners' right to be forgotten by enabling models to selectively forget specific data.
We develop an algorithm that achieves near-instantaneous unlearning as it only requires a vector addition operation.
arXiv Detail & Related papers (2024-04-02T07:54:18Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Learning linear operators: Infinite-dimensional regression as a well-behaved non-compact inverse problem [4.503368323711748]
We consider the problem of learning a linear operator $\theta$ between two Hilbert spaces from empirical observations.
We show that this goal can be reformulated as an inverse problem for $\theta$ with the feature that its forward operator is generally non-compact.
We prove that this inverse problem is equivalent to the known compact inverse problem associated with scalar response regression.
arXiv Detail & Related papers (2022-11-16T12:33:01Z)
- Generalised Bayesian Inference for Discrete Intractable Likelihood [9.331721990371769]
This paper develops a novel generalised Bayesian inference procedure suitable for discrete intractable likelihood.
Inspired by recent methodological advances for continuous data, the main idea is to update beliefs about model parameters using a discrete Fisher divergence.
The result is a generalised posterior that can be sampled from using standard computational tools, such as Markov chain Monte Carlo.
arXiv Detail & Related papers (2022-06-16T19:36:17Z)
- Experimental Design for Linear Functionals in Reproducing Kernel Hilbert Spaces [102.08678737900541]
We provide algorithms for constructing bias-aware designs for linear functionals.
We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise.
arXiv Detail & Related papers (2022-05-26T20:56:25Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that a phenomenon can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.