Bayes beats Cross Validation: Efficient and Accurate Ridge Regression
via Expectation Maximization
- URL: http://arxiv.org/abs/2310.18860v2
- Date: Fri, 3 Nov 2023 02:00:03 GMT
- Title: Bayes beats Cross Validation: Efficient and Accurate Ridge Regression
via Expectation Maximization
- Authors: Shu Yu Tew, Mario Boley, Daniel F. Schmidt
- Abstract summary: We present a method for tuning the regularization hyper-parameter, $\lambda$, of ridge regression that is faster to compute than leave-one-out cross-validation (LOOCV)
We show that the proposed method is guaranteed to find a unique optimal solution for large enough $n$, under relatively mild conditions.
- Score: 3.061662434597098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel method for tuning the regularization hyper-parameter,
$\lambda$, of a ridge regression that is faster to compute than leave-one-out
cross-validation (LOOCV) while yielding estimates of the regression parameters
of equal or, particularly in the setting of sparse covariates, superior quality
to those obtained by minimising the LOOCV risk. The LOOCV risk can suffer from
multiple and bad local minima for finite $n$ and thus requires the
specification of a set of candidate $\lambda$, which can fail to provide good
solutions. In contrast, we show that the proposed method is guaranteed to find
a unique optimal solution for large enough $n$, under relatively mild
conditions, without requiring the specification of any difficult-to-determine
hyper-parameters. This is based on a Bayesian formulation of ridge regression
that we prove to have a unimodal posterior for large enough $n$, allowing for
both the optimal $\lambda$ and the regression coefficients to be jointly
learned within an iterative expectation maximization (EM) procedure.
Importantly, we show that by utilizing an appropriate preprocessing step, a
single iteration of the main EM loop can be implemented in $O(\min(n, p))$
operations, for input data with $n$ rows and $p$ columns. In contrast,
evaluating a single value of $\lambda$ using fast LOOCV costs $O(n \min(n, p))$
operations when using the same preprocessing. This advantage amounts to an
asymptotic improvement of a factor of $l$ for $l$ candidate values for
$\lambda$ (in the regime $q, p \in O(\sqrt{n})$ where $q$ is the number of
regression targets).
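
To make the complexity claims above concrete, here is a minimal sketch in Python/NumPy of how a one-off SVD preprocessing step lets a generic EM scheme for the hierarchical Bayesian ridge model ($\beta \sim N(0, \tau^2 I)$, $y \mid \beta \sim N(X\beta, \sigma^2 I)$, $\lambda = \sigma^2/\tau^2$) run in $O(\min(n, p))$ per iteration, alongside a fast LOOCV evaluation that costs $O(n \min(n, p))$ per candidate $\lambda$. The update rules shown are the textbook EM for this variance-components model, not necessarily the authors' exact algorithm.

```python
import numpy as np

def em_ridge(X, y, n_iter=200, tol=1e-10):
    """EM for the hierarchical Bayesian ridge model (sketch).

    After a one-off thin SVD of X, every EM iteration touches only
    length-min(n, p) vectors, i.e. O(min(n, p)) work per iteration.
    """
    n, p = X.shape
    k = min(n, p)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # one-off preprocessing
    uty = U.T @ y
    s2 = s ** 2
    yy = float(y @ y)
    sigma2, tau2 = float(np.var(y)), 1.0
    lam = sigma2 / tau2
    for _ in range(n_iter):
        d = s2 + lam
        m = s * uty / d                       # posterior mean, SVD basis
        # E-step: E[beta'beta] and E[||y - X beta||^2] under the posterior
        e_bb = m @ m + sigma2 * (np.sum(1.0 / d) + max(p - k, 0) / lam)
        rss = yy - uty @ uty + np.sum((lam * uty / d) ** 2)
        e_rss = rss + sigma2 * np.sum(s2 / d)
        # M-step: variance components, then the implied ridge penalty
        tau2, sigma2 = e_bb / p, e_rss / n
        lam_new = sigma2 / tau2
        if abs(lam_new - lam) <= tol * lam:
            lam = lam_new
            break
        lam = lam_new
    beta = Vt.T @ (s * uty / (s2 + lam))      # final ridge coefficients
    return lam, beta

def loocv_risk(U, s, y, lam):
    """Fast LOOCV risk for one candidate lambda, given the same SVD.

    The leverage computation costs O(n * min(n, p)) per lambda, which
    is the per-candidate cost the abstract compares against.
    """
    d = s ** 2 / (s ** 2 + lam)
    h = np.einsum('ij,j,ij->i', U, d, U)      # leverages: diag of hat matrix
    resid = y - U @ (d * (U.T @ y))
    return float(np.mean((resid / (1.0 - h)) ** 2))
```

Note that the thin SVD itself costs $O(np \min(n, p))$ and is shared by both methods; the asymptotic gap described in the abstract concerns the per-iteration (respectively per-candidate-$\lambda$) work after this preprocessing.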
Related papers
- Transfer Q Star: Principled Decoding for LLM Alignment [105.89114186982972]
Transfer $Q^*$ estimates the optimal value function for a target reward $r$ through a baseline model.
Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods.
arXiv Detail & Related papers (2024-05-30T21:36:12Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Robust Nonparametric Regression under Poisoning Attack [13.470899588917716]
An adversarial attacker can modify the values of up to $q$ samples from a training dataset of size $N$.
Our initial solution is an M-estimator based on Huber loss minimization.
The final estimate is nearly minimax optimal for arbitrary $q$, up to a $\ln N$ factor.
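
For context, the following is a minimal sketch of how a Huber-loss M-estimate of a location parameter is typically computed by iteratively reweighted averaging. In the paper's nonparametric setting, an estimate like this would be formed from the responses near a query point; the authors' exact estimator and threshold may differ.

```python
import numpy as np

def huber_location(values, delta=1.0, n_iter=100, tol=1e-8):
    """Huber-loss M-estimate of a location parameter via IRLS (sketch).

    Minimizes sum_i rho_delta(v_i - mu), where rho_delta is quadratic
    within delta of zero and linear beyond it, so outliers are
    down-weighted rather than dominating the fit.
    """
    mu = float(np.median(values))              # robust initialization
    for _ in range(n_iter):
        r = np.abs(values - mu)
        # Huber weights: 1 inside the delta band, delta/|r| outside it
        w = delta / np.maximum(r, delta)
        mu_new = float(np.sum(w * values) / np.sum(w))
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu
```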
arXiv Detail & Related papers (2023-05-26T09:33:17Z) - Estimating the minimizer and the minimum value of a regression function
under passive design [72.85024381807466]
We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$.
We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$, and for the risk of estimating $f^*$.
arXiv Detail & Related papers (2022-11-29T18:38:40Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Outlier-robust sparse/low-rank least-squares regression and robust
matrix completion [1.0878040851637998]
We study high-dimensional least-squares regression within a subgaussian statistical learning framework with heterogeneous noise.
We also present a novel theory of trace-regression with matrix decomposition based on a new application of the product process.
arXiv Detail & Related papers (2020-12-12T07:42:47Z) - Conditional Uncorrelation and Efficient Non-approximate Subset Selection
in Sparse Regression [72.84177488527398]
We consider sparse regression from the view of correlation, and propose the formula of conditional uncorrelation.
By the proposed method, the computational complexity is reduced from $O(\frac{1}{6}k^3 + mk^2 + mkd)$ to $O(\frac{1}{6}k^3 + \frac{1}{2}mk^2)$ for each candidate subset in sparse regression.
arXiv Detail & Related papers (2020-09-08T20:32:26Z) - Minimum discrepancy principle strategy for choosing $k$ in $k$-NN regression [2.0411082897313984]
We present a novel data-driven strategy for choosing the hyper-parameter $k$ in the $k$-NN regression estimator without using any hold-out data.
We propose a strategy, easily implemented in practice, based on the idea of early stopping and the minimum discrepancy principle.
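
As a rough illustration of the idea (not the paper's exact procedure), the discrepancy principle stops at the first $k$ whose in-sample residual norm reaches the noise level $\sigma$, which must be known or pre-estimated. The sketch below assumes Euclidean covariates and a plain $k$-NN average.

```python
import numpy as np

def choose_k_discrepancy(X, y, sigma, k_max=None):
    """Minimum discrepancy principle for k-NN regression (sketch).

    Returns the first k whose in-sample residual norm reaches the
    noise level sigma (assumed known or pre-estimated); the paper's
    exact stopping threshold may differ.
    """
    n = len(y)
    k_max = k_max or n
    # Pairwise distances and neighbour ordering, computed once.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(dist, axis=1)           # self is each row's first entry
    for k in range(1, k_max + 1):
        fitted = y[order[:, :k]].mean(axis=1)  # k-NN average, self included
        if np.sqrt(np.mean((y - fitted) ** 2)) >= sigma:
            return k
    return k_max
```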
arXiv Detail & Related papers (2020-08-20T00:13:19Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.