Optimal Cross-Validation for Sparse Linear Regression
- URL: http://arxiv.org/abs/2306.14851v3
- Date: Sun, 11 May 2025 10:11:36 GMT
- Title: Optimal Cross-Validation for Sparse Linear Regression
- Authors: Ryan Cory-Wright, Andrés Gómez
- Abstract summary: We use k-fold cross-validation to select the sparsity and robustness of linear regressors. Cross-validation substantially increases the computational cost of sparse regression. We improve upon this state of affairs by solving 50-80% fewer mixed-integer optimization problems.
- Score: 5.156484100374059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a high-dimensional covariate matrix and a response vector, ridge-regularized sparse linear regression selects a subset of features that explains the relationship between covariates and the response in an interpretable manner. To select the sparsity and robustness of linear regressors, techniques like k-fold cross-validation are commonly used for hyperparameter tuning. However, cross-validation substantially increases the computational cost of sparse regression, as it requires solving many mixed-integer optimization problems (MIOs) for each hyperparameter combination. To improve upon this state of affairs, we obtain computationally tractable relaxations of k-fold cross-validation metrics, facilitating hyperparameter selection after solving 50-80% fewer MIOs in practice. These relaxations yield an efficient cyclic coordinate descent scheme that achieves 10-30% lower validation errors than traditional methods such as grid search with MCP or GLMNet across a suite of 13 real-world datasets.
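To make the computational bottleneck concrete, below is a minimal sketch of the grid-search baseline that cross-validated sparse regression implies: one model fit per (fold, sparsity, ridge-penalty) combination. Greedy forward selection stands in here as a cheap proxy for the exact MIO subset-selection step, and every name in the snippet is illustrative rather than taken from the authors' code.

```python
# Sketch of plain k-fold grid search for sparse ridge regression.
# Forward selection approximates the MIO best-subset step; all names
# and parameter values are illustrative assumptions.
import numpy as np

def fit_sparse_ridge(X, y, k, gamma):
    """Greedy forward selection of k features, then ridge on the support."""
    n, p = X.shape
    support, residual = [], y.copy()
    for _ in range(k):
        scores = np.abs(X.T @ residual)
        scores[support] = -np.inf            # never reselect a feature
        support.append(int(np.argmax(scores)))
        Xs = X[:, support]
        beta_s = np.linalg.solve(Xs.T @ Xs + gamma * np.eye(len(support)),
                                 Xs.T @ y)
        residual = y - Xs @ beta_s
    beta = np.zeros(p)
    beta[support] = beta_s
    return beta

def kfold_cv_error(X, y, k, gamma, n_folds=5, seed=0):
    """Average validation MSE over n_folds folds."""
    idx = np.random.default_rng(seed).permutation(X.shape[0])
    errs = []
    for val in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, val)
        beta = fit_sparse_ridge(X[train], y[train], k, gamma)
        errs.append(np.mean((y[val] - X[val] @ beta) ** 2))
    return float(np.mean(errs))

# Exhaustive grid search: one inner fit per (fold, k, gamma) cell, which
# is exactly the cost that the paper's relaxations aim to cut down.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(100)
grid = [(k, g) for k in (1, 2, 3, 5) for g in (0.01, 0.1, 1.0)]
best = min(grid, key=lambda kg: kfold_cv_error(X, y, *kg))
print("selected (sparsity, gamma):", best)
```

Per the abstract, the paper's relaxations let most of these inner problems be skipped rather than solved exactly.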
Related papers
- HAFLQ: Heterogeneous Adaptive Federated LoRA Fine-tuned LLM with Quantization [55.972018549438964]
Federated fine-tuning of pre-trained Large Language Models (LLMs) enables task-specific adaptation across diverse datasets while preserving privacy. We propose HAFLQ (Heterogeneous Adaptive Federated Low-Rank Adaptation Fine-tuned LLM with Quantization), a novel framework for efficient and scalable fine-tuning of LLMs in heterogeneous environments. Experimental results on the text classification task demonstrate that HAFLQ reduces memory usage by 31%, lowers communication cost by 49%, improves accuracy by 50%, and achieves faster convergence compared to the baseline method.
arXiv Detail & Related papers (2024-11-10T19:59:54Z) - Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
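For context, here is a minimal sketch of the classical one-nearest-neighbor matching estimator of the average treatment effect that this line of work modifies; the snippet below is the textbook construction, not the paper's $\sqrt{n}$-consistent variant, and all names are illustrative.

```python
# Classical 1-NN matching estimator of the ATE: impute each unit's
# missing potential outcome from its nearest neighbor in the opposite
# treatment group, then average the contrasts. Illustrative only.
import numpy as np

def nn_matching_ate(X, y, treated):
    t_idx, c_idx = np.where(treated)[0], np.where(~treated)[0]
    effects = np.empty(len(y))
    for i in range(len(y)):
        pool = c_idx if treated[i] else t_idx          # opposite group
        j = pool[np.argmin(np.linalg.norm(X[pool] - X[i], axis=1))]
        y1, y0 = (y[i], y[j]) if treated[i] else (y[j], y[i])
        effects[i] = y1 - y0
    return effects.mean()

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
treated = rng.random(200) < 0.5
y = X @ np.array([1.0, 0.5, -0.5]) + 2.0 * treated + rng.standard_normal(200)
print("ATE estimate:", nn_matching_ate(X, y, treated))  # true effect: 2.0
```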
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - ROTI-GCV: Generalized Cross-Validation for right-ROTationally Invariant Data [1.194799054956877]
Two key tasks in high-dimensional regularized regression are tuning the regularization strength for accurate predictions and estimating the out-of-sample risk.
We introduce a new framework, ROTI-GCV, for reliably performing cross-validation under challenging conditions.
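As a point of reference, the sketch below implements classical generalized cross-validation (GCV) for ridge regression, the criterion that ROTI-GCV adapts to right-rotationally invariant designs; the paper's corrected criterion itself is not reproduced here.

```python
# Classical GCV for ridge regression:
#   GCV(lam) = n * ||y - H y||^2 / (n - tr(H))^2,  H the ridge hat matrix.
# A baseline illustration, not the ROTI-GCV criterion.
import numpy as np

def gcv_score(X, y, lam):
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return n * (resid @ resid) / (n - np.trace(H)) ** 2

rng = np.random.default_rng(2)
X = rng.standard_normal((80, 30))
y = X @ (rng.standard_normal(30) * (rng.random(30) < 0.2)) + rng.standard_normal(80)
lams = np.logspace(-3, 3, 25)
print("GCV-selected lambda:", lams[np.argmin([gcv_score(X, y, l) for l in lams])])
```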
arXiv Detail & Related papers (2024-06-17T15:50:00Z) - Gradient-based bilevel optimization for multi-penalty Ridge regression
through matrix differential calculus [0.46040036610482665]
We introduce a gradient-based approach to the problem of linear regression with l2-regularization.
We show that our approach outperforms LASSO, Ridge, and Elastic Net regression.
The analytical computation of the gradient proves more efficient in terms of computational time than automatic differentiation.
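A single-penalty version of this idea is easy to write down, since the ridge solution has a closed form whose derivative with respect to the penalty follows from matrix calculus. The sketch below is an illustration under that simplification, not the authors' multi-penalty implementation.

```python
# Analytic hypergradient for single-penalty ridge regression.
# From beta(lam) = (Xt'Xt + lam*I)^{-1} Xt'yt, matrix calculus gives
#   d beta / d lam = -(Xt'Xt + lam*I)^{-1} beta.
import numpy as np

def ridge_hypergradient(Xt, yt, Xv, yv, lam):
    """Validation MSE and its analytic derivative with respect to lam."""
    p = Xt.shape[1]
    A = Xt.T @ Xt + lam * np.eye(p)
    beta = np.linalg.solve(A, Xt.T @ yt)
    dbeta = -np.linalg.solve(A, beta)
    resid = yv - Xv @ beta
    loss = resid @ resid / len(yv)
    dloss = -2.0 * (resid @ (Xv @ dbeta)) / len(yv)
    return loss, dloss

rng = np.random.default_rng(3)
X = rng.standard_normal((120, 15))
y = X @ rng.standard_normal(15) + rng.standard_normal(120)
Xt, yt, Xv, yv = X[:80], y[:80], X[80:], y[80:]
log_lam, step = 0.0, 0.5
for _ in range(100):                      # gradient descent in log-space
    lam = np.exp(log_lam)
    _, dloss = ridge_hypergradient(Xt, yt, Xv, yv, lam)
    log_lam -= step * dloss * lam         # chain rule: dL/dlog(lam) = lam * dL/dlam
lam = np.exp(log_lam)
print("tuned lambda:", lam, "val MSE:", ridge_hypergradient(Xt, yt, Xv, yv, lam)[0])
```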
arXiv Detail & Related papers (2023-11-23T20:03:51Z) - An adaptive shortest-solution guided decimation approach to sparse
high-dimensional linear regression [2.3759847811293766]
The proposed algorithm is adapted from the shortest-solution guided decimation approach and is referred to as ASSD.
ASSD is especially suitable for linear regression problems with highly correlated measurement matrices encountered in real-world applications.
arXiv Detail & Related papers (2022-11-28T04:29:57Z) - Distributed Estimation and Inference for Semi-parametric Binary Response Models [8.309294338998539]
This paper studies the maximum score estimator of a semi-parametric binary choice model under a distributed computing environment.
An intuitive divide-and-conquer estimator is computationally expensive and restricted by a non-regular constraint on the number of machines.
arXiv Detail & Related papers (2022-10-15T23:06:46Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer sampling steps.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
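For orientation, the sketch below is vanilla AIS with a fixed linear temperature schedule on a one-dimensional toy target whose normalizing constant is known; the paper's contribution, optimizing the bridging distributions, is deliberately not reproduced here.

```python
# Vanilla AIS with a geometric path f0^(1-b) * fT^b and a fixed linear
# schedule, estimating the normalizing constant of a 1-D target.
# All settings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
log_f0 = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)  # N(0,1), normalized
log_fT = lambda x: -0.5 * (x - 3.0) ** 2                   # unnormalized target

def ais_log_z(n_chains=2000, n_steps=50):
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.standard_normal(n_chains)          # exact samples from f0
    log_w = np.zeros(n_chains)
    for b0, b1 in zip(betas[:-1], betas[1:]):
        log_w += (b1 - b0) * (log_fT(x) - log_f0(x))   # importance increment
        # one Metropolis step targeting the bridge f0^(1-b1) * fT^b1
        log_bridge = lambda z: (1 - b1) * log_f0(z) + b1 * log_fT(z)
        prop = x + 0.5 * rng.standard_normal(n_chains)
        accept = np.log(rng.random(n_chains)) < log_bridge(prop) - log_bridge(x)
        x = np.where(accept, prop, x)
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))        # log-mean-exp

print("AIS log Z:", ais_log_z(), "true:", 0.5 * np.log(2 * np.pi))
```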
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Solving weakly supervised regression problem using low-rank manifold
regularization [77.34726150561087]
We solve a weakly supervised regression problem.
Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources.
In the numerical section, we apply the suggested method to artificial and real datasets using Monte Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z) - Piecewise linear regression and classification [0.20305676256390928]
This paper proposes a method for solving multivariate regression and classification problems using piecewise linear predictors.
A Python implementation of the algorithm described in this paper is available at http://cse.lab.imtlucca.it/bemporad/parc.
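The core alternation behind piecewise linear fitting is easy to sketch: assign each point to the local linear model that predicts it best, refit each model, and repeat. The snippet below is a simplified illustration of that idea, not the PARC implementation available at the URL above.

```python
# Simplified cluster-and-fit alternation for piecewise linear regression.
# Not the authors' PARC algorithm; names and settings are illustrative.
import numpy as np

def piecewise_linear_fit(X, y, n_pieces=3, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    Xa = np.column_stack([X, np.ones(len(X))])    # affine features
    assign = rng.integers(n_pieces, size=len(X))  # random initial partition
    for _ in range(n_iters):
        coefs = []
        for k in range(n_pieces):
            mask = assign == k
            if mask.sum() < Xa.shape[1]:          # keep pieces non-degenerate
                mask = rng.random(len(X)) < 0.2
            w, *_ = np.linalg.lstsq(Xa[mask], y[mask], rcond=None)
            coefs.append(w)
        preds = Xa @ np.array(coefs).T            # (n, n_pieces) predictions
        assign = np.argmin((preds - y[:, None]) ** 2, axis=1)
    return np.array(coefs), assign

rng = np.random.default_rng(5)
X = rng.uniform(-2, 2, size=(300, 1))
y = np.abs(X[:, 0]) + 0.05 * rng.standard_normal(300)   # V-shaped target
coefs, assign = piecewise_linear_fit(X, y, n_pieces=2)
print("piece coefficients:\n", coefs)                   # slopes near +/-1
```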
arXiv Detail & Related papers (2021-03-10T17:07:57Z) - Square Root Bundle Adjustment for Large-Scale Reconstruction [56.44094187152862]
We propose a new formulation for the bundle adjustment problem which relies on nullspace marginalization of landmark variables by QR decomposition.
Our approach, which we call square root bundle adjustment, is algebraically equivalent to the commonly used Schur complement trick.
We show in real-world experiments with the BAL datasets that, even in single precision, the proposed solver achieves on average equally accurate solutions.
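The algebraic equivalence the paper relies on can be checked numerically on a toy linear least-squares problem: eliminating the landmark block via the nullspace of its Jacobian (obtained from a full QR decomposition) yields the same pose solution as the Schur complement on the normal equations. The dimensions and names below are illustrative.

```python
# Numerical check: QR nullspace marginalization of the "landmark" block
# agrees with the Schur complement trick on min ||Jl xl + Jp xp - r||^2.
import numpy as np

rng = np.random.default_rng(6)
n, nl, npose = 40, 6, 4
Jl = rng.standard_normal((n, nl))      # landmark Jacobian block
Jp = rng.standard_normal((n, npose))   # pose Jacobian block
r = rng.standard_normal(n)

# Schur complement on the normal equations
Hll, Hlp = Jl.T @ Jl, Jl.T @ Jp
S = Jp.T @ Jp - Hlp.T @ np.linalg.solve(Hll, Hlp)
b = Jp.T @ r - Hlp.T @ np.linalg.solve(Hll, Jl.T @ r)
x_schur = np.linalg.solve(S, b)

# QR-based marginalization: Q2 spans the left nullspace of Jl, so
# Q2^T Jl = 0 and the reduced problem involves only the pose block.
Q, _ = np.linalg.qr(Jl, mode="complete")
Q2 = Q[:, nl:]
x_qr, *_ = np.linalg.lstsq(Q2.T @ Jp, Q2.T @ r, rcond=None)

print("max difference:", np.abs(x_schur - x_qr).max())  # ~1e-12
```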
arXiv Detail & Related papers (2021-03-02T16:26:20Z) - A Hypergradient Approach to Robust Regression without Correspondence [85.49775273716503]
We consider a variant of the regression problem, where the correspondence between input and output data is not available.
Most existing methods are only applicable when the sample size is small.
We propose a new computational framework -- ROBOT -- for the shuffled regression problem.
arXiv Detail & Related papers (2020-11-30T21:47:38Z) - Optimizing Approximate Leave-one-out Cross-validation to Tune
Hyperparameters [0.0]
We derive efficient formulas to compute the gradient and Hessian of ALO.
We demonstrate the usefulness of the proposed approach by finding hyperparameters for regularized logistic regression and ridge regression on various real-world data sets.
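The shortcut underlying ALO is easiest to see for ridge regression, where leave-one-out residuals come exactly from a single full-data fit via the hat-matrix identity $e_i^{\mathrm{loo}} = e_i / (1 - h_{ii})$. The sketch below shows that baseline; the paper's gradient and Hessian formulas are not reproduced.

```python
# Exact leave-one-out MSE for ridge regression from one full-data fit,
# using the hat-matrix identity e_loo_i = e_i / (1 - h_ii). Illustrative.
import numpy as np

def loo_error(X, y, lam):
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)

rng = np.random.default_rng(7)
X = rng.standard_normal((60, 10))
y = X @ rng.standard_normal(10) + 0.5 * rng.standard_normal(60)
lams = np.logspace(-3, 2, 30)
print("LOO-selected lambda:", lams[np.argmin([loo_error(X, y, l) for l in lams])])
```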
arXiv Detail & Related papers (2020-11-20T04:57:41Z) - Doubly Robust Semiparametric Difference-in-Differences Estimators with
High-Dimensional Data [15.27393561231633]
We propose a doubly robust two-stage semiparametric difference-in-differences estimator for estimating heterogeneous treatment effects.
The first stage allows a general set of machine learning methods to be used to estimate the propensity score.
In the second stage, we derive the rates of convergence for both the parametric parameter and the unknown function.
arXiv Detail & Related papers (2020-09-07T15:14:29Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Component-wise Adaptive Trimming For Robust Mixture Regression [15.633993488010292]
Existing robust mixture regression methods suffer from outliers, as they either conduct parameter estimation in the presence of outliers or rely on prior knowledge of the level of contamination.
Here we propose a fast and efficient robust mixture regression algorithm, called the Component-wise Adaptive Trimming (CAT) method.
arXiv Detail & Related papers (2020-05-23T20:59:16Z) - Fast cross-validation for multi-penalty ridge regression [0.0]
Ridge regression is a simple model for high-dimensional data.
Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix.
Extensions to paired and preferential data types are included and illustrated on several cancer genomics survival prediction problems.
arXiv Detail & Related papers (2020-05-19T09:13:43Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
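The active-set trick can be sketched compactly: at a Lasso solution with support $S$ and sign vector $s$, stationarity gives a closed form for the restricted coefficients, and differentiating it yields the hypergradient without inverting anything beyond the small $|S| \times |S|$ Gram matrix. The snippet below uses scikit-learn's Lasso as the inner solver and is an illustration of the idea, not the authors' algorithm.

```python
# Hypergradient of the validation loss w.r.t. the Lasso penalty alpha via
# implicit differentiation on the active set. With scikit-learn's objective
# (1/(2n))||y - Xw||^2 + alpha*||w||_1, stationarity on S gives
#   beta_S = (X_S'X_S)^{-1} (X_S'y - n*alpha*s)
#   d beta_S / d alpha = -n (X_S'X_S)^{-1} s.
import numpy as np
from sklearn.linear_model import Lasso

def lasso_val_hypergradient(Xt, yt, Xv, yv, alpha):
    """Validation MSE and its derivative w.r.t. alpha via the active set."""
    n = Xt.shape[0]
    beta = Lasso(alpha=alpha, fit_intercept=False).fit(Xt, yt).coef_
    S = np.flatnonzero(beta)                     # active set
    XS = Xt[:, S]
    dbeta_S = -n * np.linalg.solve(XS.T @ XS, np.sign(beta[S]))
    resid = yv - Xv @ beta
    loss = resid @ resid / len(yv)
    dloss = -2.0 * (resid @ (Xv[:, S] @ dbeta_S)) / len(yv)
    return loss, dloss

rng = np.random.default_rng(8)
X = rng.standard_normal((150, 40))
beta_true = np.zeros(40); beta_true[:5] = rng.standard_normal(5)
y = X @ beta_true + 0.3 * rng.standard_normal(150)
Xt, yt, Xv, yv = X[:100], y[:100], X[100:], y[100:]
print(lasso_val_hypergradient(Xt, yt, Xv, yv, alpha=0.05))
```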
arXiv Detail & Related papers (2020-02-20T18:43:42Z) - Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.