Related papers: Optimizing Approximate Leave-one-out Cross-validation to Tune Hyperparameters

URL: http://arxiv.org/abs/2011.10218v1
Date: Fri, 20 Nov 2020 04:57:41 GMT
Title: Optimizing Approximate Leave-one-out Cross-validation to Tune Hyperparameters
Authors: Ryan Burn
Abstract summary: We derive efficient formulas to compute the gradient and Hessian of ALO. We demonstrate the usefulness of the proposed approach by finding hyperparameters for regularized logistic regression and ridge regression on various real-world data sets.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: For a large class of regularized models, leave-one-out cross-validation can be efficiently estimated with an approximate leave-one-out formula (ALO). We consider the problem of adjusting hyperparameters so as to optimize ALO. We derive efficient formulas to compute the gradient and Hessian of ALO and show how to apply a second-order optimizer to find hyperparameters. We demonstrate the usefulness of the proposed approach by finding hyperparameters for regularized logistic regression and ridge regression on various real-world data sets.
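
Illustration: the abstract's idea of estimating leave-one-out error without refitting can be sketched for the ridge case, where a standard hat-matrix identity gives the leave-one-out residuals from a single fit. This is a minimal sketch and not the paper's algorithm: it assumes ridge regression without an intercept, uses NumPy, and picks the penalty from a small grid instead of the gradient and Hessian based second-order optimizer the paper describes. The function names, the synthetic data, and the grid `lams` are all hypothetical.

```python
import numpy as np

def ridge_loo_shortcut(X, y, lam):
    """Leave-one-out squared errors for ridge from a single fit (no intercept)."""
    p = X.shape[1]
    # Hat matrix H = X (X^T X + lam I)^{-1} X^T maps y to the fitted values.
    A = X.T @ X + lam * np.eye(p)
    H = X @ np.linalg.solve(A, X.T)
    residuals = y - H @ y
    # Identity: the leave-one-out residual equals residual_i / (1 - H_ii).
    return (residuals / (1.0 - np.diag(H))) ** 2

def ridge_loo_brute_force(X, y, lam):
    """Reference version: refit the model n times, each time dropping one row."""
    n, p = X.shape
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        w = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
        errors[i] = (y[i] - X[i] @ w) ** 2
    return errors

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=30)

# Grid search over the penalty; the paper instead optimizes ALO with a
# second-order method, which is not reproduced here.
lams = [0.01, 0.1, 1.0, 10.0]
loo_means = [ridge_loo_shortcut(X, y, lam).mean() for lam in lams]
best_lam = lams[int(np.argmin(loo_means))]
```

For ridge the shortcut matches the brute-force refits up to floating-point error; for models such as regularized logistic regression the corresponding formula is only an approximation, which is the setting ALO addresses.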

Related papers

Interim Report on Human-Guided Adaptive Hyperparameter Optimization with Multi-Fidelity Sprints [0.0]
This case study applies a phased hyperparameter optimization process to compare multitask natural language model variants. We employ short, Bayesian optimization sessions that leverage multi-fidelity, hyperparameter space pruning, progressive halving, and a degree of human guidance. We demonstrate our method on a collection of variants of the 2021 Joint Entity and Relation Extraction model proposed by Eberts and Ulges.
arXiv Detail & Related papers (2025-05-14T20:38:44Z)
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values. We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO). Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus [0.46040036610482665]
We introduce a gradient-based approach to the problem of linear regression with l2-regularization. We show that our approach outperforms LASSO, Ridge, and Elastic Net regression. The analytical computation of the gradient proves to be more efficient in terms of computational time compared to automatic differentiation.
arXiv Detail & Related papers (2023-11-23T20:03:51Z)
Stability-Adjusted Cross-Validation for Sparse Linear Regression [5.156484100374059]
Cross-validation techniques like k-fold cross-validation substantially increase the computational cost of sparse regression. We propose selecting hyperparameters that minimize a weighted sum of a cross-validation metric and a model's output stability. Our confidence adjustment procedure reduces test set error by 2%, on average, on 13 real-world datasets.
arXiv Detail & Related papers (2023-06-26T17:02:45Z)
Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models. We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer steps for sampling. We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z)
Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression. Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates. The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
An Adaptive Alternating-direction-method-based Nonnegative Latent Factor Model [2.857044909410376]
An alternating-direction-method-based nonnegative latent factor model can perform efficient representation learning on a high-dimensional and incomplete (HDI) matrix. This paper proposes an Adaptive Alternating-direction-method-based Nonnegative Latent Factor (A2NLF) model, whose hyper-parameter adaptation is implemented following the principle of particle swarm optimization. Empirical studies on nonnegative HDI matrices generated by industrial applications indicate that A2NLF outperforms several state-of-the-art models in terms of computational and storage efficiency, while maintaining highly competitive estimation accuracy for an HDI matrix's missing data.
arXiv Detail & Related papers (2022-04-11T03:04:26Z)
AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper-parameter tuning. We show that using gradient-based data subsets for hyper-parameter tuning achieves significantly faster turnaround times and speedups of 3$\times$-30$\times$.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG). Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem. Then, we use the average zeroth-order hyper-gradients to update hyperparameters.
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
Hyper-parameter estimation method with particle swarm optimization [0.8883733362171032]
The PSO method cannot be directly used in the problem of hyper-parameter estimation. The proposed method uses the swarm method to optimize the performance of the acquisition function. The results on several problems are improved.
arXiv Detail & Related papers (2020-11-24T07:51:51Z)
Online Hyperparameter Search Interleaved with Proximal Parameter Updates [9.543667840503739]
We develop a method that relies on the structure of proximal gradient methods and does not require a smooth cost function. Such a method is applied to Leave-one-out (LOO)-validated Lasso and Group Lasso. Numerical experiments corroborate the convergence of the proposed method to a local optimum of the LOO validation error curve.
arXiv Detail & Related papers (2020-04-06T15:54:03Z)
Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems. Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
