Toward Theoretical Guidance for Two Common Questions in Practical
Cross-Validation based Hyperparameter Selection
- URL: http://arxiv.org/abs/2301.05131v1
- Date: Thu, 12 Jan 2023 16:37:12 GMT
- Title: Toward Theoretical Guidance for Two Common Questions in Practical
Cross-Validation based Hyperparameter Selection
- Authors: Parikshit Ram and Alexander G. Gray and Horst C. Samulowitz and
Gregory Bramble
- Abstract summary: We show the first theoretical treatments of two common questions in cross-validation based hyperparameter selection.
We show that these heuristics can always perform at least as well as either always retraining or never retraining.
- Score: 72.76113104079678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show, to our knowledge, the first theoretical treatments of two common
questions in cross-validation based hyperparameter selection: (1) After
selecting the best hyperparameter using a held-out set, we train the final
model using {\em all} of the training data -- since this may or may not improve
future generalization error, should one do this? (2) During optimization such
as via SGD (stochastic gradient descent), we must set the optimization
tolerance $\rho$ -- since it trades off predictive accuracy with computation
cost, how should one set it? Toward these problems, we introduce the {\em
hold-in risk} (the error due to not using the whole training data), and the
{\em model class mis-specification risk} (the error due to having chosen the
wrong model class) in a theoretical view which is simple, general, and suggests
heuristics that can be used when faced with a dataset instance. In
proof-of-concept studies in synthetic data where theoretical quantities can be
controlled, we show that these heuristics can, respectively, (1) always perform
at least as well as always performing retraining or never performing
retraining, (2) either improve performance or reduce computational overhead by
$2\times$ with no loss in predictive performance.
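
To make the two questions concrete, here is a minimal scikit-learn sketch of where each decision arises; the `retrain_on_all_data` flag and the tolerance `rho` are placeholders for the paper's heuristics, which the abstract does not spell out.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Toy data standing in for a practitioner's dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Select the best hyperparameter on the held-out set.
best_alpha = max(
    [1e-4, 1e-3, 1e-2],
    key=lambda a: SGDClassifier(alpha=a, random_state=0)
    .fit(X_tr, y_tr).score(X_val, y_val),
)

# Question (1): train the final model on *all* the data, or keep the
# hold-in split? The paper's heuristics aim to decide this per dataset.
retrain_on_all_data = True  # placeholder for the paper's decision rule
final_X, final_y = (X, y) if retrain_on_all_data else (X_tr, y_tr)

# Question (2): the optimization tolerance (rho in the paper, `tol` in
# scikit-learn) trades predictive accuracy against computation cost.
rho = 1e-3  # placeholder for a heuristically chosen tolerance
final_model = SGDClassifier(alpha=best_alpha, tol=rho, random_state=0)
final_model.fit(final_X, final_y)
```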
Related papers
- Smart Predict-then-Optimize Method with Dependent Data: Risk Bounds and Calibration of Autoregression [7.369846475695131]
We present an autoregressive SPO method directly targeting the optimization problem at the decision stage.
We conduct experiments to demonstrate the effectiveness of the SPO+ surrogate compared to the absolute loss and the least squares loss.
arXiv Detail & Related papers (2024-11-19T17:02:04Z)
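
For context, a minimal sketch of the standard SPO+ surrogate loss (Elmachtoub and Grigas) on a toy simplex feasible set; the autoregressive, dependent-data construction of the paper above is not reproduced here.

```python
import numpy as np

def simplex_argmin(c):
    """argmin of c @ w over the probability simplex: attained at a vertex."""
    w = np.zeros_like(c, dtype=float)
    w[np.argmin(c)] = 1.0
    return w

def spo_plus_loss(c_hat, c):
    """SPO+ surrogate for the decision problem min_{w in S} c @ w."""
    w_star = simplex_argmin(c)   # true optimal decision
    z_star = c @ w_star          # true optimal value
    # max_{w in S} (c - 2 c_hat) @ w is attained at a vertex of the simplex.
    return np.max(c - 2.0 * c_hat) + 2.0 * c_hat @ w_star - z_star

c = np.array([1.0, 2.0, 3.0])
print(spo_plus_loss(c, c))                          # 0.0: perfect prediction
print(spo_plus_loss(np.array([3.0, 2.0, 1.0]), c))  # > 0: decision error
```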
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
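
The paper defines the metric precisely; the sketch below implements one plausible reading, assuming hypothetical per-checkpoint records of answer correctness and verbatim memorization.

```python
import numpy as np

def pre_memorization_train_accuracy(correct, memorized):
    """One plausible reading of the metric (hypothetical helper): average
    train accuracy over checkpoints recorded *before* each example is first
    reproduced verbatim.

    correct:   (checkpoints, examples) bool -- sampled answer was correct
    memorized: (checkpoints, examples) bool -- target reproduced verbatim
    """
    n_ckpt, n_ex = correct.shape
    per_example = []
    for j in range(n_ex):
        hits = np.flatnonzero(memorized[:, j])
        cutoff = hits[0] if hits.size else n_ckpt
        if cutoff > 0:  # skip examples memorized at the very first checkpoint
            per_example.append(correct[:cutoff, j].mean())
    return float(np.mean(per_example)) if per_example else 0.0
```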
- A Statistical Theory of Regularization-Based Continual Learning [10.899175512941053]
We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks.
We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously.
A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $\ell_2$-regularization.
arXiv Detail & Related papers (2024-06-10T12:25:13Z)
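
The early-stopping/$\ell_2$ equivalence noted above can be sanity-checked numerically for plain linear regression (the continual-learning task sequence is omitted): gradient descent stopped at time $t \approx$ lr $\times$ steps shrinks coefficients roughly like ridge with $\lambda \approx 1/t$, an equivalence up to constants rather than an exact identity.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
n = len(y)

# Gradient descent on least squares, stopped early at "time" t = lr * steps.
lr, steps = 1e-3, 2000
w_gd = np.zeros(5)
for _ in range(steps):
    w_gd -= lr * X.T @ (X @ w_gd - y) / n

# Ridge with lambda ~ 1/t: the classical early-stopping correspondence.
t = lr * steps
w_ridge = np.linalg.solve(X.T @ X / n + (1.0 / t) * np.eye(5), X.T @ y / n)

print(w_gd)
print(w_ridge)  # comparable shrinkage; exact agreement is not expected
```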
- Efficient and Generalizable Certified Unlearning: A Hessian-free Recollection Approach [8.875278412741695]
Machine unlearning strives to uphold the data owners' right to be forgotten by enabling models to selectively forget specific data.
We develop an algorithm that achieves near-instantaneous unlearning as it only requires a vector addition operation.
arXiv Detail & Related papers (2024-04-02T07:54:18Z)
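
A schematic of "unlearning as a vector addition" on a toy ridge model; the exact leave-one-out refits below stand in for the paper's Hessian-free recollection of per-sample correction vectors at training time.

```python
import numpy as np

class UnlearnableRidge:
    """Toy ridge model precomputing one correction vector per training sample."""
    def __init__(self, X, y, lam=1e-2):
        n, d = X.shape
        self.w = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
        # Exact leave-one-out refits stand in for the paper's Hessian-free
        # recollection of these vectors.
        self.corrections = []
        for i in range(n):
            m = np.arange(n) != i
            Ai = X[m].T @ X[m] / (n - 1) + lam * np.eye(d)
            wi = np.linalg.solve(Ai, X[m].T @ y[m] / (n - 1))
            self.corrections.append(wi - self.w)

    def unlearn(self, i):
        # Near-instantaneous unlearning: a single vector addition.
        # (Remaining corrections go stale after this in the toy version.)
        self.w = self.w + self.corrections[i]
```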
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
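
A sketch in the spirit of HyperImpute's column-wise automatic model selection, not the library's actual interface: each incomplete column gets whichever candidate learner cross-validates best on the observed rows, iterated over several sweeps.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def iterative_impute(X, candidates=(LinearRegression, RandomForestRegressor),
                     sweeps=3):
    """Column-wise iterative imputation with per-column model selection."""
    X = X.copy()
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.nonzero(missing)[1])  # initialize
    for _ in range(sweeps):
        for j in np.nonzero(missing.any(axis=0))[0]:
            obs = ~missing[:, j]
            Xo, yo = np.delete(X[obs], j, axis=1), X[obs, j]
            # Automatic model selection: best candidate by CV score.
            best = max(candidates,
                       key=lambda M: cross_val_score(M(), Xo, yo, cv=3).mean())
            model = best().fit(Xo, yo)
            X[missing[:, j], j] = model.predict(
                np.delete(X[missing[:, j]], j, axis=1))
    return X
```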
- Model-based Offline Imitation Learning with Non-expert Data [7.615595533111191]
We propose a scalable model-based offline imitation learning algorithmic framework that leverages datasets collected by both suboptimal and optimal policies.
We show that the proposed method always outperforms Behavioral Cloning in the low data regime on simulated continuous control domains.
arXiv Detail & Related papers (2022-06-11T13:08:08Z)
- Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z)
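
A generic Bayesian-optimisation loop showing where such a surrogate serves as the proxy; a scikit-learn Gaussian process stands in for the paper's probabilistic transformer, and the fit/predict interface is an assumption, not the paper's API.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(mu, sigma, best):
    """EI for minimization given the surrogate's posterior mean/std."""
    sigma = np.maximum(sigma, 1e-12)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bo_minimize(f, candidates, n_iter=20):
    """BO loop: any probabilistic surrogate with uncertainties can slot in."""
    surrogate = GaussianProcessRegressor(alpha=1e-6)
    X, y = [candidates[0]], [f(candidates[0])]
    for _ in range(n_iter):
        surrogate.fit(np.array(X), np.array(y))
        mu, sigma = surrogate.predict(candidates, return_std=True)
        x_next = candidates[np.argmax(expected_improvement(mu, sigma, min(y)))]
        X.append(x_next)
        y.append(f(x_next))
    return X[int(np.argmin(y))]

# Example: minimize a 1-d function over a candidate grid.
grid = np.linspace(-3, 3, 200).reshape(-1, 1)
print(bo_minimize(lambda x: float((x[0] - 1.0) ** 2), grid, n_iter=10))
```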
- Muddling Labels for Regularization, a novel approach to generalization [0.0]
Generalization is a central problem in Machine Learning.
This paper introduces a novel approach to achieve generalization without any data splitting.
It is based on a new risk measure which directly quantifies a model's tendency to overfit.
arXiv Detail & Related papers (2021-02-17T14:02:30Z)
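
An illustrative overfitting score in the muddled-labels spirit, not the paper's exact risk measure: a model class that fits randomly permuted labels nearly as well as the true ones can memorize noise, and the ratio below flags that.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def muddled_label_score(model, X, y, n_perm=5, seed=0):
    """Ratio of fit on permuted labels to fit on true labels (illustrative)."""
    rng = np.random.default_rng(seed)
    fit_true = clone(model).fit(X, y).score(X, y)
    fit_perm = np.mean([clone(model).fit(X, yp).score(X, yp)
                        for yp in (rng.permutation(y) for _ in range(n_perm))])
    return fit_perm / max(fit_true, 1e-12)  # near 1.0 flags heavy overfitting

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(int)
print(muddled_label_score(DecisionTreeClassifier(), X, y))  # deep tree: ~1.0
```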
- Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization [50.53690793828442]
We show that both data whitening and second order optimization can harm or entirely prevent generalization.
For a general class of models, namely models with a fully connected first layer, we prove that the information contained in the dataset's second moment matrix is the only information which can be used to generalize.
arXiv Detail & Related papers (2020-08-17T18:00:05Z)
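
A quick demonstration of what whitening destroys: after PCA whitening, the empirical second moment matrix of the centered features is exactly the identity, so no learner downstream of the transform can exploit it.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))  # correlated features

# PCA whitening: rotate onto eigenvectors, rescale by 1/sqrt(eigenvalue).
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
X_white = Xc @ eigvecs / np.sqrt(eigvals)

# The second moment matrix is now the identity: its information is gone.
print(np.allclose(X_white.T @ X_white / len(X_white), np.eye(3)))  # True
```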
- Rethinking the Hyperparameters for Fine-tuning [78.15505286781293]
Fine-tuning from pre-trained ImageNet models has become the de facto standard for various computer vision tasks.
Current practices for fine-tuning typically involve selecting an ad hoc choice of hyperparameters.
This paper re-examines several common practices of setting hyperparameters for fine-tuning.
arXiv Detail & Related papers (2020-02-19T18:59:52Z)
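
The ad hoc choices in question are typically the learning rate and momentum; below is a minimal torch sketch of sweeping them instead, with a linear head on synthetic features standing in for an actual pre-trained backbone.

```python
import torch
from torch import nn

# Hypothetical setup: frozen backbone features plus a new head, as in
# typical fine-tuning; the data here is synthetic.
torch.manual_seed(0)
X, y = torch.randn(256, 64), torch.randint(0, 10, (256,))

def finetune(lr, momentum, epochs=20):
    head = nn.Linear(64, 10)
    opt = torch.optim.SGD(head.parameters(), lr=lr, momentum=momentum)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(head(X), y)
        loss.backward()
        opt.step()
    return loss.item()

# Sweep a small grid rather than committing to one ad hoc choice.
for lr in (1e-3, 1e-2, 1e-1):
    for momentum in (0.0, 0.9):
        print(lr, momentum, finetune(lr, momentum))
```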
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.