Toward Theoretical Guidance for Two Common Questions in Practical
Cross-Validation based Hyperparameter Selection
- URL: http://arxiv.org/abs/2301.05131v1
- Date: Thu, 12 Jan 2023 16:37:12 GMT
- Title: Toward Theoretical Guidance for Two Common Questions in Practical
Cross-Validation based Hyperparameter Selection
- Authors: Parikshit Ram and Alexander G. Gray and Horst C. Samulowitz and
Gregory Bramble
- Abstract summary: We show the first theoretical treatments of two common questions in cross-validation based hyperparameter selection.
We show that these heuristics can always perform at least as well as either always retraining or never retraining.
- Score: 72.76113104079678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show, to our knowledge, the first theoretical treatments of two common
questions in cross-validation based hyperparameter selection: (1) After
selecting the best hyperparameter using a held-out set, we train the final
model using {\em all} of the training data -- since this may or may not improve
future generalization error, should one do this? (2) During optimization such
as via SGD (stochastic gradient descent), we must set the optimization
tolerance $\rho$ -- since it trades off predictive accuracy with computation
cost, how should one set it? Toward these problems, we introduce the {\em
hold-in risk} (the error due to not using the whole training data), and the
{\em model class mis-specification risk} (the error due to having chosen the
wrong model class) in a theoretical view which is simple, general, and suggests
heuristics that can be used when faced with a dataset instance. In
proof-of-concept studies in synthetic data where theoretical quantities can be
controlled, we show that these heuristics can, respectively, (1) always perform
at least as well as always performing retraining or never performing
retraining, (2) either improve performance or reduce computational overhead by
$2\times$ with no loss in predictive performance.
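
To make the two questions concrete, here is a minimal scikit-learn sketch of where each decision arises; the `retrain_on_all_data` flag and the tolerance `rho` are placeholders for the paper's heuristics, which the abstract does not spell out.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Toy data standing in for a practitioner's dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Select the best hyperparameter on the held-out set.
best_alpha = max(
    [1e-4, 1e-3, 1e-2],
    key=lambda a: SGDClassifier(alpha=a, random_state=0)
    .fit(X_tr, y_tr).score(X_val, y_val),
)

# Question (1): train the final model on *all* the data, or keep the
# hold-in split? The paper's heuristics aim to decide this per dataset.
retrain_on_all_data = True  # placeholder for the paper's decision rule
final_X, final_y = (X, y) if retrain_on_all_data else (X_tr, y_tr)

# Question (2): the optimization tolerance (rho in the paper, `tol` in
# scikit-learn) trades predictive accuracy against computation cost.
rho = 1e-3  # placeholder for a heuristically chosen tolerance
final_model = SGDClassifier(alpha=best_alpha, tol=rho, random_state=0)
final_model.fit(final_X, final_y)
```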
Related papers
- Smart Predict-then-Optimize Method with Dependent Data: Risk Bounds and Calibration of Autoregression [7.369846475695131]
We present an autoregressive SPO method directly targeting the optimization problem at the decision stage.
We conduct experiments to demonstrate the effectiveness of the SPO+ surrogate compared to the absolute loss and the least squares loss.
arXiv Detail & Related papers (2024-11-19T17:02:04Z)
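
For context, a minimal sketch of the standard SPO+ surrogate loss (Elmachtoub and Grigas) on a toy simplex feasible set; the autoregressive, dependent-data construction of the paper above is not reproduced here.

```python
import numpy as np

def simplex_argmin(c):
    """argmin of c @ w over the probability simplex: attained at a vertex."""
    w = np.zeros_like(c, dtype=float)
    w[np.argmin(c)] = 1.0
    return w

def spo_plus_loss(c_hat, c):
    """SPO+ surrogate for the decision problem min_{w in S} c @ w."""
    w_star = simplex_argmin(c)   # true optimal decision
    z_star = c @ w_star          # true optimal value
    # max_{w in S} (c - 2 c_hat) @ w is attained at a vertex of the simplex.
    return np.max(c - 2.0 * c_hat) + 2.0 * c_hat @ w_star - z_star

c = np.array([1.0, 2.0, 3.0])
print(spo_plus_loss(c, c))                          # 0.0: perfect prediction
print(spo_plus_loss(np.array([3.0, 2.0, 1.0]), c))  # > 0: decision error
```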
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
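
The paper defines the metric precisely; the sketch below implements one plausible reading, assuming hypothetical per-checkpoint records of answer correctness and verbatim memorization.

```python
import numpy as np

def pre_memorization_train_accuracy(correct, memorized):
    """One plausible reading of the metric (hypothetical helper): average
    train accuracy over checkpoints recorded *before* each example is first
    reproduced verbatim.

    correct:   (checkpoints, examples) bool -- sampled answer was correct
    memorized: (checkpoints, examples) bool -- target reproduced verbatim
    """
    n_ckpt, n_ex = correct.shape
    per_example = []
    for j in range(n_ex):
        hits = np.flatnonzero(memorized[:, j])
        cutoff = hits[0] if hits.size else n_ckpt
        if cutoff > 0:  # skip examples memorized at the very first checkpoint
            per_example.append(correct[:cutoff, j].mean())
    return float(np.mean(per_example)) if per_example else 0.0
```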
- A Statistical Theory of Regularization-Based Continual Learning [10.899175512941053]
We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks.
We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously.
A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $\ell_2$-regularization.
arXiv Detail & Related papers (2024-06-10T12:25:13Z)
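
The early-stopping/$\ell_2$ equivalence noted above can be sanity-checked numerically for plain linear regression (the continual-learning task sequence is omitted): gradient descent stopped at time $t \approx$ lr $\times$ steps shrinks coefficients roughly like ridge with $\lambda \approx 1/t$, an equivalence up to constants rather than an exact identity.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
n = len(y)

# Gradient descent on least squares, stopped early at "time" t = lr * steps.
lr, steps = 1e-3, 2000
w_gd = np.zeros(5)
for _ in range(steps):
    w_gd -= lr * X.T @ (X @ w_gd - y) / n

# Ridge with lambda ~ 1/t: the classical early-stopping correspondence.
t = lr * steps
w_ridge = np.linalg.solve(X.T @ X / n + (1.0 / t) * np.eye(5), X.T @ y / n)

print(w_gd)
print(w_ridge)  # comparable shrinkage; exact agreement is not expected
```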
- Efficient and Generalizable Certified Unlearning: A Hessian-free Recollection Approach [8.875278412741695]
Machine unlearning strives to uphold the data owners' right to be forgotten by enabling models to selectively forget specific data.
We develop an algorithm that achieves near-instantaneous unlearning as it only requires a vector addition operation.
arXiv Detail & Related papers (2024-04-02T07:54:18Z)
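
A schematic of "unlearning as a vector addition" on a toy ridge model; the exact leave-one-out refits below stand in for the paper's Hessian-free recollection of per-sample correction vectors at training time.

```python
import numpy as np

class UnlearnableRidge:
    """Toy ridge model precomputing one correction vector per training sample."""
    def __init__(self, X, y, lam=1e-2):
        n, d = X.shape
        self.w = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
        # Exact leave-one-out refits stand in for the paper's Hessian-free
        # recollection of these vectors.
        self.corrections = []
        for i in range(n):
            m = np.arange(n) != i
            Ai = X[m].T @ X[m] / (n - 1) + lam * np.eye(d)
            wi = np.linalg.solve(Ai, X[m].T @ y[m] / (n - 1))
            self.corrections.append(wi - self.w)

    def unlearn(self, i):
        # Near-instantaneous unlearning: a single vector addition.
        # (Remaining corrections go stale after this in the toy version.)
        self.w = self.w + self.corrections[i]
```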
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
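
A sketch in the spirit of HyperImpute's column-wise automatic model selection, not the library's actual interface: each incomplete column gets whichever candidate learner cross-validates best on the observed rows, iterated over several sweeps.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def iterative_impute(X, candidates=(LinearRegression, RandomForestRegressor),
                     sweeps=3):
    """Column-wise iterative imputation with per-column model selection."""
    X = X.copy()
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.nonzero(missing)[1])  # initialize
    for _ in range(sweeps):
        for j in np.nonzero(missing.any(axis=0))[0]:
            obs = ~missing[:, j]
            Xo, yo = np.delete(X[obs], j, axis=1), X[obs, j]
            # Automatic model selection: best candidate by CV score.
            best = max(candidates,
                       key=lambda M: cross_val_score(M(), Xo, yo, cv=3).mean())
            model = best().fit(Xo, yo)
            X[missing[:, j], j] = model.predict(
                np.delete(X[missing[:, j]], j, axis=1))
    return X
```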
- Model-based Offline Imitation Learning with Non-expert Data [7.615595533111191]
We propose a scalable model-based offline imitation learning algorithmic framework that leverages datasets collected by both suboptimal and optimal policies.
We show that the proposed method always outperforms Behavioral Cloning in the low data regime on simulated continuous control domains.
arXiv Detail & Related papers (2022-06-11T13:08:08Z)
- Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z)
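
A generic Bayesian-optimisation loop showing where such a surrogate serves as the proxy; a scikit-learn Gaussian process stands in for the paper's probabilistic transformer, and the fit/predict interface is an assumption, not the paper's API.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(mu, sigma, best):
    """EI for minimization given the surrogate's posterior mean/std."""
    sigma = np.maximum(sigma, 1e-12)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bo_minimize(f, candidates, n_iter=20):
    """BO loop: any probabilistic surrogate with uncertainties can slot in."""
    surrogate = GaussianProcessRegressor(alpha=1e-6)
    X, y = [candidates[0]], [f(candidates[0])]
    for _ in range(n_iter):
        surrogate.fit(np.array(X), np.array(y))
        mu, sigma = surrogate.predict(candidates, return_std=True)
        x_next = candidates[np.argmax(expected_improvement(mu, sigma, min(y)))]
        X.append(x_next)
        y.append(f(x_next))
    return X[int(np.argmin(y))]

# Example: minimize a 1-d function over a candidate grid.
grid = np.linspace(-3, 3, 200).reshape(-1, 1)
print(bo_minimize(lambda x: float((x[0] - 1.0) ** 2), grid, n_iter=10))
```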
- Muddling Labels for Regularization, a novel approach to generalization [0.0]
Generalization is a central problem in Machine Learning.
This paper introduces a novel approach to achieve generalization without any data splitting.
It is based on a new risk measure which directly quantifies a model's tendency to overfit.
arXiv Detail & Related papers (2021-02-17T14:02:30Z)
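
An illustrative overfitting score in the muddled-labels spirit, not the paper's exact risk measure: a model class that fits randomly permuted labels nearly as well as the true ones can memorize noise, and the ratio below flags that.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def muddled_label_score(model, X, y, n_perm=5, seed=0):
    """Ratio of fit on permuted labels to fit on true labels (illustrative)."""
    rng = np.random.default_rng(seed)
    fit_true = clone(model).fit(X, y).score(X, y)
    fit_perm = np.mean([clone(model).fit(X, yp).score(X, yp)
                        for yp in (rng.permutation(y) for _ in range(n_perm))])
    return fit_perm / max(fit_true, 1e-12)  # near 1.0 flags heavy overfitting

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(int)
print(muddled_label_score(DecisionTreeClassifier(), X, y))  # deep tree: ~1.0
```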
- Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization [50.53690793828442]
We show that both data whitening and second order optimization can harm or entirely prevent generalization.
For a general class of models, namely models with a fully connected first layer, we prove that the information contained in the dataset's second moment matrix is the only information which can be used to generalize.
arXiv Detail & Related papers (2020-08-17T18:00:05Z)
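
A quick demonstration of what whitening destroys: after PCA whitening, the empirical second moment matrix of the centered features is exactly the identity, so no learner downstream of the transform can exploit it.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))  # correlated features

# PCA whitening: rotate onto eigenvectors, rescale by 1/sqrt(eigenvalue).
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
X_white = Xc @ eigvecs / np.sqrt(eigvals)

# The second moment matrix is now the identity: its information is gone.
print(np.allclose(X_white.T @ X_white / len(X_white), np.eye(3)))  # True
```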
- Rethinking the Hyperparameters for Fine-tuning [78.15505286781293]
Fine-tuning from pre-trained ImageNet models has become the de facto standard for various computer vision tasks.
Current practices for fine-tuning typically involve selecting an ad hoc choice of hyperparameters.
This paper re-examines several common practices of setting hyperparameters for fine-tuning.
arXiv Detail & Related papers (2020-02-19T18:59:52Z)
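
The ad hoc choices in question are typically the learning rate and momentum; below is a minimal torch sketch of sweeping them instead, with a linear head on synthetic features standing in for an actual pre-trained backbone.

```python
import torch
from torch import nn

# Hypothetical setup: frozen backbone features plus a new head, as in
# typical fine-tuning; the data here is synthetic.
torch.manual_seed(0)
X, y = torch.randn(256, 64), torch.randint(0, 10, (256,))

def finetune(lr, momentum, epochs=20):
    head = nn.Linear(64, 10)
    opt = torch.optim.SGD(head.parameters(), lr=lr, momentum=momentum)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(head(X), y)
        loss.backward()
        opt.step()
    return loss.item()

# Sweep a small grid rather than committing to one ad hoc choice.
for lr in (1e-3, 1e-2, 1e-1):
    for momentum in (0.0, 0.9):
        print(lr, momentum, finetune(lr, momentum))
```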
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.