Blocked Cross-Validation: A Precise and Efficient Method for
Hyperparameter Tuning
- URL: http://arxiv.org/abs/2306.06591v2
- Date: Mon, 31 Jul 2023 15:03:25 GMT
- Title: Blocked Cross-Validation: A Precise and Efficient Method for
Hyperparameter Tuning
- Authors: Giovanni Maria Merola
- Abstract summary: We introduce a novel approach called blocked cross-validation (BCV), where the repetitions are blocked with respect to both CV partition and the random behavior of the learner.
BCV provides more precise error estimates compared to RCV, even with a significantly reduced number of runs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hyperparameter tuning plays a crucial role in optimizing the performance of
predictive learners. Cross-validation (CV) is a widely adopted technique for
estimating the error of different hyperparameter settings. Repeated
cross-validation (RCV) has been commonly employed to reduce the variability of
CV errors. In this paper, we introduce a novel approach called blocked
cross-validation (BCV), where the repetitions are blocked with respect to both
CV partition and the random behavior of the learner. Theoretical analysis and
empirical experiments demonstrate that BCV provides more precise error
estimates compared to RCV, even with a significantly reduced number of runs. We
present extensive examples using real-world data sets to showcase the
effectiveness and efficiency of BCV in hyperparameter tuning. Our results
indicate that BCV outperforms RCV in hyperparameter tuning, achieving greater
precision with fewer computations.
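A minimal sketch of the blocking idea described in the abstract (not the paper's reference implementation; data set, learner, and seed values are illustrative assumptions): every hyperparameter setting is evaluated on the same CV partitions with the same learner random seeds, so error estimates for different settings are compared as paired observations rather than independently randomized runs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

# Illustrative data; any supervised task would do.
X, y = make_classification(n_samples=300, random_state=0)

# Blocked repetitions: one fixed partition seed and one fixed learner
# seed per repetition, shared across ALL hyperparameter settings.
cv_seeds = [11, 22, 33]
learner_seeds = [101, 202, 303]

def blocked_cv_error(max_depth):
    """Mean misclassification error for one hyperparameter setting,
    using the shared (blocked) partition and learner seeds."""
    errors = []
    for cv_seed, learner_seed in zip(cv_seeds, learner_seeds):
        kf = KFold(n_splits=5, shuffle=True, random_state=cv_seed)
        for train_idx, test_idx in kf.split(X):
            clf = RandomForestClassifier(
                n_estimators=20, max_depth=max_depth,
                random_state=learner_seed)  # same seed across settings
            clf.fit(X[train_idx], y[train_idx])
            errors.append(np.mean(clf.predict(X[test_idx]) != y[test_idx]))
    return np.mean(errors)

# Because partitions and learner randomness are blocked, these two
# estimates differ only through the hyperparameter, not the sampling.
err_shallow = blocked_cv_error(max_depth=2)
err_deep = blocked_cv_error(max_depth=8)
```

Under this blocking, the difference `err_deep - err_shallow` is a paired comparison, which is the mechanism by which BCV reduces the variance of between-setting comparisons relative to independently repeated CV.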
Related papers
- Predictive Performance Test based on the Exhaustive Nested Cross-Validation for High-dimensional data [7.62566998854384]
Cross-validation is used for several tasks such as estimating the prediction error, tuning the regularization parameter, and selecting the most suitable predictive model.
The K-fold cross-validation is a popular CV method but its limitation is that the risk estimates are highly dependent on the partitioning of the data.
This study presents an alternative novel predictive performance test and valid confidence intervals based on exhaustive nested cross-validation.
arXiv Detail & Related papers (2024-08-06T12:28:16Z)
- Is K-fold cross validation the best model selection method for Machine Learning? [0.0]
K-fold cross-validation is the most common approach to ascertaining the likelihood that a machine learning outcome is generated by chance.
A novel test based on K-fold CV and the Upper Bound of the actual error (K-fold CUBV) is proposed.
arXiv Detail & Related papers (2024-01-29T18:46:53Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance.
We find that the parameter-efficient methods are effective in mitigating gender bias, where adapter tuning is consistently the most effective.
We also find that prompt tuning is more suitable for GPT-2 than BERT, and that the methods are less effective at mitigating racial and religious bias.
arXiv Detail & Related papers (2023-06-06T23:56:18Z)
- Confidence intervals for the Cox model test error from cross-validation [91.3755431537592]
Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model.
Standard confidence intervals for test error using estimates from CV may have coverage below nominal levels.
One way to address this issue is to instead estimate the mean squared error of the prediction error using nested CV.
arXiv Detail & Related papers (2022-01-26T06:40:43Z)
- Assessment of Treatment Effect Estimators for Heavy-Tailed Data [70.72363097550483]
A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized control trials (RCTs) is the lack of ground truth (or validation set) to test their performance.
We provide a novel cross-validation-like methodology to address this challenge.
We evaluate our methodology across 709 RCTs implemented in the Amazon supply chain.
arXiv Detail & Related papers (2021-12-14T17:53:01Z)
- Causal Effect Variational Autoencoder with Uniform Treatment [50.895390968371665]
Causal effect variational autoencoders (CEVAE) are trained to predict the outcome given observational treatment data.
Uniform treatment variational autoencoders (UTVAE) are trained with uniform treatment distribution using importance sampling.
arXiv Detail & Related papers (2021-11-16T17:40:57Z)
- Overfitting in Bayesian Optimization: an empirical study and early-stopping solution [41.782410830989136]
We propose the first problem-adaptive and interpretable criterion to early stop BO.
We show that our approach can substantially reduce compute time with little to no loss of test accuracy.
arXiv Detail & Related papers (2021-04-16T15:26:23Z)
- Estimating Average Treatment Effects with Support Vector Machines [77.34726150561087]
Support vector machine (SVM) is one of the most popular classification algorithms in the machine learning literature.
We adapt SVM as a kernel-based weighting procedure that minimizes the maximum mean discrepancy between the treatment and control groups.
We characterize the bias of causal effect estimation arising from this trade-off, connecting the proposed SVM procedure to the existing kernel balancing methods.
arXiv Detail & Related papers (2021-02-23T20:22:56Z)
- When to Impute? Imputation before and during cross-validation [0.0]
Cross-validation (CV) is a technique used to estimate generalization error for prediction models.
It has been recommended that the entire sequence of preprocessing steps be carried out during each replicate of CV, to mimic the application of the entire pipeline to an external testing set.
arXiv Detail & Related papers (2020-10-01T23:04:16Z)
- Approximate Cross-Validation with Low-Rank Data in High Dimensions [35.74302895575951]
Cross-validation is an important tool for model assessment.
ACV methods can lose both speed and accuracy in high dimensions unless sparsity structure is present in the data.
We develop a new algorithm for ACV that is fast and accurate in the presence of ALR data.
arXiv Detail & Related papers (2020-08-24T16:34:05Z)
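Several of the entries above build on plain K-fold cross-validation and note that its risk estimates depend on how the data are partitioned. A minimal sketch of that dependence (data set, model, and seeds are illustrative assumptions, not taken from any of the papers):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Illustrative regression task.
X, y = make_regression(n_samples=200, noise=10.0, random_state=0)

def kfold_mse(partition_seed, k=5):
    """K-fold CV estimate of test MSE for one fixed partitioning."""
    kf = KFold(n_splits=k, shuffle=True, random_state=partition_seed)
    fold_mse = []
    for train_idx, test_idx in kf.split(X):
        model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
        resid = y[test_idx] - model.predict(X[test_idx])
        fold_mse.append(np.mean(resid ** 2))
    return np.mean(fold_mse)

# Re-partitioning the same data yields different risk estimates;
# repeated and blocked CV both aim to tame this partition variance.
estimates = [kfold_mse(seed) for seed in (0, 1, 2)]
```

The spread of `estimates` across partition seeds is exactly the variability that repeated CV averages over and that blocked CV holds fixed across hyperparameter settings.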
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.