Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models
- URL: http://arxiv.org/abs/2503.08589v1
- Date: Tue, 11 Mar 2025 16:25:44 GMT
- Title: Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models
- Authors: Paul Calle, Averi Bates, Justin C. Reynolds, Yunlong Liu, Haoyang Cui, Sinaro Ly, Chen Wang, Qinghao Zhang, Alberto J. de Armendi, Shashank S. Shettar, Kar Ming Fung, Qinggong Tang, Chongle Pan
- Abstract summary: This study introduces NACHOS to reduce and quantify the variance of test performance metrics of deep learning models. NACHOS integrates NCV and AHPO within a parallelized high-performance computing framework. DACHOS is introduced to leverage AHPO and cross-validation to build the final model on the full dataset.
- Score: 2.4901555666568624
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The variability and biases in the real-world performance benchmarking of deep learning models for medical imaging compromise their trustworthiness for real-world deployment. The common approach of holding out a single fixed test set fails to quantify the variance in the estimation of test performance metrics. This study introduces NACHOS (Nested and Automated Cross-validation and Hyperparameter Optimization using Supercomputing) to reduce and quantify the variance of test performance metrics of deep learning models. NACHOS integrates Nested Cross-Validation (NCV) and Automated Hyperparameter Optimization (AHPO) within a parallelized high-performance computing (HPC) framework. NACHOS was demonstrated on a chest X-ray repository and an Optical Coherence Tomography (OCT) dataset under multiple data partitioning schemes. Beyond performance estimation, DACHOS (Deployment with Automated Cross-validation and Hyperparameter Optimization using Supercomputing) is introduced to leverage AHPO and cross-validation to build the final model on the full dataset, improving expected deployment performance. The findings underscore the importance of NCV in quantifying and reducing estimation variance, AHPO in optimizing hyperparameters consistently across test folds, and HPC in ensuring computational feasibility. By integrating these methodologies, NACHOS and DACHOS provide a scalable, reproducible, and trustworthy framework for DL model evaluation and deployment in medical imaging.
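The core pattern the abstract describes, an outer cross-validation loop for test performance estimation wrapped around an inner loop for automated hyperparameter optimization, can be sketched in a few lines. The snippet below is a minimal illustration of that NCV+AHPO pattern using scikit-learn on a toy dataset; the estimator, parameter grid, fold counts, and metric are placeholder assumptions, and it does not reproduce the authors' NACHOS implementation or its HPC parallelization.

```python
# Minimal nested cross-validation sketch (illustration only, not NACHOS itself).
# Inner loop: automated hyperparameter optimization via grid search.
# Outer loop: estimates test performance and its variance across outer folds.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner AHPO: tune the regularization strength on each outer training split.
search = GridSearchCV(
    estimator=make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    cv=inner_cv,
    scoring="roc_auc",
)

# Outer NCV: each outer fold scores a model tuned only on that fold's training data.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
print("AUC per outer fold:", np.round(outer_scores, 3))
print(f"mean={outer_scores.mean():.3f}  std={outer_scores.std():.3f}")
```

The spread of the outer-fold scores is the kind of variance estimate the abstract argues a single fixed test set cannot provide. In the paper's framework, these independent outer and inner fits are distributed across HPC workers, and DACHOS then refits on the full dataset with tuned hyperparameters for deployment.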
Related papers
- Leveraging VAE-Derived Latent Spaces for Enhanced Malware Detection with Machine Learning Classifiers [0.0]
This paper assesses the performance of five machine learning classifiers: Decision Tree, Naive Bayes, LightGBM, Logistic Regression, and Random Forest.
Results from the experiments conducted on different training-test splits with different random seeds reveal that all the models perform well in detecting malware.
arXiv Detail & Related papers (2025-03-24T14:44:55Z)
- Scalable and Effective Negative Sample Generation for Hyperedge Prediction [55.9298019975967]
Hyperedge prediction is crucial for understanding complex multi-entity interactions in web-based applications.
Traditional methods often face difficulties in generating high-quality negative samples due to imbalance between positive and negative instances.
We present the scalable and effective negative sample generation for Hyperedge Prediction (SEHP) framework, which utilizes diffusion models to tackle these challenges.
arXiv Detail & Related papers (2024-11-19T09:16:25Z)
- Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations.
Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations.
We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z)
- Predictive Performance Test based on the Exhaustive Nested Cross-Validation for High-dimensional data [7.62566998854384]
Cross-validation is used for several tasks such as estimating the prediction error, tuning the regularization parameter, and selecting the most suitable predictive model.
K-fold cross-validation is a popular CV method, but its risk estimates depend strongly on how the data are partitioned.
This study presents an alternative novel predictive performance test and valid confidence intervals based on exhaustive nested cross-validation.
arXiv Detail & Related papers (2024-08-06T12:28:16Z)
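The partition dependence noted in the entry above is easy to demonstrate: rerunning K-fold CV on the same data with different random partitions yields noticeably different risk estimates. The sketch below is a toy illustration of that effect (dataset, model, and fold counts are arbitrary choices), not the paper's exhaustive nested cross-validation test.

```python
# Toy illustration of partition dependence: the same model and data give
# different 5-fold risk estimates under different random partitionings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
model = LogisticRegression(max_iter=2000)

estimates = []
for seed in range(20):  # 20 different 5-fold partitions of the same dataset
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    estimates.append(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())

estimates = np.array(estimates)
print(f"min={estimates.min():.3f}  max={estimates.max():.3f}  std={estimates.std():.3f}")
```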
- Distributionally Robust Optimization as a Scalable Framework to Characterize Extreme Value Distributions [22.765095010254118]
The goal of this paper is to develop distributionally robust optimization (DRO) estimators, specifically for multidimensional Extreme Value Theory (EVT) statistics.
In order to mitigate over-conservative estimates while enhancing out-of-sample performance, we study DRO estimators informed by semi-parametric max-stable constraints in the space of point processes.
Both approaches are validated using synthetically generated data, recovering prescribed characteristics, and verifying the efficacy of the proposed techniques.
arXiv Detail & Related papers (2024-07-31T19:45:27Z)
- Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification [2.1223532600703385]
This paper presents an innovative disjoint sampling approach for training SOTA models on Hyperspectral image classification (HSIC) tasks.
By separating training, validation, and test data without overlap, the proposed method facilitates a fairer evaluation of how well a model can classify pixels it was not exposed to during training or validation.
This rigorous methodology is critical for advancing SOTA models and their real-world application to large-scale land mapping with Hyperspectral sensors.
arXiv Detail & Related papers (2024-04-23T11:40:52Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- BOtied: Multi-objective Bayesian optimization with tied multivariate ranks [33.414682601242006]
In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function.
Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied.
Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions.
arXiv Detail & Related papers (2023-06-01T04:50:06Z)
- A Targeted Accuracy Diagnostic for Variational Approximations [8.969208467611896]
Variational Inference (VI) is an attractive alternative to Markov Chain Monte Carlo (MCMC).
Existing methods characterize the quality of the whole variational distribution.
We propose the TArgeted Diagnostic for Distribution Approximation Accuracy (TADDAA).
arXiv Detail & Related papers (2023-02-24T02:50:18Z)
- Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer sampling steps.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
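The entry above describes a generalized iterative, column-wise imputation framework; a plain version of that iterative mechanism is available in scikit-learn's IterativeImputer and is shown below as a point of reference. This sketch uses a fixed default regressor for every column and does not reproduce HyperImpute's automatic per-column model selection.

```python
# Basic iterative, column-wise imputation (the mechanism HyperImpute generalizes
# with automatic per-column model selection). Uses scikit-learn, not HyperImpute.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the API)
from sklearn.impute import IterativeImputer

X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, np.nan, 6.0],
    [np.nan, 5.0, 9.0],
    [7.0, 8.0, 12.0],
])

# Each round, every column with missing values is regressed on the other columns,
# and the predictions replace the missing entries until the fill-ins stabilize.
imputer = IterativeImputer(max_iter=10, random_state=0)
print(imputer.fit_transform(X))
```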
- Fast calculation of Gaussian Process multiple-fold cross-validation residuals and their covariances [0.6091702876917281]
We generalize fast leave-one-out formulae to multiple-fold cross-validation.
We highlight the covariance structure of cross-validation residuals in both Simple and Universal Kriging frameworks.
Our results enable fast multiple-fold cross-validation and have direct consequences in model diagnostics.
arXiv Detail & Related papers (2021-01-08T17:02:37Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.