Combined Pruning for Nested Cross-Validation to Accelerate Automated
Hyperparameter Optimization for Embedded Feature Selection in
High-Dimensional Data with Very Small Sample Sizes
- URL: http://arxiv.org/abs/2202.00598v1
- Date: Tue, 1 Feb 2022 17:42:37 GMT
- Title: Combined Pruning for Nested Cross-Validation to Accelerate Automated
Hyperparameter Optimization for Embedded Feature Selection in
High-Dimensional Data with Very Small Sample Sizes
- Authors: Sigrun May, Sven Hartmann and Frank Klawonn
- Abstract summary: Tree-based embedded feature selection to exclude irrelevant features in high-dimensional data with very small sample sizes requires optimized hyperparameters for the model building process.
Standard pruning algorithms must prune late or risk aborting calculations due to high variance in the performance evaluation metric.
We adapt the usage of a state-of-the-art successive halving pruner and combine it with two new pruning strategies based on domain or prior knowledge.
Our proposed combined three-layer pruner keeps promising trials while reducing the number of models to be built by up to 81.3% compared to using a state-of-the-art asynchronous successive halving pruner alone.
- Score: 3.51500332842165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Applying tree-based embedded feature selection to exclude irrelevant features
in high-dimensional data with very small sample sizes requires optimized
hyperparameters for the model building process. In addition, nested
cross-validation must be applied for this type of data to avoid biased model
performance. The resulting long computation time can be accelerated with
pruning. However, standard pruning algorithms must prune late or risk aborting
calculations of promising hyperparameter sets due to high variance in the
performance evaluation metric. To address this, we adapt the usage of a
state-of-the-art successive halving pruner and combine it with two new pruning
strategies based on domain or prior knowledge. One additional pruning strategy
immediately stops the computation of trials with semantically meaningless
results for the selected hyperparameter combinations. The other is an
extrapolating threshold pruning strategy suitable for nested cross-validation
with high variance. Our proposed combined three-layer pruner keeps promising
trials while reducing the number of models to be built by up to 81.3% compared
to using a state-of-the-art asynchronous successive halving pruner alone. Our
three-layer pruner implementation (available at
https://github.com/sigrun-may/cv-pruner) speeds up data analysis or enables
deeper hyperparameter search within the same computation time. It consequently
saves time, money and energy, reducing the CO2 footprint.
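To make the three layers concrete, here is a minimal sketch of how such a pruner could be wired into an Optuna-style inner cross-validation loop. This is an illustration under assumptions, not the cv-pruner repository's actual API: the helper names, the fold-scoring stub, and the fold count are hypothetical, while optuna.pruners.SuccessiveHalvingPruner is Optuna's real implementation of asynchronous successive halving.

```python
import numpy as np
import optuna

N_INNER_FOLDS = 10  # hypothetical inner-CV fold count


def is_semantically_meaningless(n_selected_features: int) -> bool:
    # Layer 1 (hypothetical check): a hyperparameter set that selects no
    # features gives a semantically meaningless result; stop immediately.
    return n_selected_features == 0


def optimistic_extrapolation(fold_scores, n_total_folds):
    # Layer 2 (hypothetical check): optimistically complete the inner CV
    # with the best fold score seen so far and return the resulting mean.
    remaining = n_total_folds - len(fold_scores)
    return float(np.mean(fold_scores + [max(fold_scores)] * remaining))


def objective(trial: optuna.Trial) -> float:
    reg_alpha = trial.suggest_float("reg_alpha", 1e-3, 10.0, log=True)
    fold_scores = []
    for fold in range(N_INNER_FOLDS):
        # Stand-in for training a tree-based model with embedded feature
        # selection on one inner fold; replace with real model code.
        rng = np.random.default_rng(trial.number * 100 + fold)
        score = float(rng.uniform(0.5, 1.0)) / (1.0 + 0.01 * reg_alpha)
        n_selected_features = int(rng.integers(0, 20))

        if is_semantically_meaningless(n_selected_features):
            raise optuna.TrialPruned()  # layer 1: stop at once

        fold_scores.append(score)
        try:
            best_mean = trial.study.best_value  # best completed trial so far
        except ValueError:  # no trial has completed yet
            best_mean = -np.inf
        if optimistic_extrapolation(fold_scores, N_INNER_FOLDS) < best_mean:
            raise optuna.TrialPruned()  # layer 2: extrapolating threshold

        trial.report(float(np.mean(fold_scores)), step=fold)
        if trial.should_prune():
            raise optuna.TrialPruned()  # layer 3: async successive halving

    return float(np.mean(fold_scores))


study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.SuccessiveHalvingPruner(),
)
study.optimize(objective, n_trials=50)
```

The toy scoring stub only exercises the control flow; in the paper's setting, the extrapolating threshold layer is what tolerates the high fold-to-fold variance of nested cross-validation on very small samples.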
Related papers
- ETS: Efficient Tree Search for Inference-Time Scaling [61.553681244572914]
One promising approach for test-time compute scaling is search against a process reward model.
The diversity of trajectories in the tree search process affects the accuracy of the search, since increasing diversity promotes more exploration.
We propose Efficient Tree Search (ETS), which promotes KV sharing by pruning redundant trajectories while maintaining necessary diverse trajectories.
arXiv Detail & Related papers (2025-02-19T09:30:38Z) - Value-Based Deep RL Scales Predictably [100.21834069400023]
We show that value-based off-policy RL methods are predictable despite community lore regarding their pathological behavior.
We validate our approach using three algorithms: SAC, BRO, and PQL on DeepMind Control, OpenAI Gym, and IsaacGym.
arXiv Detail & Related papers (2025-02-06T18:59:47Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - TT-MPD: Test Time Model Pruning and Distillation [3.675015670568961]
Pruning can be an effective method of compressing large pre-trained models for inference speed acceleration.
Previous pruning approaches rely on access to the original training dataset for both pruning and subsequent fine-tuning.
We propose an efficient pruning method that considers the approximated fine-tuned accuracy and potential inference latency savings.
arXiv Detail & Related papers (2024-12-10T02:05:13Z) - Stability-Adjusted Cross-Validation for Sparse Linear Regression [5.156484100374059]
Cross-validation techniques like k-fold cross-validation substantially increase the computational cost of sparse regression.
We propose selecting hyperparameters that minimize a weighted sum of a cross-validation metric and a model's output stability.
Our confidence adjustment procedure reduces test set error by 2%, on average, on 13 real-world datasets.
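As a worked illustration of such a weighted-sum selection rule, here is a short sketch under assumed names: the weight `lam` and the Jaccard-based instability term are illustrative stand-ins, not the paper's exact definitions.

```python
import numpy as np


def stability_adjusted_score(cv_errors, support_sets, lam=0.5):
    """Hypothetical weighted sum of a CV metric and an instability term:
    mean CV error plus the average pairwise Jaccard distance between the
    feature supports selected on different folds."""
    k = len(support_sets)
    dists = [
        1.0
        - len(support_sets[i] & support_sets[j])
        / max(len(support_sets[i] | support_sets[j]), 1)
        for i in range(k)
        for j in range(i + 1, k)
    ]
    return float(np.mean(cv_errors)) + lam * float(np.mean(dists))


# Example: pick the hyperparameter value with the lowest adjusted score.
candidates = {
    0.1: ([0.20, 0.25, 0.22], [{1, 2, 3}, {1, 2}, {1, 2, 4}]),
    1.0: ([0.21, 0.22, 0.21], [{1, 2}, {1, 2}, {1, 2}]),
}
best = min(candidates, key=lambda a: stability_adjusted_score(*candidates[a]))
```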
arXiv Detail & Related papers (2023-06-26T17:02:45Z) - Tune As You Scale: Hyperparameter Optimization For Compute Efficient
Training [0.0]
We propose a practical method for robustly tuning large models.
CARBS performs local search around the performance-cost frontier.
Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline.
arXiv Detail & Related papers (2023-06-13T18:22:24Z) - Parameter-efficient Tuning of Large-scale Multimodal Foundation Model [68.24510810095802]
We propose a graceful prompt framework for cross-modal transfer (Aurora) to overcome these challenges.
Considering the redundancy in existing architectures, we first utilize the mode approximation to generate 0.1M trainable parameters to implement the multimodal prompt tuning.
A thorough evaluation on six cross-modal benchmarks shows that it not only outperforms the state-of-the-art but even outperforms the full fine-tuning approach.
arXiv Detail & Related papers (2023-05-15T06:40:56Z) - Fast Hyperparameter Tuning for Ising Machines [0.8057006406834467]
"FastConvergence" is a convergence acceleration method for Tree-structured Parzen Estimator (TPE)
For experiments, well-known Travel Salesman Problem (TSP) and Quadratic Assignment Problem (QAP) instances are used as input.
Results show, FastConvergence can reach similar results to TPE alone within less than half the number of trials.
arXiv Detail & Related papers (2022-11-29T01:53:31Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic nonconvex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic
Convolution [136.7261709896713]
We propose a data-driven approach that generates the appropriate convolution kernels to apply in response to the nature of the instances.
The proposed method achieves promising results on both ScanNetV2 and S3DIS.
It also improves inference speed by more than 25% over the current state-of-the-art.
arXiv Detail & Related papers (2020-11-26T14:56:57Z)