Related papers: Efficient Hyperparameter Importance Assessment for CNNs

Efficient Hyperparameter Importance Assessment for CNNs

URL: http://arxiv.org/abs/2410.08920v1
Date: Fri, 11 Oct 2024 15:47:46 GMT
Title: Efficient Hyperparameter Importance Assessment for CNNs
Authors: Ruinan Wang, Ian Nabney, Mohammad Golbabaee,
Abstract summary: This paper aims to quantify the importance weights of some hyperparameters in Convolutional Neural Networks (CNNs) with an algorithm called N-RReliefF. We conduct an extensive study by training over ten thousand CNN models across ten popular image classification datasets.
Score: 1.7778609937758323
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hyperparameter selection is an essential aspect of the machine learning pipeline, profoundly impacting models' robustness, stability, and generalization capabilities. Given the complex hyperparameter spaces associated with Neural Networks and the constraints of computational resources and time, optimizing all hyperparameters becomes impractical. In this context, leveraging hyperparameter importance assessment (HIA) can provide valuable guidance by narrowing down the search space. This enables machine learning practitioners to focus their optimization efforts on the hyperparameters with the most significant impact on model performance while conserving time and resources. This paper aims to quantify the importance weights of some hyperparameters in Convolutional Neural Networks (CNNs) with an algorithm called N-RReliefF, laying the groundwork for applying HIA methodologies in the Deep Learning field. We conduct an extensive study by training over ten thousand CNN models across ten popular image classification datasets, thereby acquiring a comprehensive dataset containing hyperparameter configuration instances and their corresponding performance metrics. It is demonstrated that among the investigated hyperparameters, the top five important hyperparameters of the CNN model are the number of convolutional layers, learning rate, dropout rate, optimizer and epoch.

Related papers

Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining [56.58170370127227]
We show that optimal learning rate follows a power-law relationship with both model parameters and data sizes, while optimal batch size scales primarily with data sizes. This work is the first work that unifies different model shapes and structures, such as Mixture-of-Experts models and dense transformers.
arXiv Detail & Related papers (2025-03-06T18:58:29Z)
Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function [24.457000214575245]
We introduce a new technique to characterize the discontinuities and oscillations of the utility function on any fixed problem instance. This can be used to show that the learning theoretic complexity of the corresponding family of utility functions is bounded.
arXiv Detail & Related papers (2025-01-23T15:10:51Z)
Optimization of Actuarial Neural Networks with Response Surface Methodology [0.0]
This study utilizes a factorial design and response surface methodology (RSM) to optimize CANN performance. By dropping statistically insignificant hyper parameters, we reduced runs from 288 to 188, with negligible loss in accuracy, achieving near-optimal out-of-sample Poisson deviance loss.
arXiv Detail & Related papers (2024-10-01T15:45:41Z)
Optimization Hyper-parameter Laws for Large Language Models [56.322914260197734]
We present Opt-Laws, a framework that captures the relationship between hyper- parameters and training outcomes. Our validation across diverse model sizes and data scales demonstrates Opt-Laws' ability to accurately predict training loss. This approach significantly reduces computational costs while enhancing overall model performance.
arXiv Detail & Related papers (2024-09-07T09:37:19Z)
Goal-Oriented Sensitivity Analysis of Hyperparameters in Deep Learning [0.0]
We study the use of goal-oriented sensitivity analysis, based on the Hilbert-Schmidt Independence Criterion (HSIC), for hyperparameter analysis and optimization. We derive an HSIC-based optimization algorithm that we apply on MNIST and Cifar, classical machine learning data sets, of interest for scientific machine learning.
arXiv Detail & Related papers (2022-07-13T14:21:12Z)
AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper- parameter tuning. We show that using gradient-based data subsets for hyper- parameter tuning achieves significantly faster turnaround times and speedups of 3$times$-30$times$.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
To tune or not to tune? An Approach for Recommending Important Hyperparameters [2.121963121603413]
We consider building the relationship between the performance of the machine learning models and their hyperparameters to discover the trend and gain insights. Our results enable users to decide whether it is worth conducting a possibly time-consuming tuning strategy.
arXiv Detail & Related papers (2021-08-30T08:54:58Z)
HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters [61.354362652006834]
HyperNP is a scalable method that allows for real-time interactive exploration of projection methods by training neural network approximations. We evaluate the performance of the HyperNP across three datasets in terms of performance and speed.
arXiv Detail & Related papers (2021-06-25T17:28:14Z)
On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning [27.36718899899319]
Model-based Reinforcement Learning (MBRL) is a promising framework for learning control in a data-efficient manner. MBRL typically requires significant human expertise before it can be applied to new problems and domains.
arXiv Detail & Related papers (2021-02-26T18:57:47Z)
Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in neural networks (RNNs) It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously. This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
On the Sparsity of Neural Machine Translation Models [65.49762428553345]
We investigate whether redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different datasets and NMT architectures.
arXiv Detail & Related papers (2020-10-06T11:47:20Z)
An Asymptotically Optimal Multi-Armed Bandit Algorithm and Hyperparameter Optimization [48.5614138038673]
We propose an efficient and robust bandit-based algorithm called Sub-Sampling (SS) in the scenario of hyper parameter search evaluation. We also develop a novel hyper parameter optimization algorithm called BOSS. Empirical studies validate our theoretical arguments of SS and demonstrate the superior performance of BOSS on a number of applications.
arXiv Detail & Related papers (2020-07-11T03:15:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.