Model Performance Prediction for Hyperparameter Optimization of Deep
Learning Models Using High Performance Computing and Quantum Annealing
- URL: http://arxiv.org/abs/2311.17508v1
- Date: Wed, 29 Nov 2023 10:32:40 GMT
- Title: Model Performance Prediction for Hyperparameter Optimization of Deep
Learning Models Using High Performance Computing and Quantum Annealing
- Authors: Juan Pablo Garc\'ia Amboage, Eric Wulff, Maria Girone, Tom\'as F. Pena
- Abstract summary: We show that integrating model performance prediction with early stopping methods holds great potential to speed up the HPO process of deep learning models.
We propose a novel algorithm called Swift-Hyperband that can use either classical or quantum support vector regression for performance prediction.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hyperparameter Optimization (HPO) of Deep Learning-based models tends to be a
compute resource intensive process as it usually requires to train the target
model with many different hyperparameter configurations. We show that
integrating model performance prediction with early stopping methods holds
great potential to speed up the HPO process of deep learning models. Moreover,
we propose a novel algorithm called Swift-Hyperband that can use either
classical or quantum support vector regression for performance prediction and
benefit from distributed High Performance Computing environments. This
algorithm is tested not only for the Machine-Learned Particle Flow model used
in High Energy Physics, but also for a wider range of target models from
domains such as computer vision and natural language processing.
Swift-Hyperband is shown to find comparable (or better) hyperparameters as well
as using less computational resources in all test cases.
Related papers
- LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive.
Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones.
We propose textbfLESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z) - HyperQ-Opt: Q-learning for Hyperparameter Optimization [0.0]
This paper presents a novel perspective on HPO by formulating it as a sequential decision-making problem and leveraging Q-learning, a reinforcement learning technique.
The approaches are evaluated for their ability to find optimal or near-optimal configurations within a limited number of trials.
By shifting the paradigm toward policy-based optimization, this work contributes to advancing HPO methods for scalable and efficient machine learning applications.
arXiv Detail & Related papers (2024-12-23T18:22:34Z) - Optimization Hyper-parameter Laws for Large Language Models [52.49860340549727]
We present Opt-Laws, a framework that captures the relationship between hyper- parameters and training outcomes.
Our validation across diverse model sizes and data scales demonstrates Opt-Laws' ability to accurately predict training loss.
This approach significantly reduces computational costs while enhancing overall model performance.
arXiv Detail & Related papers (2024-09-07T09:37:19Z) - Streamlining Ocean Dynamics Modeling with Fourier Neural Operators: A Multiobjective Hyperparameter and Architecture Optimization Approach [5.232806761554172]
We use the advanced search algorithms for multiobjective optimization in DeepHyper to streamline the development of neural networks tailored for ocean modeling.
We demonstrate an approach to enhance the use of FNOs in ocean dynamics forecasting, offering a scalable solution with improved precision.
arXiv Detail & Related papers (2024-04-07T14:29:23Z) - Hyperparameter optimization, quantum-assisted model performance
prediction, and benchmarking of AI-based High Energy Physics workloads using
HPC [0.0]
This work studies the potential of using model performance prediction to aid the HPO process carried out on High Performance Computing systems.
A quantum annealer is used to train the performance predictor and a method is proposed to overcome some of the problems derived from the current limitations in quantum systems.
Results are presented from the development of a containerized benchmark based on an AI-model for collision event reconstruction.
arXiv Detail & Related papers (2023-03-27T09:55:33Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient
Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO)
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z) - Two-step hyperparameter optimization method: Accelerating hyperparameter
search by using a fraction of a training dataset [0.15420205433587747]
We present a two-step HPO method as a strategic solution to curbing computational demands and wait times.
We present our recent application of the two-step HPO method to the development of neural network emulators for aerosol activation.
arXiv Detail & Related papers (2023-02-08T02:38:26Z) - Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z) - AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient
Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper- parameter tuning.
We show that using gradient-based data subsets for hyper- parameter tuning achieves significantly faster turnaround times and speedups of 3$times$-30$times$.
arXiv Detail & Related papers (2022-03-15T19:25:01Z) - Towards Robust and Automatic Hyper-Parameter Tunning [39.04604349338802]
We introduce a new class of HPO method and explore how the low-rank factorization of intermediate layers of a convolutional network can be used to define an analytical response surface.
We quantify how this surface behaves as a surrogate to model performance and can be solved using a trust-region search algorithm, which we call autoHyper.
arXiv Detail & Related papers (2021-11-28T05:27:34Z) - MoEfication: Conditional Computation of Transformer Models for Efficient
Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore to accelerate large-model inference by conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.