On Optimizing Hyperparameters for Quantum Neural Networks
- URL: http://arxiv.org/abs/2403.18579v1
- Date: Wed, 27 Mar 2024 13:59:09 GMT
- Title: On Optimizing Hyperparameters for Quantum Neural Networks
- Authors: Sabrina Herbst, Vincenzo De Maio, Ivona Brandic,
- Abstract summary: Current state-of-the-art Machine Learning models require weeks for training, which is associated with an enormous $CO$ footprint.
Quantum Computing, and specifically Quantum Machine Learning (QML), can offer significant theoretical speed-ups and enhanced power.
- Score: 0.5999777817331317
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing capabilities of Machine Learning (ML) models go hand in hand with an immense amount of data and computational power required for training. Therefore, training is usually outsourced into HPC facilities, where we have started to experience limits in scaling conventional HPC hardware, as theorized by Moore's law. Despite heavy parallelization and optimization efforts, current state-of-the-art ML models require weeks for training, which is associated with an enormous $CO_2$ footprint. Quantum Computing, and specifically Quantum Machine Learning (QML), can offer significant theoretical speed-ups and enhanced expressive power. However, training QML models requires tuning various hyperparameters, which is a nontrivial task and suboptimal choices can highly affect the trainability and performance of the models. In this study, we identify the most impactful hyperparameters and collect data about the performance of QML models. We compare different configurations and provide researchers with performance data and concrete suggestions for hyperparameter selection.
Related papers
- Optimization Hyper-parameter Laws for Large Language Models [56.322914260197734]
We present Opt-Laws, a framework that captures the relationship between hyper- parameters and training outcomes.
Our validation across diverse model sizes and data scales demonstrates Opt-Laws' ability to accurately predict training loss.
This approach significantly reduces computational costs while enhancing overall model performance.
arXiv Detail & Related papers (2024-09-07T09:37:19Z) - EfficientQAT: Efficient Quantization-Aware Training for Large Language Models [50.525259103219256]
quantization-aware training (QAT) offers a solution by reducing memory consumption through low-bit representations with minimal accuracy loss.
We propose Efficient Quantization-Aware Training (EfficientQAT), a more feasible QAT algorithm.
EfficientQAT involves two consecutive phases: Block-wise training of all parameters (Block-AP) and end-to-end training of quantization parameters (E2E-QP)
arXiv Detail & Related papers (2024-07-10T17:53:30Z) - Model Performance Prediction for Hyperparameter Optimization of Deep
Learning Models Using High Performance Computing and Quantum Annealing [0.0]
We show that integrating model performance prediction with early stopping methods holds great potential to speed up the HPO process of deep learning models.
We propose a novel algorithm called Swift-Hyperband that can use either classical or quantum support vector regression for performance prediction.
arXiv Detail & Related papers (2023-11-29T10:32:40Z) - QKSAN: A Quantum Kernel Self-Attention Network [53.96779043113156]
A Quantum Kernel Self-Attention Mechanism (QKSAM) is introduced to combine the data representation merit of Quantum Kernel Methods (QKM) with the efficient information extraction capability of SAM.
A Quantum Kernel Self-Attention Network (QKSAN) framework is proposed based on QKSAM, which ingeniously incorporates the Deferred Measurement Principle (DMP) and conditional measurement techniques.
Four QKSAN sub-models are deployed on PennyLane and IBM Qiskit platforms to perform binary classification on MNIST and Fashion MNIST.
arXiv Detail & Related papers (2023-08-25T15:08:19Z) - Classical-to-Quantum Transfer Learning Facilitates Machine Learning with Variational Quantum Circuit [62.55763504085508]
We prove that a classical-to-quantum transfer learning architecture using a Variational Quantum Circuit (VQC) improves the representation and generalization (estimation error) capabilities of the VQC model.
We show that the architecture of classical-to-quantum transfer learning leverages pre-trained classical generative AI models, making it easier to find the optimal parameters for the VQC in the training stage.
arXiv Detail & Related papers (2023-05-18T03:08:18Z) - Reflection Equivariant Quantum Neural Networks for Enhanced Image
Classification [0.7232471205719458]
We build new machine learning models which explicitly respect the symmetries inherent in their data, so-called geometric quantum machine learning (GQML)
We find that these networks are capable of consistently and significantly outperforming generic ansatze on complicated real-world image datasets.
arXiv Detail & Related papers (2022-12-01T04:10:26Z) - Subtleties in the trainability of quantum machine learning models [0.0]
We show that gradient scaling results for Variational Quantum Algorithms can be applied to study the gradient scaling of Quantum Machine Learning models.
Our results indicate that features deemed detrimental for VQA trainability can also lead to issues such as barren plateaus in QML.
arXiv Detail & Related papers (2021-10-27T20:28:53Z) - MoEfication: Conditional Computation of Transformer Models for Efficient
Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore to accelerate large-model inference by conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z) - Towards Efficient Post-training Quantization of Pre-trained Language
Models [85.68317334241287]
We study post-training quantization(PTQ) of PLMs, and propose module-wise quantization error minimization(MREM), an efficient solution to mitigate these issues.
Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.
arXiv Detail & Related papers (2021-09-30T12:50:06Z) - Distributed Training and Optimization Of Neural Networks [0.0]
Deep learning models are yielding increasingly better performances thanks to multiple factors.
To be successful, model may have large number of parameters or complex architectures and be trained on large dataset.
This leads to large requirements on computing resource and turn around time.
arXiv Detail & Related papers (2020-12-03T11:18:46Z) - Multi-level Training and Bayesian Optimization for Economical
Hyperparameter Optimization [12.92634461859467]
In this paper, we develop an effective approach to reducing the total amount of required training time for Hyperparameter Optimization.
We propose a truncated additive Gaussian process model to calibrate approximate performance measurements generated by light training.
Based on the model, a sequential model-based algorithm is developed to generate the performance profile of the configuration space as well as find optimal ones.
arXiv Detail & Related papers (2020-07-20T09:03:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.