Understanding the effect of hyperparameter optimization on machine
learning models for structure design problems
- URL: http://arxiv.org/abs/2007.04431v2
- Date: Mon, 15 Mar 2021 22:14:58 GMT
- Title: Understanding the effect of hyperparameter optimization on machine
learning models for structure design problems
- Authors: Xianping Du, Hongyi Xu, Feng Zhu
- Abstract summary: Machine learning algorithms (MLAs) have been implemented as surrogate models in computer-aided engineering design.
There is a lack of systematic studies on the effect of hyperparameters on the accuracy and robustness of the surrogate model.
Four frequently used MLAs, namely Gaussian Process Regression (GPR), Support Vector Machine (SVM), Random Forest Regression (RFR) and Artificial Neural Network (ANN) are tested.
The results show that HOpt generally improves the performance of the MLA models.
- Score: 8.504300709184177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To relieve the computational cost of design evaluations using expensive
finite element simulations, surrogate models have been widely applied in
computer-aided engineering design. Machine learning algorithms (MLAs) have been
implemented as surrogate models due to their capability of learning the complex
interrelations between the design variables and the response from big datasets.
Typically, an MLA regression model contains model parameters and
hyperparameters. The model parameters are obtained by fitting the training
data. Hyperparameters, which govern the model structures and the training
processes, are assigned by users before training. There is a lack of systematic
studies on the effect of hyperparameters on the accuracy and robustness of the
surrogate model. In this work, we propose a hyperparameter optimization (HOpt)
framework to deepen our understanding of the effect. Four
frequently used MLAs, namely Gaussian Process Regression (GPR), Support Vector
Machine (SVM), Random Forest Regression (RFR), and Artificial Neural Network
(ANN), are tested on four benchmark examples. For each MLA model, the model
accuracy and robustness before and after HOpt are compared. The results show
that HOpt generally improves the performance of the MLA models, but it yields
only marginal gains in accuracy and robustness for complex problems
characterized by high-dimensional, mixed-variable design spaces. HOpt is
therefore recommended for design problems of intermediate complexity. We also
investigated the additional computational cost incurred by HOpt. The training
cost is closely related to the MLA architecture: after HOpt, the training costs
of the ANN and RFR increase more than those of the GPR and SVM. In summary,
this study informs the selection of HOpt methods for different types of design
problems based on their complexity.
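As a rough illustration of the before-and-after-HOpt comparison described above, the sketch below tunes the four MLAs with scikit-learn. Everything in it is an assumption for demonstration only: a synthetic Friedman benchmark stands in for the paper's structure-design examples, random search stands in for the authors' HOpt procedure, and the search spaces and budgets are arbitrary.

```python
# Hedged sketch: compare default vs. hyperparameter-optimized surrogate models.
# Not the authors' code; benchmark, search spaces, and budgets are illustrative.
import time
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import RandomizedSearchCV, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Synthetic stand-in for an expensive finite-element design response
X, y = make_friedman1(n_samples=400, n_features=8, noise=0.1, random_state=0)

# Default models and assumed hyperparameter search spaces
models = {
    "GPR": (GaussianProcessRegressor(), {"alpha": np.logspace(-10, -1, 10)}),
    "SVM": (SVR(), {"C": np.logspace(-2, 3, 20), "gamma": np.logspace(-4, 1, 20)}),
    "RFR": (RandomForestRegressor(random_state=0),
            {"n_estimators": [50, 100, 200, 400], "max_depth": [3, 5, 10, None]}),
    "ANN": (MLPRegressor(max_iter=2000, random_state=0),
            {"hidden_layer_sizes": [(32,), (64,), (64, 32)],
             "alpha": np.logspace(-5, -1, 10)}),
}

for name, (model, space) in models.items():
    # Accuracy/robustness proxy: mean and spread of cross-validated R^2
    base = cross_val_score(model, X, y, cv=5, scoring="r2")

    # "HOpt" stage: model parameters are re-fit for every hyperparameter trial
    t0 = time.time()
    search = RandomizedSearchCV(model, space, n_iter=20, cv=5,
                                scoring="r2", random_state=0)
    search.fit(X, y)
    hopt_cost = time.time() - t0

    tuned = cross_val_score(search.best_estimator_, X, y, cv=5, scoring="r2")
    print(f"{name}: R2 {base.mean():.3f}+/-{base.std():.3f} -> "
          f"{tuned.mean():.3f}+/-{tuned.std():.3f} (HOpt cost {hopt_cost:.1f}s)")
```

The spread of the cross-validation scores plays the role of the robustness measure, and the wall-clock time of the search gives a feel for the extra training cost that the abstract attributes to HOpt.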
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Simulated Overparameterization [35.12611686956487]
We introduce a novel paradigm called Simulated Overparametrization (SOP).
SOP proposes a unique approach to model training and inference, in which a model with a significantly larger number of parameters is trained in such a way that a smaller, efficient subset of these parameters is used for the actual computation during inference.
We present a novel, architecture-agnostic algorithm called "majority kernels", which integrates seamlessly with predominant architectures, including Transformer models.
arXiv Detail & Related papers (2024-02-07T17:07:41Z) - Fairer and More Accurate Tabular Models Through NAS [14.147928131445852]
We propose using multi-objective Neural Architecture Search (NAS) and Hyperparameter Optimization (HPO), in the first application of these techniques to the very challenging domain of tabular data.
We show that models optimized solely for accuracy with NAS often fail to inherently address fairness concerns.
We produce architectures that consistently dominate state-of-the-art bias mitigation methods either in fairness, accuracy or both.
arXiv Detail & Related papers (2023-10-18T17:56:24Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient
Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers to reduce the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z) - On the Influence of Enforcing Model Identifiability on Learning dynamics
of Gaussian Mixture Models [14.759688428864159]
We propose a technique for extracting submodels from singular models.
Our method enforces model identifiability during training.
We show how the method can be applied to more complex models like deep neural networks.
arXiv Detail & Related papers (2022-06-17T07:50:22Z) - Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters.
We focus on a VGG-style plain model and showcase that such a simple model trained with a re-parameterized optimizer, referred to as Rep-VGG, performs on par with recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z) - Physics-informed linear regression is a competitive approach compared to
Machine Learning methods in building MPC [0.8135412538980287]
We show that MPC in general leads to satisfactory reductions in heating and cooling energy compared to the building's baseline controller.
We also find that the physics-informed ARMAX models have a lower computational burden and superior sample efficiency compared to the machine-learning-based models.
arXiv Detail & Related papers (2021-10-29T16:56:05Z) - MoEfication: Conditional Computation of Transformer Models for Efficient
Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to their large parameter capacity, but they also incur huge computation costs.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z) - Optimization-driven Machine Learning for Intelligent Reflecting Surfaces
Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape wireless channels by controlling the phase shifts of individual scattering elements.
Due to the large number of scattering elements, passive beamforming is typically challenged by high computational complexity.
In this article, we focus on machine learning (ML) approaches for improving performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z) - Automatically Learning Compact Quality-aware Surrogates for Optimization
Problems [55.94450542785096]
Solving optimization problems with unknown parameters requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values.
Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality.
We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
arXiv Detail & Related papers (2020-06-18T19:11:54Z)
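The last entry above (compact quality-aware surrogates) is thematically close to the surveyed paper, so here is a heavily simplified, hypothetical sketch of the general idea: learn a cheap low-dimensional surrogate of an expensive high-dimensional objective and optimize the surrogate instead. This is not that paper's decision-focused method; the random projection, Gaussian-process surrogate, and toy objective are all assumptions.

```python
# Hedged sketch: fit a compact surrogate of a high-dimensional objective,
# then optimize the surrogate instead of the original problem.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
D, d, n = 50, 4, 200  # full dimension, surrogate dimension, training samples

def expensive_objective(x):
    # Stand-in for a costly objective (e.g. one requiring a simulation)
    return float(np.sum((x - 0.3) ** 2) + 0.1 * np.sin(5.0 * x[0]))

A = rng.standard_normal((D, d)) / np.sqrt(D)   # random low-dimensional projection
X = rng.uniform(-1.0, 1.0, size=(n, D))        # sampled high-dimensional designs
Z = X @ A                                      # compact features z = A^T x
y = np.array([expensive_objective(x) for x in X])

surrogate = GaussianProcessRegressor(normalize_y=True).fit(Z, y)

# Optimize the cheap surrogate in the compact space
res = minimize(lambda z: surrogate.predict(z.reshape(1, -1))[0],
               x0=np.zeros(d), method="Nelder-Mead")

# Map the compact solution back to a full design (least-norm preimage of z)
x_candidate = np.linalg.pinv(A.T) @ res.x
print("surrogate minimum:", res.fun,
      "| true objective at candidate:", expensive_objective(x_candidate))
```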