Differential Evolution Algorithm based Hyper-Parameters Selection of
Transformer Neural Network Model for Load Forecasting
- URL: http://arxiv.org/abs/2307.15299v5
- Date: Mon, 5 Feb 2024 03:20:06 GMT
- Title: Differential Evolution Algorithm based Hyper-Parameters Selection of
Transformer Neural Network Model for Load Forecasting
- Authors: Anuvab Sen, Arul Rhik Mazumder, Udayon Sen
- Abstract summary: Transformer models have the potential to improve load forecasting because of their ability to learn long-range dependencies through their attention mechanism.
Our work compares the proposed Transformer-based neural network model integrated with different metaheuristic algorithms by their performance in load forecasting, measured with numerical metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate load forecasting plays a vital role in numerous sectors, but
capturing the complex dynamics of power systems remains a challenge for
traditional statistical models. For this reason, time-series models (ARIMA) and
deep-learning models (ANN, LSTM, GRU, etc.) are commonly deployed and often
achieve greater success. In this paper, we analyze the efficacy of the recently
developed Transformer-based neural network model for load forecasting.
Transformer models have the potential to improve load forecasting because of
their ability to learn long-range dependencies through their attention
mechanism. We apply several metaheuristics, notably Differential Evolution, to
find the optimal hyperparameters of the Transformer-based neural network and
produce accurate forecasts. Differential Evolution provides scalable, robust,
global solutions to non-differentiable, multi-objective, or constrained
optimization problems. Our work compares the proposed Transformer-based neural
network model integrated with different metaheuristic algorithms by their
performance in load forecasting, using numerical metrics such as Mean Squared
Error (MSE) and Mean Absolute Percentage Error (MAPE). Our findings demonstrate
the potential of metaheuristic-enhanced Transformer-based neural network models
to improve load forecasting accuracy, and we provide optimal hyperparameters
for each model.
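For reference, the two error metrics used to compare the models have their standard definitions (with $n$ forecast points, actual load $y_t$, and predicted load $\hat{y}_t$):

$$
\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2,
\qquad
\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|
$$

The abstract does not spell out the optimization loop, so the following is a minimal sketch of how DE-based hyperparameter selection for a Transformer forecaster could look. The hyperparameter names, bounds, and population settings are illustrative assumptions, not the authors' published configuration, and the `train_and_eval` objective is replaced by a synthetic quadratic so the sketch runs end to end.

```python
import random

# Illustrative search space; the paper does not restate its exact bounds,
# so these names and ranges are assumptions for this sketch.
BOUNDS = {
    "d_model":       (32.0, 256.0),   # embedding dimension (rounded to int)
    "num_heads":     (1.0, 8.0),      # attention heads (rounded to int)
    "num_layers":    (1.0, 6.0),      # encoder layers (rounded to int)
    "learning_rate": (1e-4, 1e-2),
}
KEYS = list(BOUNDS)
INT_KEYS = {"d_model", "num_heads", "num_layers"}

def decode(vec):
    """Map a continuous DE vector onto concrete hyperparameters."""
    params = dict(zip(KEYS, vec))
    for k in INT_KEYS:
        params[k] = int(round(params[k]))
    return params

def train_and_eval(params):
    """Stand-in objective. In the paper's setting this would train the
    Transformer forecaster and return its validation MSE (or MAPE); here
    a synthetic quadratic keeps the sketch self-contained and runnable."""
    target = {"d_model": 128, "num_heads": 4, "num_layers": 3,
              "learning_rate": 1e-3}
    return sum((params[k] - target[k]) ** 2 for k in target)

def clip(value, low, high):
    return max(low, min(high, value))

def differential_evolution(pop_size=10, generations=30, F=0.8, CR=0.9):
    """Classic DE/rand/1/bin minimizing the validation error."""
    dim = len(KEYS)
    pop = [[random.uniform(*BOUNDS[k]) for k in KEYS] for _ in range(pop_size)]
    fit = [train_and_eval(decode(x)) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: scaled difference of two members added to a third.
            a, b, c = random.sample([x for j, x in enumerate(pop) if j != i], 3)
            mutant = [clip(a[d] + F * (b[d] - c[d]), *BOUNDS[KEYS[d]])
                      for d in range(dim)]
            # Binomial crossover; j_rand guarantees at least one mutant gene.
            j_rand = random.randrange(dim)
            trial = [mutant[d] if (random.random() < CR or d == j_rand)
                     else pop[i][d] for d in range(dim)]
            # Greedy selection: keep whichever vector scores lower.
            trial_fit = train_and_eval(decode(trial))
            if trial_fit < fit[i]:
                pop[i], fit[i] = trial, trial_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return decode(pop[best]), fit[best]

if __name__ == "__main__":
    best_params, best_error = differential_evolution()
    print("best hyperparameters:", best_params, "| validation error:", best_error)
```

In a real run, `train_and_eval` would train the Transformer on the load series and return its validation error, so each fitness evaluation is a full training run; a larger population and fewer generations, plus early stopping inside the objective, are common ways to keep the search budget manageable.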
Related papers
- Neural Residual Diffusion Models for Deep Scalable Vision Generation [17.931568104324985]
We propose a unified and massively scalable Neural Residual Diffusion Models framework (Neural-RDM).
The proposed neural residual models obtain state-of-the-art scores on image and video generative benchmarks.
arXiv Detail & Related papers (2024-06-19T04:57:18Z)
- Dynamical Mean-Field Theory of Self-Attention Neural Networks [0.0]
Transformer-based models have demonstrated exceptional performance across diverse domains.
Little is known about how they operate or what their expected dynamics are.
We use methods for the study of asymmetric Hopfield networks in nonequilibrium regimes.
arXiv Detail & Related papers (2024-06-11T13:29:34Z)
- Probabilistic MIMO U-Net: Efficient and Accurate Uncertainty Estimation for Pixel-wise Regression [1.4528189330418977]
Uncertainty estimation in machine learning is paramount for enhancing the reliability and interpretability of predictive models.
We present an adaptation of the Multiple-Input Multiple-Output (MIMO) framework for pixel-wise regression tasks.
arXiv Detail & Related papers (2023-08-14T22:08:28Z)
- Towards Long-Term predictions of Turbulence using Neural Operators [68.8204255655161]
This work aims to develop reduced-order/surrogate models for turbulent flow simulations using machine learning.
Different model structures are analyzed, with U-NET structures performing better than the standard FNO in accuracy and stability.
arXiv Detail & Related papers (2023-07-25T14:09:53Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes factorizing the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Equivariant vector field network for many-body system modeling [65.22203086172019]
Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers.
We evaluate our method on predicting trajectories of simulated Newton mechanics systems with both full and partially observed data.
arXiv Detail & Related papers (2021-10-26T14:26:25Z)
- Development of Deep Transformer-Based Models for Long-Term Prediction of Transient Production of Oil Wells [9.832272256738452]
We propose a novel approach to data-driven modeling of a transient production of oil wells.
We apply the transformer-based neural networks trained on the multivariate time series composed of various parameters of oil wells.
We generalize the single-well model based on the transformer architecture for multiple wells to simulate complex transient oilfield-level patterns.
arXiv Detail & Related papers (2021-10-12T15:00:45Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models achieve superior performance on most NLP tasks thanks to their large parameter capacity, but at a huge computational cost.
We explore accelerating large-model inference through conditional computation, based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- Disentangled Generative Models for Robust Prediction of System Dynamics [2.6424064030995957]
In this work, we treat the domain parameters of dynamical systems as factors of variation of the data generating process.
By leveraging ideas from supervised disentanglement and causal factorization, we aim to separate the domain parameters from the dynamics in the latent space of generative models.
Results indicate that disentangled VAEs adapt better to domain-parameter spaces that were not present in the training data.
arXiv Detail & Related papers (2021-08-26T09:58:06Z)
- Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning [137.39196753245105]
We present a new model-based reinforcement learning algorithm that learns a multi-headed dynamics model for dynamics generalization.
We incorporate context learning, which encodes dynamics-specific information from past experiences into the context latent vector.
Our method exhibits superior zero-shot generalization performance across a variety of control tasks, compared to state-of-the-art RL methods.
arXiv Detail & Related papers (2020-10-26T03:20:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.