Differential Evolution Algorithm based Hyper-Parameters Selection of
Transformer Neural Network Model for Load Forecasting
- URL: http://arxiv.org/abs/2307.15299v5
- Date: Mon, 5 Feb 2024 03:20:06 GMT
- Title: Differential Evolution Algorithm based Hyper-Parameters Selection of
Transformer Neural Network Model for Load Forecasting
- Authors: Anuvab Sen, Arul Rhik Mazumder, Udayon Sen
- Abstract summary: Transformer models have the potential to improve load forecasting because of their ability to learn long-range dependencies through their attention mechanism.
Our work compares the proposed Transformer-based neural network model integrated with different metaheuristic algorithms by their performance in load forecasting, measured with numerical metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate load forecasting plays a vital role in numerous sectors, but
capturing the complex dynamics of power systems remains a challenge for
traditional statistical models. For this reason, time-series models (ARIMA) and
deep-learning models (ANN, LSTM, GRU, etc.) are commonly deployed and often
achieve greater success. In this paper, we analyze the efficacy of the recently
developed Transformer-based neural network model for load forecasting.
Transformer models have the potential to improve load forecasting because of
their ability to learn long-range dependencies through their attention
mechanism. We apply several metaheuristics, notably Differential Evolution, to
find the optimal hyperparameters of the Transformer-based neural network and
produce accurate forecasts. Differential Evolution provides scalable, robust,
global solutions to non-differentiable, multi-objective, or constrained
optimization problems. Our work compares the proposed Transformer-based neural
network model integrated with different metaheuristic algorithms by their
performance in load forecasting, using numerical metrics such as Mean Squared
Error (MSE) and Mean Absolute Percentage Error (MAPE). Our findings demonstrate
the potential of metaheuristic-enhanced Transformer-based neural network models
to improve load forecasting accuracy, and we provide optimal hyperparameters
for each model.
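For reference, the two error metrics used to compare the models have their standard definitions (with $n$ forecast points, actual load $y_t$, and predicted load $\hat{y}_t$):

$$
\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2,
\qquad
\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|
$$

The abstract does not spell out the optimization loop, so the following is a minimal sketch of how DE-based hyperparameter selection for a Transformer forecaster could look. The hyperparameter names, bounds, and population settings are illustrative assumptions, not the authors' published configuration, and the `train_and_eval` objective is replaced by a synthetic quadratic so the sketch runs end to end.

```python
import random

# Illustrative search space; the paper does not restate its exact bounds,
# so these names and ranges are assumptions for this sketch.
BOUNDS = {
    "d_model":       (32.0, 256.0),   # embedding dimension (rounded to int)
    "num_heads":     (1.0, 8.0),      # attention heads (rounded to int)
    "num_layers":    (1.0, 6.0),      # encoder layers (rounded to int)
    "learning_rate": (1e-4, 1e-2),
}
KEYS = list(BOUNDS)
INT_KEYS = {"d_model", "num_heads", "num_layers"}

def decode(vec):
    """Map a continuous DE vector onto concrete hyperparameters."""
    params = dict(zip(KEYS, vec))
    for k in INT_KEYS:
        params[k] = int(round(params[k]))
    return params

def train_and_eval(params):
    """Stand-in objective. In the paper's setting this would train the
    Transformer forecaster and return its validation MSE (or MAPE); here
    a synthetic quadratic keeps the sketch self-contained and runnable."""
    target = {"d_model": 128, "num_heads": 4, "num_layers": 3,
              "learning_rate": 1e-3}
    return sum((params[k] - target[k]) ** 2 for k in target)

def clip(value, low, high):
    return max(low, min(high, value))

def differential_evolution(pop_size=10, generations=30, F=0.8, CR=0.9):
    """Classic DE/rand/1/bin minimizing the validation error."""
    dim = len(KEYS)
    pop = [[random.uniform(*BOUNDS[k]) for k in KEYS] for _ in range(pop_size)]
    fit = [train_and_eval(decode(x)) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: scaled difference of two members added to a third.
            a, b, c = random.sample([x for j, x in enumerate(pop) if j != i], 3)
            mutant = [clip(a[d] + F * (b[d] - c[d]), *BOUNDS[KEYS[d]])
                      for d in range(dim)]
            # Binomial crossover; j_rand guarantees at least one mutant gene.
            j_rand = random.randrange(dim)
            trial = [mutant[d] if (random.random() < CR or d == j_rand)
                     else pop[i][d] for d in range(dim)]
            # Greedy selection: keep whichever vector scores lower.
            trial_fit = train_and_eval(decode(trial))
            if trial_fit < fit[i]:
                pop[i], fit[i] = trial, trial_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return decode(pop[best]), fit[best]

if __name__ == "__main__":
    best_params, best_error = differential_evolution()
    print("best hyperparameters:", best_params, "| validation error:", best_error)
```

In a real run, `train_and_eval` would train the Transformer on the load series and return its validation error, so each fitness evaluation is a full training run; a larger population and fewer generations, plus early stopping inside the objective, are common ways to keep the search budget manageable.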
Related papers
- Neural Residual Diffusion Models for Deep Scalable Vision Generation [17.931568104324985]
We propose a unified and massively scalable Neural Residual Diffusion Models framework (Neural-RDM).
The proposed neural residual models obtain state-of-the-art scores on image and video generative benchmarks.
arXiv Detail & Related papers (2024-06-19T04:57:18Z)
- Dynamical Mean-Field Theory of Self-Attention Neural Networks [0.0]
Transformer-based models have demonstrated exceptional performance across diverse domains.
Little is known about how they operate or what their expected dynamics are.
We use methods for the study of asymmetric Hopfield networks in nonequilibrium regimes.
arXiv Detail & Related papers (2024-06-11T13:29:34Z)
- Probabilistic MIMO U-Net: Efficient and Accurate Uncertainty Estimation for Pixel-wise Regression [1.4528189330418977]
Uncertainty estimation in machine learning is paramount for enhancing the reliability and interpretability of predictive models.
We present an adaptation of the Multiple-Input Multiple-Output (MIMO) framework for pixel-wise regression tasks.
arXiv Detail & Related papers (2023-08-14T22:08:28Z)
- Towards Long-Term predictions of Turbulence using Neural Operators [68.8204255655161]
This work aims to develop reduced-order/surrogate models for turbulent flow simulations using machine learning.
Different model structures are analyzed, with U-NET structures performing better than the standard FNO in accuracy and stability.
arXiv Detail & Related papers (2023-07-25T14:09:53Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes factorizing the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Equivariant vector field network for many-body system modeling [65.22203086172019]
Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers.
We evaluate our method on predicting trajectories of simulated Newton mechanics systems with both full and partially observed data.
arXiv Detail & Related papers (2021-10-26T14:26:25Z)
- Development of Deep Transformer-Based Models for Long-Term Prediction of Transient Production of Oil Wells [9.832272256738452]
We propose a novel approach to data-driven modeling of a transient production of oil wells.
We apply the transformer-based neural networks trained on the multivariate time series composed of various parameters of oil wells.
We generalize the single-well model based on the transformer architecture for multiple wells to simulate complex transient oilfield-level patterns.
arXiv Detail & Related papers (2021-10-12T15:00:45Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models achieve superior performance on most NLP tasks thanks to their large parameter capacity, but at a huge computational cost.
We explore accelerating large-model inference through conditional computation, based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- Disentangled Generative Models for Robust Prediction of System Dynamics [2.6424064030995957]
In this work, we treat the domain parameters of dynamical systems as factors of variation of the data generating process.
By leveraging ideas from supervised disentanglement and causal factorization, we aim to separate the domain parameters from the dynamics in the latent space of generative models.
Results indicate that disentangled VAEs adapt better to domain-parameter spaces that were not present in the training data.
arXiv Detail & Related papers (2021-08-26T09:58:06Z)
- Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning [137.39196753245105]
We present a new model-based reinforcement learning algorithm that learns a multi-headed dynamics model for dynamics generalization.
We incorporate context learning, which encodes dynamics-specific information from past experiences into the context latent vector.
Our method exhibits superior zero-shot generalization performance across a variety of control tasks, compared to state-of-the-art RL methods.
arXiv Detail & Related papers (2020-10-26T03:20:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.