Load Forecasting on A Highly Sparse Electrical Load Dataset Using Gaussian Interpolation
- URL: http://arxiv.org/abs/2508.14069v1
- Date: Tue, 12 Aug 2025 03:15:45 GMT
- Title: Load Forecasting on A Highly Sparse Electrical Load Dataset Using Gaussian Interpolation
- Authors: Chinmoy Biswas, Nafis Faisal, Vivek Chowdhury, Abrar Al-Shadid Abir, Sabir Mahmud, Mithon Rahman, Shaikh Anowarul Fattah, Hafiz Imtiaz,
- Abstract summary: Sparsity, defined as the presence of missing or zero values in a dataset, often poses a major challenge while operating on real-life datasets.<n>In this study, we show that an approximately 62% dataset with hourly load data of a power plant can be utilized for load forecasting assuming the data is Wide Sense Stationary (WSS)<n>More specifically, we perform statistical analysis on the data, and train multiple machine learning and deep learning models on the dataset.
- Score: 0.786975267379228
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparsity, defined as the presence of missing or zero values in a dataset, often poses a major challenge while operating on real-life datasets. Sparsity in features or target data of the training dataset can be handled using various interpolation methods, such as linear or polynomial interpolation, spline, moving average, or can be simply imputed. Interpolation methods usually perform well with Strict Sense Stationary (SSS) data. In this study, we show that an approximately 62\% sparse dataset with hourly load data of a power plant can be utilized for load forecasting assuming the data is Wide Sense Stationary (WSS), if augmented with Gaussian interpolation. More specifically, we perform statistical analysis on the data, and train multiple machine learning and deep learning models on the dataset. By comparing the performance of these models, we empirically demonstrate that Gaussian interpolation is a suitable option for dealing with load forecasting problems. Additionally, we demonstrate that Long Short-term Memory (LSTM)-based neural network model offers the best performance among a diverse set of classical and neural network-based models.
Related papers
- Towards Data-Efficient Pretraining for Atomic Property Prediction [51.660835328611626]
We show that pretraining on a task-relevant dataset can match or surpass large-scale pretraining.<n>We introduce the Chemical Similarity Index (CSI), a novel metric inspired by computer vision's Fr'echet Inception Distance.
arXiv Detail & Related papers (2025-02-16T11:46:23Z) - Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z) - A Language Model-Guided Framework for Mining Time Series with Distributional Shifts [5.082311792764403]
This paper presents an approach that utilizes large language models and data source interfaces to explore and collect time series datasets.
While obtained from external sources, the collected data share critical statistical properties with primary time series datasets.
It suggests that collected datasets can effectively supplement existing datasets, especially involving changes in data distribution.
arXiv Detail & Related papers (2024-06-07T20:21:07Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference ( SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs)
Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU)
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Improving age prediction: Utilizing LSTM-based dynamic forecasting for
data augmentation in multivariate time series analysis [16.91773394335563]
We propose a data augmentation and validation framework that utilizes dynamic forecasting with Long Short-Term Memory (LSTM) networks to enrich datasets.
The effectiveness of these augmented datasets was then compared with the original data using various deep learning models designed for chronological age prediction tasks.
arXiv Detail & Related papers (2023-12-11T22:47:26Z) - Pushing the Limits of Pre-training for Time Series Forecasting in the
CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z) - A Dataset Fusion Algorithm for Generalised Anomaly Detection in
Homogeneous Periodic Time Series Datasets [0.0]
"Dataset Fusion" is an algorithm for fusing periodic signals from multiple homogeneous datasets into a single dataset.
The proposed approach significantly outperforms conventional training approaches with an Average F1 score of 0.879.
Results show that using only 6.25% of the training data, translating to a 93.7% reduction in computational power, results in a mere 4.04% decrease in performance.
arXiv Detail & Related papers (2023-05-14T16:24:09Z) - CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z) - A data filling methodology for time series based on CNN and (Bi)LSTM
neural networks [0.0]
We develop two Deep Learning models aimed at filling data gaps in time series obtained from monitored apartments in Bolzano, Italy.
Our approach manages to capture the fluctuating nature of the data and shows good accuracy in reconstructing the target time series.
arXiv Detail & Related papers (2022-04-21T09:40:30Z) - Towards Synthetic Multivariate Time Series Generation for Flare
Forecasting [5.098461305284216]
One of the limiting factors in training data-driven, rare-event prediction algorithms is the scarcity of the events of interest.
In this study, we explore the usefulness of the conditional generative adversarial network (CGAN) as a means to perform data-informed oversampling.
arXiv Detail & Related papers (2021-05-16T22:23:23Z) - Deep Cellular Recurrent Network for Efficient Analysis of Time-Series
Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while utilizing substantially less trainable parameters when compared to comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z) - Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that learning summary features from data can compete and even outperform LFI methods based on hand-crafted values.
arXiv Detail & Related papers (2020-12-04T19:21:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.