Development of a Neural Network-based Method for Improved Imputation of
Missing Values in Time Series Data by Repurposing DataWig
- URL: http://arxiv.org/abs/2308.09635v1
- Date: Fri, 18 Aug 2023 15:53:40 GMT
- Authors: Daniel Zhang
- Abstract summary: Missing values in time series data occur often and present obstacles to successful analysis, so they need to be filled with alternative values, a process called imputation.
Although various approaches have been attempted for robust imputation of time series data, even the most advanced methods still face challenges.
I developed tsDataWig (time-series DataWig) by modifying DataWig, a neural network-based method that possesses the capacity to process large datasets.
Unlike the original DataWig, tsDataWig can directly handle values of time variables and impute missing values in complex time series datasets.
- Score: 1.8719295298860394
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Time series data are observations collected over time intervals. Successful
analysis of time series data captures patterns such as trends, cyclicity and
irregularity, which are crucial for decision making in research, business, and
governance. However, missing values in time series data occur often and present
obstacles to successful analysis, so they need to be filled with alternative
values, a process called imputation. Although various approaches have been
attempted for robust imputation of time series data, even the most advanced
methods still face challenges, including limited scalability, poor capacity to
handle heterogeneous data types, and inflexibility caused by strong
assumptions about missing-data mechanisms. Moreover, the imputation accuracy of
these methods still has room for improvement. In this study, I developed
tsDataWig (time-series DataWig) by modifying DataWig, a neural network-based
method that possesses the capacity to process large datasets and heterogeneous
data types but was designed for non-time series data imputation. Unlike the
original DataWig, tsDataWig can directly handle values of time variables and
impute missing values in complex time series datasets. Using one simulated and
three different complex real-world time series datasets, I demonstrated that
tsDataWig outperforms the original DataWig and the current state-of-the-art
methods for time series data imputation and potentially has broad application
because it does not require strong assumptions about missing-data mechanisms. This study
provides a valuable solution for robustly imputing missing values in
challenging time series datasets, which often contain millions of samples, high
dimensional variables, and heterogeneous data types.
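
As a rough illustration of what imputation and its evaluation involve, the sketch below hides known values in a toy series, fills them with a simple baseline, and measures the error on the hidden positions. It is not tsDataWig or DataWig: linear interpolation is swapped in as a stand-in baseline, and the helper names `linear_interpolate` and `rmse_on_masked`, as well as the toy data, are assumptions made for this example only.

```python
import math


def linear_interpolate(values):
    """Fill None entries by linear interpolation between the nearest observed
    neighbours; gaps at either end take the nearest observed value.
    (Hypothetical helper: a simple baseline, not the tsDataWig method.)"""
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("need at least one observed value")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def rmse_on_masked(truth, imputed, masked_idx):
    """Root-mean-square error computed only on the positions that were hidden."""
    squared = [(truth[i] - imputed[i]) ** 2 for i in masked_idx]
    return math.sqrt(sum(squared) / len(squared))


# Toy series (made-up numbers): hide two known values, impute, then score.
truth = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
masked_idx = [2, 3]
series = [None if i in masked_idx else v for i, v in enumerate(truth)]

imputed = linear_interpolate(series)
error = rmse_on_masked(truth, imputed, masked_idx)
print(imputed, error)
```

Masking values whose true content is known is a common way to compare imputation methods, since the error can only be computed where the ground truth is available.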
Related papers
- Graph Spatiotemporal Process for Multivariate Time Series Anomaly
Detection with Missing Values [67.76168547245237]
We introduce a novel framework called GST-Pro, which utilizes a graph spatiotemporal process and anomaly scorer to detect anomalies.
Our experimental results show that the GST-Pro method can effectively detect anomalies in time series data and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-11T10:10:16Z)
- Deep Imputation of Missing Values in Time Series Health Data: A Review
with Benchmarking [0.0]
This survey performs six data-centric experiments to benchmark state-of-the-art deep imputation methods on five time series health data sets.
Deep learning methods that jointly perform cross-sectional (across variables) and longitudinal (across time) imputations of missing values in time series data yield statistically better data quality than traditional imputation methods.
arXiv Detail & Related papers (2023-02-10T16:03:36Z)
- Time-Varying Propensity Score to Bridge the Gap between the Past and Present [104.46387765330142]
We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data.
We demonstrate different ways of implementing it and evaluate it on a variety of problems.
arXiv Detail & Related papers (2022-10-04T07:21:49Z)
- Grouped self-attention mechanism for a memory-efficient Transformer [64.0125322353281]
Real-world tasks such as forecasting weather, electricity consumption, and stock market involve predicting data that vary over time.
Time-series data are generally recorded over a long period of observation with long sequences owing to their periodic characteristics and long-range dependencies over time.
We propose two novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA).
Our proposed model exhibited reduced computational complexity with performance comparable to or better than existing methods.
arXiv Detail & Related papers (2022-10-02T06:58:49Z)
- STING: Self-attention based Time-series Imputation Networks using GAN [4.052758394413726]
STING (Self-attention based Time-series Imputation Networks using GAN) is proposed.
We take advantage of generative adversarial networks and bidirectional recurrent neural networks to learn latent representations of the time series.
Experimental results on three real-world datasets demonstrate that STING outperforms the existing state-of-the-art methods in terms of imputation accuracy.
arXiv Detail & Related papers (2022-09-22T06:06:56Z)
- PIETS: Parallelised Irregularity Encoders for Forecasting with
Heterogeneous Time-Series [5.911865723926626]
Heterogeneity and irregularity of multi-source data sets present a significant challenge to time-series analysis.
In this work, we design a novel architecture, PIETS, to model heterogeneous time-series.
We show that PIETS is able to effectively model heterogeneous temporal data and outperforms other state-of-the-art approaches in the prediction task.
arXiv Detail & Related papers (2021-09-30T20:01:19Z)
- Deep Time Series Models for Scarce Data [8.673181404172963]
Time series data have grown at an explosive rate in numerous domains and have stimulated a surge of time series modeling research.
Data scarcity is a universal issue that occurs in a vast range of data analytics problems.
arXiv Detail & Related papers (2021-03-16T22:16:54Z)
- Deep Cellular Recurrent Network for Efficient Analysis of Time-Series
Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while using substantially fewer trainable parameters than comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z)
- Time Series Data Imputation: A Survey on Deep Learning Approaches [4.4458738910060775]
Time series data imputation is a well-studied problem with different categories of methods.
Time series methods based on deep learning have made progress with the usage of models like RNN.
We will review and discuss their model architectures, their pros and cons as well as their effects to show the development of the time series imputation methods.
arXiv Detail & Related papers (2020-11-23T11:57:27Z)
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
- DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a
Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and lack of relevant data, for the future learning tasks of a trained network.
We use the available data, which may be an imbalanced subset of the original training dataset or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)