Imputing Missing Observations with Time Sliced Synthetic Minority
Oversampling Technique
- URL: http://arxiv.org/abs/2201.05634v1
- Date: Fri, 14 Jan 2022 19:23:24 GMT
- Title: Imputing Missing Observations with Time Sliced Synthetic Minority
Oversampling Technique
- Authors: Andrew Baumgartner, Sevda Molani, Qi Wei and Jennifer Hadlock
- Abstract summary: We present a simple yet novel time series imputation technique with the goal of constructing an irregular time series that is uniform across every sample in a data set.
We fix a grid defined by the midpoints of non-overlapping bins (dubbed "slices") of observation times and ensure that each sample has values for all of the features at that given time.
This allows one to both impute fully missing observations to allow uniform time series classification across the entire data and, in special cases, to impute individually missing features.
- Score: 0.3973560285628012
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a simple yet novel time series imputation technique with the goal
of constructing an irregular time series that is uniform across every sample in
a data set. Specifically, we fix a grid defined by the midpoints of
non-overlapping bins (dubbed "slices") of observation times and ensure that
each sample has values for all of the features at that given time. This allows
one to both impute fully missing observations to allow uniform time series
classification across the entire data and, in special cases, to impute
individually missing features. To do so, we slightly generalize the well-known
class imbalance algorithm SMOTE \cite{smote} to allow component-wise nearest
neighbor interpolation that preserves correlations when there are no missing
features. We visualize the method in the simplified setting of 2-dimensional
uncoupled harmonic oscillators. Next, we use tSMOTE to train an Encoder/Decoder
long short-term memory (LSTM) model with logistic regression for predicting and
classifying distinct trajectories of different 2D oscillators. After
illustrating the utility of tSMOTE in this context, we use the same
architecture to train a clinical model for COVID-19 disease severity on an
imputed data set. Our experiments show an improvement over standard mean and
median imputation techniques by allowing a wider class of patient trajectories
to be recognized by the model, as well as improvement over aggregated
classification models.
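The two ingredients named in the abstract, a grid of slice midpoints and SMOTE-style interpolation, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (slice_grid, assign_slices, smote_step), the equal-width binning, and the use of a single interpolation weight are assumptions made for illustration, and the component-wise generalization described in the paper is not reproduced here.

```python
import numpy as np


def slice_grid(times, n_slices):
    """Split the observed time range into equal-width, non-overlapping
    bins ("slices") and return the bin edges and their midpoints.
    Equal-width bins are an assumption of this sketch."""
    times = np.asarray(times, dtype=float)
    edges = np.linspace(times.min(), times.max(), n_slices + 1)
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    return edges, midpoints


def assign_slices(times, edges):
    """Return the slice index (0 .. n_slices - 1) of each observation time.
    Only the interior edges are passed to np.digitize so that the latest
    observation falls into the last slice."""
    return np.digitize(np.asarray(times, dtype=float), edges[1:-1])


def smote_step(x, neighbor, rng):
    """Classic SMOTE interpolation: draw one weight lam in [0, 1) and
    return a synthetic point on the segment between x and neighbor."""
    x = np.asarray(x, dtype=float)
    neighbor = np.asarray(neighbor, dtype=float)
    lam = rng.uniform(0.0, 1.0)
    return x + lam * (neighbor - x)


# Hypothetical observation times for one sample.
times = [0.0, 1.0, 2.0, 9.0, 10.0]
edges, midpoints = slice_grid(times, n_slices=5)
slice_index = assign_slices(times, edges)

# Slices with no observation are the ones that would need imputing.
empty_slices = sorted(set(range(len(midpoints))) - set(slice_index.tolist()))

# A synthetic feature vector between a point and one of its neighbors.
rng = np.random.default_rng(0)
synthetic = smote_step([0.0, 0.0], [2.0, 4.0], rng)
```

In this toy example the slices with indices 2 and 3 contain no observation, so they are the grid points at which a value would have to be imputed. How neighbors are selected within a slice is left out, since the abstract does not specify it.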
Related papers
- Graph Spatiotemporal Process for Multivariate Time Series Anomaly
Detection with Missing Values [67.76168547245237]
We introduce a novel framework called GST-Pro, which utilizes a graph spatiotemporal process and an anomaly scorer to detect anomalies.
Our experimental results show that the GST-Pro method can effectively detect anomalies in time series data and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-11T10:10:16Z) - Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranks first.
arXiv Detail & Related papers (2023-08-31T05:05:53Z) - TSI-GAN: Unsupervised Time Series Anomaly Detection using Convolutional
Cycle-Consistent Generative Adversarial Networks [2.4469484645516837]
Anomaly detection is widely used in network intrusion detection, autonomous driving, medical diagnosis, credit card fraud detection, etc.
This paper proposes TSI-GAN, an unsupervised anomaly detection model for time-series that can learn complex temporal patterns automatically.
We evaluate TSI-GAN using 250 well-curated and harder-than-usual datasets and compare with 8 state-of-the-art baseline methods.
arXiv Detail & Related papers (2023-03-22T23:24:47Z) - Mutual Exclusivity Training and Primitive Augmentation to Induce
Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z) - Tripletformer for Probabilistic Interpolation of Irregularly sampled
Time Series [6.579888565581481]
We present a novel encoder-decoder architecture called "Tripletformer" for probabilistic interpolation of irregularly sampled time series with missing values.
This attention-based model operates on sets of observations, where each element is a triple of time, channel, and value.
Results indicate an improvement in negative log-likelihood error by up to 32% on real-world datasets and 85% on synthetic datasets.
arXiv Detail & Related papers (2022-10-05T08:31:05Z) - Stacked Residuals of Dynamic Layers for Time Series Anomaly Detection [0.0]
We present an end-to-end differentiable neural network architecture to perform anomaly detection in multivariate time series.
The architecture is a cascade of dynamical systems designed to separate linearly predictable components of the signal.
The anomaly detector exploits the temporal structure of the prediction residuals to detect both isolated point anomalies and set-point changes.
arXiv Detail & Related papers (2022-02-25T01:50:22Z) - Anomaly Detection of Time Series with Smoothness-Inducing Sequential
Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - Learning from Irregularly-Sampled Time Series: A Missing Data
Perspective [18.493394650508044]
Irregularly-sampled time series occur in many domains including healthcare.
We model irregularly-sampled time series data as a sequence of index-value pairs sampled from a continuous but unobserved function.
We propose learning methods for this framework based on variational autoencoders and generative adversarial networks.
arXiv Detail & Related papers (2020-08-17T20:01:55Z) - Unsupervised Online Anomaly Detection On Irregularly Sampled Or Missing
Valued Time-Series Data Using LSTM Networks [0.0]
We study anomaly detection and introduce an algorithm that processes variable length, irregularly sampled sequences or sequences with missing values.
Our algorithm is fully unsupervised; however, it can be readily extended to supervised or semi-supervised cases.
arXiv Detail & Related papers (2020-05-25T09:41:04Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)