Time Series Synthesis Using the Matrix Profile for Anonymization
- URL: http://arxiv.org/abs/2311.02563v1
- Date: Sun, 5 Nov 2023 04:27:24 GMT
- Title: Time Series Synthesis Using the Matrix Profile for Anonymization
- Authors: Audrey Der, Chin-Chia Michael Yeh, Yan Zheng, Junpeng Wang, Huiyuan
Chen, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh
- Abstract summary: Many researchers cannot release their data due to privacy regulations or fear of leaking confidential business information.
We propose the Time Series Synthesis Using the Matrix Profile (TSSUMP) method, where synthesized time series can be released in lieu of the original data.
We test our method on a case study of ECG and gender masking prediction.
- Score: 32.22243483781984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Publishing and sharing data is crucial for the data mining community,
allowing collaboration and driving open innovation. However, many researchers
cannot release their data due to privacy regulations or fear of leaking
confidential business information. To alleviate such issues, we propose the
Time Series Synthesis Using the Matrix Profile (TSSUMP) method, where
synthesized time series can be released in lieu of the original data. The
TSSUMP method synthesizes time series by preserving similarity join information
(i.e., Matrix Profile) while reducing the correlation between the synthesized
and the original time series. As a result, neither the values for the
individual time steps nor the local patterns (or shapes) from the original data
can be recovered, yet the resulting data can be used for downstream tasks that
data analysts are interested in. We concentrate on similarity joins because
they are one of the most widely applied time series data mining routines across
different data mining tasks. We test our method on a case study of ECG and
gender masking prediction. In this case study, the gender information is not
only removed from the synthesized time series, but the synthesized time series
also preserves enough information from the original time series. As a result,
unmodified data mining tools can obtain near-identical performance on the
synthesized time series as on the original time series.
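As a rough illustration of the two quantities the abstract names, the sketch below computes a naive self-join Matrix Profile (z-normalized Euclidean distance to each subsequence's nearest non-trivial neighbor) and the Pearson correlation between two series. It is not the TSSUMP algorithm. The function names `znorm`, `matrix_profile`, and `pearson`, the window length `m`, the half-window exclusion zone, and the toy data are all assumptions made for this example.

```python
import numpy as np


def znorm(a):
    """Z-normalize a 1-D array; near-constant arrays map to all zeros."""
    sd = a.std()
    if sd < 1e-12:
        return np.zeros_like(a, dtype=float)
    return (a - a.mean()) / sd


def matrix_profile(ts, m):
    """Naive O(n^2 * m) self-join matrix profile (illustrative, not optimized)."""
    ts = np.asarray(ts, dtype=float)
    n_sub = len(ts) - m + 1
    # One z-normalized row per length-m subsequence.
    subs = np.array([znorm(ts[i:i + m]) for i in range(n_sub)])
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial matches
    profile = np.full(n_sub, np.inf)
    for i in range(n_sub):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - excl):min(n_sub, i + excl + 1)] = np.inf
        profile[i] = d.min()
    return profile


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(a, b)[0, 1])


# Toy data (hypothetical): noise with one pattern planted twice.
rng = np.random.default_rng(0)
original = rng.normal(size=120)
original[70:86] = original[20:36]
m = 16

mp = matrix_profile(original, m)

# An affine rescaling keeps the matrix profile but stays perfectly correlated.
rescaled = 2.0 * original + 3.0
mp_rescaled = matrix_profile(rescaled, m)
corr = pearson(original, rescaled)
```

In this toy setup the rescaled series has the same Matrix Profile as the original but remains perfectly correlated with it, so preserving the profile alone would not anonymize anything. The abstract describes TSSUMP as pursuing both goals at once: keeping the Matrix Profile while reducing that correlation.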
Related papers
- Towards Foundation Time Series Model: To Synthesize Or Not To
Synthesize? [2.8707270250981094]
We consider the essential question of whether it is advantageous to train a foundation model on synthetic data or better to use only a limited number of real-life examples.
Our experiments are conducted only for regular time series and speak in favor of leveraging solely the real time series.
arXiv Detail & Related papers (2024-03-04T23:03:17Z)
- TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling [67.02157180089573]
Time series pre-training has recently garnered wide attention for its potential to reduce labeling expenses and benefit various downstream tasks.
This paper proposes TimeSiam as a simple but effective self-supervised pre-training framework for time series based on Siamese networks.
arXiv Detail & Related papers (2024-02-04T13:10:51Z)
- Graph Spatiotemporal Process for Multivariate Time Series Anomaly
Detection with Missing Values [67.76168547245237]
We introduce a novel framework called GST-Pro, which utilizes a graph spatiotemporal process and an anomaly scorer to detect anomalies.
Our experimental results show that the GST-Pro method can effectively detect anomalies in time series data and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-11T10:10:16Z)
- Continual Release of Differentially Private Synthetic Data from Longitudinal Data Collections [19.148874215745135]
We study the problem of continually releasing differentially private synthetic data from longitudinal data collections.
We introduce a model where, in every time step, each individual reports a new data element.
We give continual synthetic data generation algorithms that preserve two basic types of queries.
arXiv Detail & Related papers (2023-06-13T16:22:08Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Synthcity: facilitating innovative use cases of synthetic data in
different data modalities [86.52703093858631]
Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy and augmentation.
Synthcity provides the practitioners with a single access point to cutting edge research and tools in synthetic data.
arXiv Detail & Related papers (2023-01-18T14:49:54Z)
- HyperTime: Implicit Neural Representation for Time Series [131.57172578210256]
Implicit neural representations (INRs) have recently emerged as a powerful tool that provides an accurate and resolution-independent encoding of data.
In this paper, we analyze the representation of time series using INRs, comparing different activation functions in terms of reconstruction accuracy and training convergence speed.
We propose a hypernetwork architecture that leverages INRs to learn a compressed latent representation of an entire time series dataset.
arXiv Detail & Related papers (2022-08-11T14:05:51Z)
- Deep Time Series Models for Scarce Data [8.673181404172963]
Time series data have grown at an explosive rate in numerous domains and have stimulated a surge of time series modeling research.
Data scarcity is a universal issue that occurs in a vast range of data analytics problems.
arXiv Detail & Related papers (2021-03-16T22:16:54Z)
- Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that learning summary features from data can compete with and even outperform LFI methods based on hand-crafted values.
arXiv Detail & Related papers (2020-12-04T19:21:37Z)
- Time Series Data Imputation: A Survey on Deep Learning Approaches [4.4458738910060775]
Time series data imputation is a well-studied problem with different categories of methods.
Time series methods based on deep learning have made progress with the usage of models like RNN.
We will review and discuss their model architectures, their pros and cons as well as their effects to show the development of the time series imputation methods.
arXiv Detail & Related papers (2020-11-23T11:57:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.