Calculating the matrix profile from noisy data
- URL: http://arxiv.org/abs/2306.10151v1
- Date: Fri, 16 Jun 2023 19:41:07 GMT
- Title: Calculating the matrix profile from noisy data
- Authors: Colin Hehir and Alan F. Smeaton
- Abstract summary: The matrix profile (MP) is a data structure computed from a time series which encodes the data required to locate motifs and discords.
We measure the similarity between the MP of the original time series and MPs generated from the same data with noise added.
Results suggest that MP generation is resilient to a small amount of noise, but this resilience disappears as the amount of noise increases.
- Score: 3.236217153362305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The matrix profile (MP) is a data structure computed from a time series which
encodes the data required to locate motifs and discords, corresponding to
recurring patterns and outliers respectively. When a time series contains
noise, the conventional approach is to pre-filter it to remove the noise, but
this is not possible in unsupervised settings where patterns and
outliers are not annotated. The resilience of the algorithm used to generate
the MP when faced with noisy data remains unknown. We measure the similarity
between the MP of the original time series and MPs generated from the same
data with noise added, under a range of parameter settings including adding
duplicates and adding irrelevant data. We use three real-world data sets drawn
from diverse domains for these experiments. Based on dissimilarities between the
MPs, our results suggest that MP generation is resilient to a small amount of
noise being introduced into the data, but as the amount of noise increases this
resilience disappears.
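The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the brute-force `matrix_profile` function, the Gaussian noise model, the exclusion-zone width, and the mean-absolute-difference dissimilarity are all assumptions chosen for illustration (the paper itself adds duplicates and irrelevant data, and its exact dissimilarity measure is not given here). The sketch uses only the Python standard library.

```python
import math
import random


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0.0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def matrix_profile(series, m):
    """Brute-force matrix profile: for each length-m subsequence, the
    z-normalized Euclidean distance to its nearest non-trivial match."""
    n_sub = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n_sub)]
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial self-matches
    profile = []
    for i in range(n_sub):
        best = math.inf
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            best = min(best, math.dist(windows[i], windows[j]))
        profile.append(best)
    return profile


def add_gaussian_noise(series, sigma, seed=0):
    """Hypothetical noise model: additive Gaussian noise with std sigma."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in series]


def mean_abs_diff(a, b):
    """Assumed dissimilarity: mean absolute difference of two equal-length MPs."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Synthetic periodic series standing in for real data.
    series = [math.sin(2 * math.pi * i / 25) for i in range(200)]
    clean_mp = matrix_profile(series, m=20)
    for sigma in (0.01, 0.1, 0.5):
        noisy_mp = matrix_profile(add_gaussian_noise(series, sigma), m=20)
        print(sigma, mean_abs_diff(clean_mp, noisy_mp))
```

Low values in the profile mark motifs (subsequences with a close match elsewhere) and high values mark discords. The brute-force loop is quadratic in the number of subsequences, so it is only suitable for short series; dedicated matrix profile libraries use much faster algorithms.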
Related papers
- Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy [7.264378254137811]
Differential privacy (DP) can measure privacy loss by observing the changes in the distribution caused by the inclusion of individuals in the target dataset.
DP has been prominent in safeguarding machine learning datasets at industry giants like Apple and Google.
We propose per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances.
arXiv Detail & Related papers (2024-04-24T06:51:16Z)
- SoftPatch: Unsupervised Anomaly Detection with Noisy Data [67.38948127630644]
This paper considers label-level noise in image sensory anomaly detection for the first time.
We propose a memory-based unsupervised AD method, SoftPatch, which efficiently denoises the data at the patch level.
Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset.
arXiv Detail & Related papers (2024-03-21T08:49:34Z)
- Time Series Synthesis Using the Matrix Profile for Anonymization [32.22243483781984]
Many researchers cannot release their data due to privacy regulations or fear of leaking confidential business information.
We propose the Time Series Synthesis Using the Matrix Profile (TSSUMP) method, where synthesized time series can be released in lieu of the original data.
We test our method on a case study of ECG and gender masking prediction.
arXiv Detail & Related papers (2023-11-05T04:27:24Z)
- SEAM: Searching Transferable Mixed-Precision Quantization Policy through Large Margin Regularization [50.04951511146338]
Mixed-precision quantization (MPQ) suffers from the time-consuming process of searching for the optimal bit-width allocation for each layer.
This paper proposes a novel method for efficiently searching for effective MPQ policies using a small proxy dataset.
arXiv Detail & Related papers (2023-02-14T05:47:45Z)
- PMP: Privacy-Aware Matrix Profile against Sensitive Pattern Inference for Time Series [12.855499575586753]
We propose a new privacy-preserving problem: preventing malicious inference on long shape-based patterns.
We find that while the Matrix Profile (MP) can prevent concrete shape leakage, the canonical correlation in the MP index can still reveal the location of a sensitive long pattern.
We propose a Privacy-Aware Matrix Profile (PMP) via perturbing the local correlation and breaking the canonical correlation in MP index vector.
arXiv Detail & Related papers (2023-01-04T22:11:38Z)
- Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning [89.00646449740606]
Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part.
Data augmentation lies at the core for creating the information gap.
In this paper, we explore the channel dimension for generic data augmentation by exploiting precision redundancy.
arXiv Detail & Related papers (2022-12-19T18:59:57Z)
- Noise-Aware Statistical Inference with Differentially Private Synthetic Data [0.0]
We show that simply analysing DP synthetic data as if it were real does not produce valid inferences of population-level quantities.
We tackle this problem by combining synthetic data analysis techniques from the field of multiple imputation, and synthetic data generation.
We develop a novel noise-aware synthetic data generation algorithm NAPSU-MQ using the principle of maximum entropy.
arXiv Detail & Related papers (2022-05-28T16:59:46Z)
- Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering [59.286567680389766]
Noisy labels are commonly found in real-world data, which cause performance degradation of deep neural networks.
We propose the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for deep metric learning (DML).
PRISM calculates the probability of a label being clean, and filters out potentially noisy samples.
arXiv Detail & Related papers (2021-08-03T12:15:25Z)
- Noise-resistant Deep Metric Learning with Ranking-based Instance Selection [59.286567680389766]
We propose a noise-resistant training technique for DML, which we name Probabilistic Ranking-based Instance Selection with Memory (PRISM).
PRISM identifies noisy data in a minibatch using average similarity against image features extracted from several previous versions of the neural network.
To alleviate the high computational cost brought by the memory bank, we introduce an acceleration method that replaces individual data points with the class centers.
arXiv Detail & Related papers (2021-03-30T03:22:17Z)
- Improving Face Recognition by Clustering Unlabeled Faces in the Wild [77.48677160252198]
We propose a novel identity separation method based on extreme value theory.
It greatly reduces the problems caused by overlapping-identity label noise.
Experiments on both controlled and real settings demonstrate our method's consistent improvements.
arXiv Detail & Related papers (2020-07-14T12:26:50Z)