Related papers: Calculating the matrix profile from noisy data

URL: http://arxiv.org/abs/2306.10151v1
Date: Fri, 16 Jun 2023 19:41:07 GMT
Title: Calculating the matrix profile from noisy data
Authors: Colin Hehir and Alan F. Smeaton
Abstract summary: The matrix profile (MP) is a data structure computed from a time series which encodes the data required to locate motifs and discords. We measure the similarity between the MP of the original time series and MPs generated from the same data with noise added. Results suggest that MP generation is resilient to a small amount of noise being introduced into the data, but as the amount of noise increases this resilience disappears.
Score: 3.236217153362305
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The matrix profile (MP) is a data structure computed from a time series which encodes the data required to locate motifs and discords, corresponding to recurring patterns and outliers respectively. When the time series contains noisy data, the conventional approach is to pre-filter it in order to remove noise, but this cannot apply in unsupervised settings where patterns and outliers are not annotated. The resilience of the algorithm used to generate the MP when faced with noisy data remains unknown. We measure the similarity between the MP of the original time series and MPs generated from the same data with noise added under a range of parameter settings, including adding duplicates and adding irrelevant data. We use three real-world data sets drawn from diverse domains for these experiments. Based on dissimilarities between the MPs, our results suggest that MP generation is resilient to a small amount of noise being introduced into the data, but as the amount of noise increases this resilience disappears.
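
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and makes several assumptions that are not in the paper: a brute-force z-normalized matrix profile, an exclusion zone of half the window length, additive Gaussian noise (the paper instead adds duplicates and irrelevant data), and mean absolute difference as the dissimilarity measure. The names `matrix_profile` and `mp_dissimilarity` are hypothetical.

```python
import numpy as np


def matrix_profile(ts, m):
    """Brute-force matrix profile: for every length-m subsequence, the
    z-normalized Euclidean distance to its nearest non-trivial neighbour."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    # Build all length-m subsequences, one per row.
    idx = np.arange(n)[:, None] + np.arange(m)[None, :]
    subs = ts[idx]
    mu = subs.mean(axis=1, keepdims=True)
    sd = subs.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant subsequences become all zeros
    z = (subs - mu) / sd
    # Pairwise squared distances via the dot-product identity.
    sq = (z ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    d = np.sqrt(np.clip(d2, 0.0, None))
    # Exclusion zone (assumed to be m // 2) to skip trivial self-matches.
    i = np.arange(n)
    d[np.abs(i[:, None] - i[None, :]) <= m // 2] = np.inf
    return d.min(axis=1)


def mp_dissimilarity(ts, m, noise_sd, seed=0):
    """Mean absolute difference between the MP of ts and the MP of ts
    with additive Gaussian noise (an assumed noise model)."""
    rng = np.random.default_rng(seed)
    ts = np.asarray(ts, dtype=float)
    noisy = ts + rng.normal(0.0, noise_sd, size=len(ts))
    return float(np.mean(np.abs(matrix_profile(ts, m) - matrix_profile(noisy, m))))


if __name__ == "__main__":
    # Synthetic periodic series standing in for real data.
    t = np.arange(300)
    clean = np.sin(2 * np.pi * t / 50)
    for noise_sd in (0.0, 0.01, 0.1, 1.0):
        print(noise_sd, mp_dissimilarity(clean, 25, noise_sd))
```

The brute-force version needs memory quadratic in the series length, so it only suits short series; a dedicated matrix profile library would be the practical choice for real data sets.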

Related papers

DP-$λ$CGD: Efficient Noise Correlation for Differentially Private Model Training [30.807442477789447]
We propose a new noise correlation strategy that correlates noise only with the immediately preceding iteration and cancels a controlled portion of it. Our method relies on noise regeneration using a pseudorandom noise generator, eliminating the need to store past noise. We show that the computational overhead is minimal and empirically demonstrate improved accuracy over DP-SGD.
arXiv Detail & Related papers (2026-01-29T21:21:34Z)
Benchmarking Fraud Detectors on Private Graph Data [70.4654745317714]
Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs. We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties. Third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results. We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results.
arXiv Detail & Related papers (2025-07-30T03:20:15Z)
Clustering and Median Aggregation Improve Differentially Private Inference [19.7873954143387]
Differentially private (DP) language model inference is an approach for generating private synthetic text. We show that uniform sampling degrades the quality of privately generated text. We introduce a new algorithm that aggregates next token statistics by privately computing medians instead of averages.
arXiv Detail & Related papers (2025-06-05T02:34:50Z)
Missing data imputation for noisy time-series data and applications in healthcare [5.586166090905021]
Imputation, i.e., filling in the missing values, is a common way to deal with noisy, missing time series data. In this study, we compare imputation methods, including Multiple Imputation with Random Forest (MICE-RF) and advanced deep learning approaches. Our results show that MICE-RF can effectively impute missing data compared to deep learning methods.
arXiv Detail & Related papers (2024-12-15T12:23:20Z)
Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy [7.264378254137811]
Differential privacy (DP) can measure privacy loss by observing the changes in the distribution caused by the inclusion of individuals in the target dataset. DP has been prominent in safeguarding datasets in machine learning in industry giants like Apple and Google. We propose per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances.
arXiv Detail & Related papers (2024-04-24T06:51:16Z)
SoftPatch: Unsupervised Anomaly Detection with Noisy Data [67.38948127630644]
This paper considers label-level noise in image sensory anomaly detection for the first time. We propose a memory-based unsupervised AD method, SoftPatch, which efficiently denoises the data at the patch level. Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset.
arXiv Detail & Related papers (2024-03-21T08:49:34Z)
Time Series Synthesis Using the Matrix Profile for Anonymization [32.22243483781984]
Many researchers cannot release their data due to privacy regulations or fear of leaking confidential business information. We propose the Time Series Synthesis Using the Matrix Profile (TSSUMP) method, where synthesized time series can be released in lieu of the original data. We test our method on a case study of ECG and gender masking prediction.
arXiv Detail & Related papers (2023-11-05T04:27:24Z)
SEAM: Searching Transferable Mixed-Precision Quantization Policy through Large Margin Regularization [50.04951511146338]
Mixed-precision quantization (MPQ) suffers from the time-consuming process of searching the optimal bit-width allocation for each layer. This paper proposes a novel method for efficiently searching for effective MPQ policies using a small proxy dataset.
arXiv Detail & Related papers (2023-02-14T05:47:45Z)
PMP: Privacy-Aware Matrix Profile against Sensitive Pattern Inference for Time Series [12.855499575586753]
We propose a new privacy-preserving problem: preventing malicious inference on long shape-based patterns. We find that while the Matrix Profile (MP) can prevent concrete shape leakage, the canonical correlation in the MP index can still reveal the location of a sensitive long pattern. We propose a Privacy-Aware Matrix Profile (PMP) via perturbing the local correlation and breaking the canonical correlation in the MP index vector.
arXiv Detail & Related papers (2023-01-04T22:11:38Z)
Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning [89.00646449740606]
Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part. Data augmentation lies at the core for creating the information gap. In this paper, we explore the channel dimension for generic data augmentation by exploiting precision redundancy.
arXiv Detail & Related papers (2022-12-19T18:59:57Z)
Noise-Aware Statistical Inference with Differentially Private Synthetic Data [0.0]
We show that simply analysing DP synthetic data as if it were real does not produce valid inferences of population-level quantities. We tackle this problem by combining synthetic data analysis techniques from the field of multiple imputation, and synthetic data generation. We develop a novel noise-aware synthetic data generation algorithm NAPSU-MQ using the principle of maximum entropy.
arXiv Detail & Related papers (2022-05-28T16:59:46Z)
Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering [59.286567680389766]
Noisy labels are commonly found in real-world data and cause performance degradation of deep neural networks. We propose the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for DML. PRISM calculates the probability of a label being clean, and filters out potentially noisy samples.
arXiv Detail & Related papers (2021-08-03T12:15:25Z)
Noise-resistant Deep Metric Learning with Ranking-based Instance Selection [59.286567680389766]
We propose a noise-resistant training technique for DML, which we name Probabilistic Ranking-based Instance Selection with Memory (PRISM). PRISM identifies noisy data in a minibatch using average similarity against image features extracted from several previous versions of the neural network. To alleviate the high computational cost brought by the memory bank, we introduce an acceleration method that replaces individual data points with the class centers.
arXiv Detail & Related papers (2021-03-30T03:22:17Z)
Improving Face Recognition by Clustering Unlabeled Faces in the Wild [77.48677160252198]
We propose a novel identity separation method based on extreme value theory. It greatly reduces the problems caused by overlapping-identity label noise. Experiments on both controlled and real settings demonstrate our method's consistent improvements.
arXiv Detail & Related papers (2020-07-14T12:26:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.