Outlier detection in multivariate functional data through a contaminated
mixture model
- URL: http://arxiv.org/abs/2106.07222v1
- Date: Mon, 14 Jun 2021 08:17:42 GMT
- Title: Outlier detection in multivariate functional data through a contaminated
mixture model
- Authors: Martial Amovin-Assagba (ERIC, AMK), Irène Gannaz, Julien Jacques (ERIC)
- Abstract summary: This work is motivated by an application in an industrial context, where the activity of sensors is recorded at a high frequency.
The objective is to automatically detect abnormal measurement behaviour.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work is motivated by an application in an industrial context, where the
activity of sensors is recorded at a high frequency. The objective is to
automatically detect abnormal measurement behaviour. Considering the sensor
measures as functional data, we are formally interested in detecting outliers
in a multivariate functional data set. Due to the heterogeneity of this data
set, the proposed contaminated mixture model both clusters the multivariate
functional data into homogeneous groups and detects outliers. The main
advantage of this procedure over its competitors is that it does not require us
to specify the proportion of outliers. Model inference is performed through an
Expectation-Conditional Maximization algorithm, and the BIC criterion is used
to select the number of clusters. Numerical experiments on simulated data
demonstrate the high performance achieved by the inference algorithm. In
particular, the proposed model outperforms competitors. Its application on the
real data which motivated this study allows us to correctly detect abnormal
behaviours.
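The abstract does not spell out the model, so the following is only a minimal sketch of the general idea behind a contaminated mixture component, under stated assumptions: a univariate contaminated Gaussian in which a "good" density with variance `var` is mixed with a "bad" density sharing the same mean but with variance inflated by `eta > 1`. The function names and the values of `alpha` and `eta` are hypothetical and chosen for illustration; in a fitted model these would be estimated from the data (the abstract mentions an Expectation-Conditional Maximization algorithm) rather than fixed by hand, and the paper itself works with multivariate functional data rather than scalars.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def prob_good(x, mean, var, alpha, eta):
    """Posterior probability that x was drawn from the non-inflated ("good") part.

    Contaminated density (illustrative assumption, not the paper's exact model):
        f(x) = alpha * N(x; mean, var) + (1 - alpha) * N(x; mean, eta * var)
    """
    good = alpha * normal_pdf(x, mean, var)
    bad = (1.0 - alpha) * normal_pdf(x, mean, eta * var)
    return good / (good + bad)


def is_outlier(x, mean=0.0, var=1.0, alpha=0.95, eta=20.0):
    """Flag x as an outlier when the "bad" part is the more probable source."""
    return prob_good(x, mean, var, alpha, eta) < 0.5


if __name__ == "__main__":
    for x in (0.0, 1.0, 3.0, 6.0):
        print(x, round(prob_good(x, 0.0, 1.0, 0.95, 20.0), 4), is_outlier(x))
```

Because `alpha` is a model parameter rather than a user-chosen threshold, an approach of this kind does not need the proportion of outliers to be specified in advance, which is the advantage the abstract highlights.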
Related papers
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
- Fast kernel methods for Data Quality Monitoring as a goodness-of-fit test [10.882743697472755]
We propose a machine learning approach for monitoring particle detectors in real-time.
The goal is to assess the compatibility of incoming experimental data with a reference dataset, characterising the data behaviour under normal circumstances.
The model is based on a modern implementation of kernel methods, nonparametric algorithms that can learn any continuous function given enough data.
arXiv Detail & Related papers (2023-03-09T16:59:35Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
arXiv Detail & Related papers (2021-08-27T18:10:31Z)
- ESAD: End-to-end Deep Semi-supervised Anomaly Detection [85.81138474858197]
We propose a new objective function that measures the KL-divergence between normal and anomalous data.
The proposed method significantly outperforms several state-of-the-art methods on multiple benchmark datasets.
arXiv Detail & Related papers (2020-12-09T08:16:35Z)
- Dynamic Bayesian Approach for decision-making in Ego-Things [8.577234269009042]
This paper presents a novel approach to detect abnormalities in dynamic systems based on multisensory data and feature selection.
Growing neural gas (GNG) is employed for clustering multisensory data into a set of nodes.
Our method uses a Markov Jump particle filter (MJPF) for state estimation and abnormality detection.
arXiv Detail & Related papers (2020-10-28T11:38:51Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Categorical anomaly detection in heterogeneous data using minimum description length clustering [3.871148938060281]
We propose a meta-algorithm for enhancing any MDL-based anomaly detection model to deal with heterogeneous data.
Our experimental results show that using a discrete mixture model provides competitive performance relative to two previous anomaly detection algorithms.
arXiv Detail & Related papers (2020-06-14T14:48:37Z)
- A Causal Direction Test for Heterogeneous Populations [10.653162005300608]
Most causal models assume a single homogeneous population, an assumption that may fail to hold in many applications.
We show that when the homogeneity assumption is violated, causal models developed based on such assumption can fail to identify the correct causal direction.
We propose an adjustment to a commonly used causal direction test statistic by using a $k$-means type clustering algorithm.
arXiv Detail & Related papers (2020-06-08T18:59:14Z)
- Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.