Novelty Detection in Sequential Data by Informed Clustering and Modeling
- URL: http://arxiv.org/abs/2103.03943v2
- Date: Mon, 10 Jul 2023 10:03:17 GMT
- Title: Novelty Detection in Sequential Data by Informed Clustering and Modeling
- Authors: Linara Adilova, Siming Chen, Michael Kamp
- Abstract summary: Novelties can be detected by modeling normal sequences and measuring the deviations of a new sequence from the model predictions.
In this paper, we adapt a state-of-the-art visual analytics tool for discrete sequence clustering to obtain informed clusters from domain experts.
Our approach outperforms state-of-the-art novelty detection methods for discrete sequences in three real-world application scenarios.
- Score: 8.108571247838206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novelty detection in discrete sequences is a challenging task, since
deviations from the process generating the normal data are often small or
intentionally hidden. Novelties can be detected by modeling normal sequences
and measuring the deviations of a new sequence from the model predictions.
However, in many applications data is generated by several distinct processes
so that models trained on all the data tend to over-generalize and novelties
remain undetected. We propose to approach this challenge through decomposition:
by clustering the data, we break down the problem, obtaining a simpler modeling
task in each cluster that can be modeled more accurately. However, this comes
with a trade-off, since the amount of training data per cluster is reduced. This
is a particular problem for discrete sequences where state-of-the-art models
are data-hungry. The success of this approach thus depends on the quality of
the clustering, i.e., whether the individual learning problems are sufficiently
simpler than the joint problem. While clustering discrete sequences
automatically is a challenging and domain-specific task, it is often easy for
human domain experts, given the right tools. In this paper, we adapt a
state-of-the-art visual analytics tool for discrete sequence clustering to
obtain informed clusters from domain experts and use LSTMs to model each
cluster individually. Our extensive empirical evaluation indicates that this
informed clustering outperforms automatic ones and that our approach
outperforms state-of-the-art novelty detection methods for discrete sequences
in three real-world application scenarios. In particular, decomposition
outperforms a global model despite less training data on each individual
cluster.
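The decomposition idea in the abstract (cluster the normal sequences, fit one model per cluster, score a new sequence by how far it deviates from the model predictions) can be illustrated with a minimal sketch. This is not the authors' implementation: a first-order Markov chain with add-one smoothing is swapped in for the LSTM to keep the example dependency-free, the hard-coded `expert_clusters` labels stand in for clusters a domain expert would provide through the visual analytics tool, and taking the minimum score over all cluster models is an assumption about how a new sequence is matched to a cluster. All names (`MarkovModel`, `fit_per_cluster`, `novelty_score`) are hypothetical.

```python
import math
from collections import defaultdict


class MarkovModel:
    """First-order Markov chain with add-one smoothing.

    Stand-in for the per-cluster LSTM mentioned in the abstract.
    """

    def __init__(self, alphabet):
        self.alphabet = sorted(alphabet)
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        # Count symbol-to-symbol transitions over all training sequences.
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1
        return self

    def score(self, seq):
        """Mean negative log-likelihood per transition (higher = more novel)."""
        k = len(self.alphabet)
        nll, n = 0.0, 0
        for prev, nxt in zip(seq, seq[1:]):
            row = self.counts.get(prev, {})
            total = sum(row.values())
            p = (row.get(nxt, 0) + 1) / (total + k)  # add-one smoothing
            nll -= math.log(p)
            n += 1
        return nll / max(n, 1)


def fit_per_cluster(sequences, cluster_ids, alphabet):
    """Fit one model per cluster of normal sequences."""
    groups = defaultdict(list)
    for seq, cid in zip(sequences, cluster_ids):
        groups[cid].append(seq)
    return {cid: MarkovModel(alphabet).fit(seqs) for cid, seqs in groups.items()}


def novelty_score(models, seq):
    """Deviation from the best-fitting cluster model (assumed matching rule)."""
    return min(model.score(seq) for model in models.values())


# Toy data: two distinct generating processes, labeled by a hypothetical expert.
normal = ["abababab", "babababa", "cdcdcdcd", "dcdcdcdc"]
expert_clusters = [0, 0, 1, 1]
models = fit_per_cluster(normal, expert_clusters, alphabet="abcd")

print(novelty_score(models, "abababab"))  # sequence from a known process
print(novelty_score(models, "acacacac"))  # mixes symbols of both processes
```

In this toy setting the mixed sequence receives a higher score than the normal one; a threshold on the score would then flag novelties. How that threshold is chosen is left open here.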
Related papers
- Anomaly Detection by Context Contrasting [57.695202846009714]
Anomaly detection focuses on identifying samples that deviate from the norm.
Recent advances in self-supervised learning have shown great promise in this regard.
We propose Con$_2$, which learns through context augmentations.
arXiv Detail & Related papers (2024-05-29T07:59:06Z)
- Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models [83.02797560769285]
Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data.
Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts.
We propose Task Groupings Regularization, a novel approach that benefits from model heterogeneity by grouping and aligning conflicting tasks.
arXiv Detail & Related papers (2024-05-26T13:11:55Z)
- Time Series Data Augmentation as an Imbalanced Learning Problem [2.5536554335016417]
We use oversampling strategies to create synthetic time series observations and improve the accuracy of forecasting models.
We carried out experiments using 7 different databases that contain a total of 5502 univariate time series.
We found that the proposed solution outperforms both a global and a local model, thus providing a better trade-off between these two approaches.
arXiv Detail & Related papers (2024-04-29T09:27:15Z)
- CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network [53.72046586512026]
We propose a novel incomplete multi-view clustering network, called Cognitive Deep Incomplete Multi-view Clustering Network (CDIMC-net).
It captures the high-level features and local structure of each view by incorporating the view-specific deep encoders and graph embedding strategy into a framework.
Based on human cognition, i.e., learning from easy to hard, it introduces a self-paced strategy to select the most confident samples for model training.
arXiv Detail & Related papers (2024-03-28T15:45:03Z)
- Anonymous Learning via Look-Alike Clustering: A Precise Analysis of Model Generalization [18.03833857491361]
A common approach to enhancing privacy involves training models using anonymous data rather than individual data.
We provide an analysis of how training models using anonymous cluster centers affects their generalization capabilities.
In certain high-dimensional regimes, training over anonymous cluster centers acts as a regularization and improves generalization error of the trained models.
arXiv Detail & Related papers (2023-10-06T04:52:46Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Low-count Time Series Anomaly Detection [1.3207844222875191]
Low-count time series describe sparse or intermittent events, which are prevalent in large-scale online platforms that capture and monitor diverse data types.
Several distinct challenges surface when modelling low-count time series, particularly low signal-to-noise ratios.
We introduce a novel generative procedure for creating benchmark datasets comprising low-count time series with anomalous segments.
arXiv Detail & Related papers (2023-08-24T16:58:30Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z)
- Mind Your Clever Neighbours: Unsupervised Person Re-identification via Adaptive Clustering Relationship Modeling [19.532602887109668]
Unsupervised person re-identification (Re-ID) attracts increasing attention due to its potential to resolve the scalability problem of supervised Re-ID models.
Most existing unsupervised methods adopt an iterative clustering mechanism, where the network is trained on pseudo-labels generated by unsupervised clustering.
To generate high-quality pseudo-labels and mitigate the impact of clustering errors, we propose a novel clustering relationship modeling framework for unsupervised person Re-ID.
arXiv Detail & Related papers (2021-12-03T10:55:07Z)
- Modeling Heterogeneous Statistical Patterns in High-dimensional Data by Adversarial Distributions: An Unsupervised Generative Framework [33.652544673163774]
We propose a novel unsupervised generative framework called FIRD, which utilizes adversarial distributions to fit and disentangle the heterogeneous statistical patterns.
When applied to discrete spaces, FIRD effectively distinguishes the synchronized fraudsters from normal users.
arXiv Detail & Related papers (2020-12-15T08:51:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.