Multiple Instance Learning for Detecting Anomalies over Sequential
Real-World Datasets
- URL: http://arxiv.org/abs/2210.01707v1
- Date: Tue, 4 Oct 2022 16:02:09 GMT
- Title: Multiple Instance Learning for Detecting Anomalies over Sequential
Real-World Datasets
- Authors: Parastoo Kamranfar, David Lattanzi, Amarda Shehu, Daniel Barbará
- Abstract summary: Multiple Instance Learning (MIL) has been shown to be effective on problems with incomplete knowledge of labels in the training dataset.
We propose an MIL-based formulation and various algorithmic instantiations of this framework based on different design decisions.
The framework generalizes well over diverse datasets resulting from different real-world application domains.
- Score: 2.427831679672374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting anomalies over real-world datasets remains a challenging task. Data
annotation is an intensive human labor problem, particularly in sequential
datasets, where the start and end time of anomalies are not known. As a result,
data collected from sequential real-world processes can be largely unlabeled or
contain inaccurate labels. These characteristics challenge the application of
anomaly detection techniques based on supervised learning. In contrast,
Multiple Instance Learning (MIL) has been shown effective on problems with
incomplete knowledge of labels in the training dataset, mainly due to the
notion of bags. While largely under-leveraged for anomaly detection, MIL
provides an appealing formulation for anomaly detection over real-world
datasets, and it is the primary contribution of this paper. In this paper, we
propose an MIL-based formulation and various algorithmic instantiations of this
framework based on different design decisions for key components of the
framework. We evaluate the resulting algorithms over four datasets that capture
different physical processes along different modalities. The experimental
evaluation draws out several observations. The MIL-based formulation performs
no worse than single instance learning on easy to moderate datasets and
outperforms single-instance learning on more challenging datasets. Altogether,
the results show that the framework generalizes well over diverse datasets
resulting from different real-world application domains.
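The abstract attributes MIL's tolerance of incomplete labels to the notion of bags but does not say how bags are formed. As a rough illustration only, the sketch below assumes a sequence is cut into fixed-length instances, consecutive instances are grouped into bags, and a bag is scored by its most anomalous instance (the standard MIL assumption that a bag is positive if at least one instance is). The function names, the deviation-based score, and the threshold are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def make_bags(series: Sequence[float], instance_len: int,
              instances_per_bag: int) -> List[List[List[float]]]:
    """Cut a sequence into non-overlapping instances, then group
    consecutive instances into bags (hypothetical bagging scheme)."""
    instances = [list(series[i:i + instance_len])
                 for i in range(0, len(series) - instance_len + 1, instance_len)]
    return [instances[i:i + instances_per_bag]
            for i in range(0, len(instances), instances_per_bag)]


def instance_score(instance: Sequence[float], baseline: float) -> float:
    """Toy anomaly score: largest absolute deviation from a baseline value."""
    return max(abs(x - baseline) for x in instance)


def bag_score(bag: Sequence[Sequence[float]], baseline: float) -> float:
    """Max pooling over instances: the bag is as anomalous as its worst instance."""
    return max(instance_score(inst, baseline) for inst in bag)


# A 16-point signal with a single spike whose exact position is "unknown"
# to the annotator; only a bag-level decision is needed.
series = [0.0] * 8 + [5.0] + [0.0] * 7
bags = make_bags(series, instance_len=2, instances_per_bag=4)
scores = [bag_score(b, baseline=0.0) for b in bags]
flags = [s > 1.0 for s in scores]  # threshold chosen arbitrarily for the demo
print(scores, flags)
```

Labelling at the bag level means an annotator only has to say whether a stretch of the sequence contains an anomaly, not where it starts and ends, which is the labelling difficulty the abstract describes.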
Related papers
- Anomaly Detection by Context Contrasting [57.695202846009714]
Anomaly Detection focuses on identifying samples that deviate from the norm.
Recent advances in self-supervised learning have shown great promise in this regard.
We propose Con2, which addresses this problem by setting normal training data into distinct contexts.
Our approach achieves state-of-the-art performance on various benchmarks while exhibiting superior performance in a more realistic healthcare setting.
arXiv Detail & Related papers (2024-05-29T07:59:06Z)
- ARC: A Generalist Graph Anomaly Detector with In-Context Learning [62.202323209244]
ARC is a generalist GAD approach that enables a "one-for-all" GAD model to detect anomalies across various graph datasets on the fly.
Equipped with in-context learning, ARC can directly extract dataset-specific patterns from the target dataset.
Extensive experiments on multiple benchmark datasets from various domains demonstrate the superior anomaly detection performance, efficiency, and generalizability of ARC.
arXiv Detail & Related papers (2024-05-27T02:42:33Z)
- Contrastive Multiple Instance Learning for Weakly Supervised Person ReID [50.04900262181093]
We introduce Contrastive Multiple Instance Learning (CMIL), a novel framework tailored for more effective weakly supervised ReID.
CMIL distinguishes itself by requiring only a single model and no pseudo labels while leveraging contrastive losses.
We release the WL-MUDD dataset, an extension of the MUDD dataset featuring naturally occurring weak labels from the real-world application at PerformancePhoto.co.
arXiv Detail & Related papers (2024-02-12T14:48:31Z)
- Binary Quantification and Dataset Shift: An Experimental Investigation [54.14283123210872]
Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data.
The relationship between quantification and other types of dataset shift remains, by and large, unexplored.
We propose a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift.
arXiv Detail & Related papers (2023-10-06T20:11:27Z)
- Adversarial Deep Feature Extraction Network for User Independent Human Activity Recognition [4.988898367111902]
We present an adversarial subject-independent feature extraction method with the maximum mean discrepancy (MMD) regularization for human activity recognition.
We evaluate the method on well-known public data sets showing that it significantly improves user-independent performance and reduces variance in results.
arXiv Detail & Related papers (2021-10-23T07:50:32Z)
- Multiscale Laplacian Learning [3.24029503704305]
This paper presents two innovative multiscale Laplacian learning approaches for machine learning tasks.
The first approach, called multi Kernel manifold learning (MML), integrates manifold learning with multi Kernel information.
The second approach, called the multiscale MBO (MMBO) method, introduces multiscale Laplacians to a modification of the famous classical Merriman-Bence-Osher scheme.
arXiv Detail & Related papers (2021-09-08T15:25:32Z)
- Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z)
- Comparative Analysis of Extreme Verification Latency Learning Algorithms [3.3439097577935213]
This paper is a comprehensive survey and comparative analysis of some of the EVL algorithms to point out the weaknesses and strengths of different approaches.
This work is a first effort to provide the research community with a review of some of the existing algorithms in this field.
arXiv Detail & Related papers (2020-11-26T16:34:56Z)
- Out-Of-Bag Anomaly Detection [0.9449650062296822]
Data anomalies are ubiquitous in real world datasets, and can have an adverse impact on machine learning (ML) systems.
We propose a novel model-based anomaly detection method that we call Out-of-Bag anomaly detection.
We show that our method can improve the accuracy and reliability of an ML system when used as a data pre-processing step, via a case study on home valuation.
arXiv Detail & Related papers (2020-09-20T06:01:52Z)
- Toward Deep Supervised Anomaly Detection: Reinforcement Learning from Partially Labeled Anomaly Data [150.9270911031327]
We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.
Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data.
We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies.
arXiv Detail & Related papers (2020-09-15T03:05:39Z)
- Meta Learning for Causal Direction [29.00522306460408]
We introduce a novel generative model that allows distinguishing cause and effect in the small data setting.
We demonstrate our method on various synthetic as well as real-world data and show that it is able to maintain high accuracy in detecting directions across varying dataset sizes.
arXiv Detail & Related papers (2020-07-06T15:12:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.