Related papers: TiWS-iForest: Isolation Forest in Weakly Supervised and Tiny ML scenarios

URL: http://arxiv.org/abs/2111.15432v1
Date: Tue, 30 Nov 2021 14:24:27 GMT
Title: TiWS-iForest: Isolation Forest in Weakly Supervised and Tiny ML scenarios
Authors: Tommaso Barbariol and Gian Antonio Susto
Abstract summary: Isolation Forest is a popular algorithm able to define an anomaly score by means of an ensemble of peculiar trees called isolation trees. We show that the standard algorithm might be improved in terms of memory requirements, latency and performance. We propose TiWS-iForest, an approach that, by leveraging weak supervision, is able to reduce Isolation Forest complexity and to enhance detection performance.
Score: 2.7285752469525315
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Unsupervised anomaly detection tackles the problem of finding anomalies inside datasets without the availability of labels; since data tagging is typically hard or expensive to obtain, such approaches have seen huge applicability in recent years. In this context, Isolation Forest is a popular algorithm able to define an anomaly score by means of an ensemble of peculiar trees called isolation trees. These are built using a random partitioning procedure that is extremely fast and cheap to train. However, we find that the standard algorithm might be improved in terms of memory requirements, latency and performance; this is of particular importance in low-resource scenarios and in TinyML implementations on ultra-constrained microprocessors. Moreover, Anomaly Detection approaches currently do not take advantage of weak supervision: since they are typically consumed in Decision Support Systems, feedback from the users, even if rare, can be a valuable source of information that is currently unexplored. Besides showing iForest training limitations, we propose here TiWS-iForest, an approach that, by leveraging weak supervision, is able to reduce Isolation Forest complexity and to enhance detection performance. We show the effectiveness of TiWS-iForest on real-world datasets and we share the code in a public repository to enhance reproducibility.
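The abstract describes isolation trees built by random partitioning and an anomaly score derived from an ensemble of them. A minimal sketch of the standard Isolation Forest scoring idea follows, using only the Python standard library. It is not the authors' TiWS-iForest code and does not include any weak supervision; the function names (`c_factor`, `build_tree`, `build_forest`, `anomaly_score`) and the default parameters are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(points)}
    threshold = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def build_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample (hypothetical defaults)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))  # assumes at least 2 points
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def path_length(node, x):
    """Depth at which x lands in a leaf, plus a correction for unsplit leaf points."""
    depth = 0
    while "feature" in node:
        side = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[side]
        depth += 1
    return depth + c_factor(node["size"])


def anomaly_score(trees, x, psi):
    """Score in (0, 1]; values closer to 1 mean x is isolated in fewer splits."""
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(psi))


# Toy data: a Gaussian cluster plus one far-away point.
demo_rng = random.Random(42)
demo_data = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(200)]
demo_data.append((10.0, 10.0))
demo_trees, demo_psi = build_forest(demo_data)
print(anomaly_score(demo_trees, (10.0, 10.0), demo_psi))
print(anomaly_score(demo_trees, (0.0, 0.0), demo_psi))
```

In this sketch, memory and latency both grow with the number of trees, since every tree is stored and traversed at scoring time. That is the cost the abstract says a reduced forest can lower.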

Related papers

Labels Matter More Than Models: Quantifying the Benefit of Supervised Time Series Anomaly Detection [56.302586730134806]
Time series anomaly detection (TSAD) is a critical data mining task often constrained by label scarcity. Current research predominantly focuses on Unsupervised Time-series Anomaly Detection. This paper challenges the premise that architectural complexity is the optimal path for TSAD.
arXiv Detail & Related papers (2025-11-20T08:32:49Z)
Glocal Information Bottleneck for Time Series Imputation [70.41814118117311]
Time Series Imputation aims to recover missing values in temporal data. Existing models typically optimize the point-wise reconstruction loss, focusing on recovering numerical values (local information). We propose a new training paradigm, Glocal Information Bottleneck (Glocal-IB).
arXiv Detail & Related papers (2025-10-06T15:24:44Z)
Structure-based Anomaly Detection and Clustering [1.450405446885067]
Anomaly detection is a fundamental problem in domains such as healthcare, manufacturing, and cybersecurity. This thesis proposes new unsupervised methods for anomaly detection in both structured and streaming data settings.
arXiv Detail & Related papers (2025-05-19T06:20:00Z)
A Dataset for Semantic Segmentation in the Presence of Unknowns [49.795683850385956]
Existing datasets allow evaluation of only knowns or unknowns - but not both. We propose a novel anomaly segmentation dataset, ISSU, that features a diverse set of anomaly inputs from cluttered real-world environments. The dataset is twice larger than existing anomaly segmentation datasets.
arXiv Detail & Related papers (2025-03-28T10:31:01Z)
CableInspect-AD: An Expert-Annotated Anomaly Detection Dataset [14.246172794156987]
CableInspect-AD is a high-quality dataset created and annotated by domain experts from Hydro-Québec, a Canadian public utility. This dataset includes high-resolution images with challenging real-world anomalies, covering defects with varying severity levels. We present a comprehensive evaluation protocol based on cross-validation to assess models' performances.
arXiv Detail & Related papers (2024-09-30T14:50:13Z)
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention [53.4441894198495]
Large language models (LLMs) now support extremely long context windows. The quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. We propose SampleAttention, an adaptive structured and near-lossless sparse attention.
arXiv Detail & Related papers (2024-06-17T11:05:15Z)
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
With increasing privacy concerns, we propose a parameter-efficient Federated Anomaly Detection framework named PeFAD. We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
SoftPatch: Unsupervised Anomaly Detection with Noisy Data [67.38948127630644]
This paper considers label-level noise in image sensory anomaly detection for the first time. We propose a memory-based unsupervised AD method, SoftPatch, which efficiently denoises the data at the patch level. Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset.
arXiv Detail & Related papers (2024-03-21T08:49:34Z)
Hard Nominal Example-aware Template Mutual Matching for Industrial Anomaly Detection [74.9262846410559]
Hard Nominal Example-aware Template Mutual Matching (HETMM) aims to construct a robust prototype-based decision boundary, which can precisely distinguish between hard-nominal examples and anomalies.
arXiv Detail & Related papers (2023-03-28T17:54:56Z)
Towards Sequence Utility Maximization under Utility Occupancy Measure [53.234101208024335]
In the database, although utility is a flexible criterion for each pattern, it is a more absolute criterion due to neglect of utility sharing. We first define utility occupancy on sequence data and raise the problem of High Utility-Occupancy Sequential Pattern Mining. An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed.
arXiv Detail & Related papers (2022-12-20T17:28:53Z)
Semi-Supervised Temporal Action Detection with Proposal-Free Masking [134.26292288193298]
We propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT). SPOT outperforms state-of-the-art alternatives, often by a large margin.
arXiv Detail & Related papers (2022-07-14T16:58:47Z)
Active Learning-based Isolation Forest (ALIF): Enhancing Anomaly Detection in Decision Support Systems [2.922007656878633]
ALIF is a lightweight modification of the popular Isolation Forest that showed superior performance with respect to other state-of-the-art algorithms. The proposed approach is particularly appealing in the presence of a Decision Support System (DSS), a case that is increasingly popular in real-world scenarios.
arXiv Detail & Related papers (2022-07-08T14:36:38Z)
Deep Isolation Forest for Anomaly Detection [16.581154394513025]
Isolation forest (iForest) has been emerging as arguably the most popular anomaly detector in recent years. Our model achieves significant improvement over state-of-the-art isolation-based methods and deep detectors on datasets.
arXiv Detail & Related papers (2022-06-14T05:47:07Z)
WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection [75.80075054706079]
We propose a weakly- and semi-supervised object detection framework (WSSOD). An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images. The proposed framework demonstrates remarkable performance on PASCAL-VOC and MSCOCO benchmark, achieving a high performance comparable to those obtained in fully-supervised settings.
arXiv Detail & Related papers (2021-05-21T11:58:50Z)
Interpretable Anomaly Detection with Mondrian Pólya Forests on Data Streams [6.177270420667713]
Anomaly detection at scale is an extremely challenging problem of great practicality. Recent work has coalesced on variations of (random) $k$d-trees to summarise data for anomaly detection. These methods rely on ad-hoc score functions that are not easy to interpret. We contextualise these methods in a probabilistic framework which we call the Mondrian Pólya Forest.
arXiv Detail & Related papers (2020-08-04T13:19:07Z)
Interpretable Anomaly Detection with DIFFI: Depth-based Isolation Forest Feature Importance [4.769747792846005]
Anomaly Detection is an unsupervised learning task aimed at detecting anomalous behaviours with respect to historical data. The Isolation Forest is one of the most commonly adopted algorithms in the field of Anomaly Detection. This paper proposes methods to define feature importance scores at both global and local level for the Isolation Forest.
arXiv Detail & Related papers (2020-07-21T22:19:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.