OptIForest: Optimal Isolation Forest for Anomaly Detection
- URL: http://arxiv.org/abs/2306.12703v2
- Date: Fri, 23 Jun 2023 04:30:52 GMT
- Title: OptIForest: Optimal Isolation Forest for Anomaly Detection
- Authors: Haolong Xiang, Xuyun Zhang, Hongsheng Hu, Lianyong Qi, Wanchun Dou, Mark Dras, Amin Beheshti and Xiaolong Xu
- Abstract summary: Among anomaly detection methods, the category based on the isolation forest mechanism stands out due to its simplicity, effectiveness, and efficiency.
In this paper, we establish a theory on isolation efficiency to determine the optimal branching factor for an isolation tree.
Based on this theoretical underpinning, we design a practical optimal isolation forest, OptIForest, incorporating clustering-based learning to hash.
- Score: 19.38817835115542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anomaly detection plays an increasingly important role in various fields for
critical tasks such as intrusion detection in cybersecurity, financial risk
detection, and human health monitoring. A variety of anomaly detection methods
have been proposed, and a category based on the isolation forest mechanism
stands out due to its simplicity, effectiveness, and efficiency, e.g., iForest
is often employed as a state-of-the-art detector for real-world deployment. While the
majority of isolation forests use the binary structure, the LSHiForest framework
has demonstrated that the multi-fork isolation tree structure can lead to
better detection performance. However, no theoretical work has answered
the fundamentally and practically important question of the optimal tree
structure for an isolation forest with respect to the branching factor. In this
paper, we establish a theory on isolation efficiency to answer the question and
determine the optimal branching factor for an isolation tree. Based on the
theoretical underpinning, we design a practical optimal isolation forest
OptIForest incorporating clustering-based learning to hash, which enables more
information to be learned from data for better isolation quality. The rationale
of our approach relies on a better bias-variance trade-off achieved by bias
reduction in OptIForest. Extensive comparative and ablation experiments on a
series of benchmark datasets demonstrate that our approach can
efficiently and robustly achieve better detection performance in general than
state-of-the-art methods, including deep-learning-based ones.
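The abstract rests on two ideas: an isolation tree may split each node into v children instead of two, and anomalies tend to be separated from the rest of the data in fewer splits than normal points. The toy sketch below illustrates both, assuming uniform random cut points on a randomly chosen feature. It is not the OptIForest algorithm (per the abstract, OptIForest uses clustering-based learning to hash rather than random cuts), and the helper names `build_tree`, `path_length`, `mean_path_length` and `height_limit` are hypothetical. The normalizer `c_factor` and the score 2^(-E[h(x)]/c(psi)) follow the original binary iForest; reusing the binary leaf adjustment for v > 2 is an assumption of this sketch.

```python
import bisect
import math
import random
from typing import List, Optional, Sequence

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant
Point = Sequence[float]


def c_factor(n: int) -> float:
    """c(n): average unsuccessful-search path length in a BST of n points (binary iForest)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


class Node:
    """A leaf (no children) or an internal node holding v - 1 sorted cut points."""

    def __init__(self, size: int, feature: int = -1, cuts: Optional[List[float]] = None,
                 children: Optional[List["Node"]] = None) -> None:
        self.size = size
        self.feature = feature
        self.cuts = cuts or []
        self.children = children or []


def height_limit(sample_size: int, branching: int) -> int:
    """Smallest h with branching**h >= sample_size (equals log2(psi) for v = 2, psi = 2^k)."""
    h = 0
    while branching ** h < sample_size:
        h += 1
    return h


def build_tree(points: List[Point], depth: int, max_depth: int,
               branching: int, rng: random.Random) -> Node:
    """Hypothetical v-ary isolation tree grown with uniform random cuts."""
    if depth >= max_depth or len(points) <= 1:
        return Node(len(points))
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    lo, hi = min(values), max(values)
    if lo == hi:  # a constant feature cannot be split
        return Node(len(points))
    # branching - 1 random cut points partition [lo, hi] into v intervals
    cuts = sorted(rng.uniform(lo, hi) for _ in range(branching - 1))
    buckets: List[List[Point]] = [[] for _ in range(branching)]
    for p in points:
        buckets[bisect.bisect_right(cuts, p[feature])].append(p)
    children = [build_tree(b, depth + 1, max_depth, branching, rng) for b in buckets]
    return Node(len(points), feature, cuts, children)


def path_length(node: Node, x: Point, depth: int = 0) -> float:
    """Edges traversed by x, plus the c(size) adjustment at the leaf it reaches."""
    if not node.children:
        # Assumption: the binary-tree leaf adjustment is reused for any branching factor.
        return depth + c_factor(node.size)
    child = node.children[bisect.bisect_right(node.cuts, x[node.feature])]
    return path_length(child, x, depth + 1)


def mean_path_length(points: List[Point], x: Point, n_trees: int = 100,
                     sample_size: int = 64, branching: int = 2, seed: int = 0) -> float:
    """Average path length of x over n_trees trees, each grown on a random subsample."""
    if branching < 2:
        raise ValueError("branching factor must be at least 2")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = height_limit(psi, branching)
    total = 0.0
    for _ in range(n_trees):
        tree = build_tree(rng.sample(points, psi), 0, max_depth, branching, rng)
        total += path_length(tree, x)
    return total / n_trees


def anomaly_score(mean_depth: float, sample_size: int) -> float:
    """Binary iForest score 2^(-E[h(x)] / c(psi)); values closer to 1 are more anomalous."""
    return 2.0 ** (-mean_depth / c_factor(sample_size))


if __name__ == "__main__":
    demo_rng = random.Random(42)
    data = [(demo_rng.gauss(0.0, 1.0), demo_rng.gauss(0.0, 1.0)) for _ in range(256)]
    for v in (2, 3, 4):
        inlier = mean_path_length(data, (0.0, 0.0), branching=v)
        outlier = mean_path_length(data, (8.0, 8.0), branching=v)
        print(f"v={v}: inlier depth {inlier:.2f}, outlier depth {outlier:.2f}")
```

For every v, the far-away point (8, 8) should receive a shorter average path than the point at the centre of the data. Comparing these toy depths across v only illustrates the multi-fork structure; it does not reproduce the paper's isolation-efficiency theory or its choice of branching factor, and `anomaly_score` is calibrated only for the binary case.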
Related papers
- The Best of Both Worlds: On the Dilemma of Out-of-distribution Detection [75.65876949930258]
Out-of-distribution (OOD) detection is essential for model trustworthiness.
We show that the superior OOD detection performance of state-of-the-art methods is achieved by secretly sacrificing the OOD generalization ability.
(arXiv, 2024-10-12T07:02:04Z)
- Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation.
DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level.
To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
(arXiv, 2024-08-21T05:09:53Z)
- A Satellite Band Selection Framework for Amazon Forest Deforestation Detection Task [0.5825410941577593]
Deforestation and degradation impact millions of hectares annually, necessitating government or private initiatives for effective forest monitoring.
This study introduces a novel framework that employs the Univariate Marginal Distribution Algorithm (UMDA) to select spectral bands from the Landsat-8 satellite.
This selection guides a semantic segmentation architecture, DeepLabv3+, enhancing its performance.
(arXiv, 2024-04-03T11:47:20Z)
- Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
(arXiv, 2024-02-04T05:50:38Z)
- Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations [111.88727295707454]
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP.
We propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts.
We conduct experiments on pre-trained language models for analysis and evaluation of OOD robustness.
(arXiv, 2023-06-07T17:47:03Z)
- AUTO: Adaptive Outlier Optimization for Online Test-Time OOD Detection [81.49353397201887]
Out-of-distribution (OOD) detection is crucial to deploying machine learning models in open-world applications.
We introduce a novel paradigm called test-time OOD detection, which utilizes unlabeled online data directly at test time to improve OOD detection performance.
We propose adaptive outlier optimization (AUTO), which consists of an in-out-aware filter, an ID memory bank, and a semantically-consistent objective.
(arXiv, 2023-03-22T02:28:54Z)
- OpenOOD: Benchmarking Generalized Out-of-Distribution Detection [60.13300701826931]
Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications.
The field currently lacks a unified, strictly formulated, and comprehensive benchmark.
We build a unified, well-structured codebase called OpenOOD, which implements over 30 methods developed in relevant fields.
(arXiv, 2022-10-13T17:59:57Z)
- What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work? [1.1050303097572156]
We show that the two forest-based estimators we study can be understood in terms of the same parameters and confounding assumptions under L2 loss.
In the randomized setting, both approaches performed similarly to the new blended versions in a benchmark study.
(arXiv, 2022-06-21T12:45:07Z)
- Comparative Study Between Distance Measures On Supervised Optimum-Path Forest Classification [0.0]
Optimum-Path Forest (OPF) uses a graph-based methodology and a distance measure to create arcs between nodes and hence sets of trees.
This work proposes a comparative study over a wide range of distance measures applied to the supervised Optimum-Path Forest classification.
(arXiv, 2022-02-08T13:34:09Z)
- MOOD: Multi-level Out-of-distribution Detection [13.207044902083057]
Out-of-distribution (OOD) detection is essential to prevent anomalous inputs from causing a model to fail during deployment.
We propose a novel framework, multi-level out-of-distribution detection MOOD, which exploits intermediate classifier outputs for dynamic and efficient OOD inference.
MOOD achieves up to 71.05% computational reduction in inference, while maintaining competitive OOD detection performance.
(arXiv, 2021-04-30T02:18:31Z)
- Interpretable Anomaly Detection with DIFFI: Depth-based Isolation Forest Feature Importance [4.769747792846005]
Anomaly Detection is an unsupervised learning task aimed at detecting anomalous behaviours with respect to historical data.
The Isolation Forest is one of the most commonly adopted algorithms in the field of Anomaly Detection.
This paper proposes methods to define feature importance scores at both global and local level for the Isolation Forest.
(arXiv, 2020-07-21T22:19:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.