Related papers: OptIForest: Optimal Isolation Forest for Anomaly Detection

OptIForest: Optimal Isolation Forest for Anomaly Detection

URL: http://arxiv.org/abs/2306.12703v2
Date: Fri, 23 Jun 2023 04:30:52 GMT
Title: OptIForest: Optimal Isolation Forest for Anomaly Detection
Authors: Haolong Xiang, Xuyun Zhang, Hongsheng Hu, Lianyong Qi, Wanchun Dou, Mark Dras, Amin Beheshti and Xiaolong Xu
Abstract summary: A category based on the isolation forest mechanism stands out due to its simplicity, effectiveness, and efficiency. In this paper, we establish a theory on isolation efficiency to answer the question and determine the optimal branching factor for an isolation tree. Based on the theoretical underpinning, we design a practical optimal isolation forest OptIForest incorporating clustering based learning to hash.
Score: 19.38817835115542
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Anomaly detection plays an increasingly important role in various fields for critical tasks such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. A variety of anomaly detection methods have been proposed, and a category based on the isolation forest mechanism stands out due to its simplicity, effectiveness, and efficiency, e.g., iForest is often employed as a state-of-the-art detector for real deployment. While the majority of isolation forests use the binary structure, a framework LSHiForest has demonstrated that the multi-fork isolation tree structure can lead to better detection performance. However, there is no theoretical work answering the fundamentally and practically important question on the optimal tree structure for an isolation forest with respect to the branching factor. In this paper, we establish a theory on isolation efficiency to answer the question and determine the optimal branching factor for an isolation tree. Based on the theoretical underpinning, we design a practical optimal isolation forest OptIForest incorporating clustering based learning to hash which enables more information to be learned from data for better isolation quality. The rationale of our approach relies on a better bias-variance trade-off achieved by bias reduction in OptIForest. Extensive experiments on a series of benchmarking datasets for comparative and ablation studies demonstrate that our approach can efficiently and robustly achieve better detection performance in general than the state-of-the-arts including the deep learning based methods.

Related papers

Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection [52.716143424856185]
We propose LiMA (Less input is More faithful for Attribution), which reformulates the attribution of important regions as an optimization problem for submodular subset selection. LiMA identifies both the most and least important samples while ensuring an optimal attribution boundary that minimizes errors. Our method also outperforms the greedy search in attribution efficiency, being 1.6 times faster.
arXiv Detail & Related papers (2025-04-01T06:58:15Z)
Decision Tree Induction Through LLMs via Semantically-Aware Evolution [53.0367886783772]
We propose an evolutionary optimization method for decision tree induction based on genetic programming (GP) Our key innovation is the integration of semantic priors and domain-specific knowledge about the search space into the algorithm. This is operationalized through novel genetic operators that work with structured natural language prompts.
arXiv Detail & Related papers (2025-03-18T12:52:03Z)
The Best of Both Worlds: On the Dilemma of Out-of-distribution Detection [75.65876949930258]
Out-of-distribution (OOD) detection is essential for model trustworthiness. We show that the superior OOD detection performance of state-of-the-art methods is achieved by secretly sacrificing the OOD generalization ability.
arXiv Detail & Related papers (2024-10-12T07:02:04Z)
Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation. DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level. To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
arXiv Detail & Related papers (2024-08-21T05:09:53Z)
A Satellite Band Selection Framework for Amazon Forest Deforestation Detection Task [0.5825410941577593]
Deforestation and degradation impact millions of hectares annually, necessitating government or private initiatives for effective forest monitoring. This study introduces a novel framework that employs the Univariate Marginal Distribution Algorithm (UMDA) to select spectral bands from Landsat-8 satellite. This selection guides a semantic segmentation architecture, DeepLabv3+, enhancing its performance.
arXiv Detail & Related papers (2024-04-03T11:47:20Z)
Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization. We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data. We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z)
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations [111.88727295707454]
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP. We propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts. We conduct experiments on pre-trained language models for analysis and evaluation of OOD robustness.
arXiv Detail & Related papers (2023-06-07T17:47:03Z)
AUTO: Adaptive Outlier Optimization for Online Test-Time OOD Detection [81.49353397201887]
Out-of-distribution (OOD) detection is crucial to deploying machine learning models in open-world applications. We introduce a novel paradigm called test-time OOD detection, which utilizes unlabeled online data directly at test time to improve OOD detection performance. We propose adaptive outlier optimization (AUTO), which consists of an in-out-aware filter, an ID memory bank, and a semantically-consistent objective.
arXiv Detail & Related papers (2023-03-22T02:28:54Z)
OpenOOD: Benchmarking Generalized Out-of-Distribution Detection [60.13300701826931]
Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications. The field currently lacks a unified, strictly formulated, and comprehensive benchmark. We build a unified, well-structured called OpenOOD, which implements over 30 methods developed in relevant fields.
arXiv Detail & Related papers (2022-10-13T17:59:57Z)
What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work? [1.1050303097572156]
We show that both methods can be understood in terms of the same parameters and confounding assumptions under L2 loss. In the randomized setting, both approaches performed akin to the new blended versions in a benchmark study.
arXiv Detail & Related papers (2022-06-21T12:45:07Z)
Comparative Study Between Distance Measures On Supervised Optimum-Path Forest Classification [0.0]
Optimum-Path Forest (OPF) uses a graph-based methodology and a distance measure to create arcs between nodes and hence sets of trees. This work proposes a comparative study over a wide range of distance measures applied to the supervised Optimum-Path Forest classification.
arXiv Detail & Related papers (2022-02-08T13:34:09Z)
MOOD: Multi-level Out-of-distribution Detection [13.207044902083057]
Out-of-distribution (OOD) detection is essential to prevent anomalous inputs from causing a model to fail during deployment. We propose a novel framework, multi-level out-of-distribution detection MOOD, which exploits intermediate classifier outputs for dynamic and efficient OOD inference. MOOD achieves up to 71.05% computational reduction in inference, while maintaining competitive OOD detection performance.
arXiv Detail & Related papers (2021-04-30T02:18:31Z)
Interpretable Anomaly Detection with DIFFI: Depth-based Isolation Forest Feature Importance [4.769747792846005]
Anomaly Detection is an unsupervised learning task aimed at detecting anomalous behaviours with respect to historical data. The Isolation Forest is one of the most commonly adopted algorithms in the field of Anomaly Detection. This paper proposes methods to define feature importance scores at both global and local level for the Isolation Forest.
arXiv Detail & Related papers (2020-07-21T22:19:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.