Theoretical Investigation on Inductive Bias of Isolation Forest
- URL: http://arxiv.org/abs/2505.12825v1
- Date: Mon, 19 May 2025 08:07:43 GMT
- Title: Theoretical Investigation on Inductive Bias of Isolation Forest
- Authors: Qin-Cheng Zheng, Shao-Qun Zhang, Shen-Huan Lyu, Yuan Jiang, Zhi-Hua Zhou
- Abstract summary: Isolation Forest (iForest) stands out as a widely-used unsupervised anomaly detector valued for its exceptional runtime efficiency and performance on large-scale tasks. This paper theoretically investigates the conditions and extent of iForest's effectiveness by analyzing its inductive bias through the formulation of depth functions and growth processes.
- Score: 50.73712396699867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Isolation Forest (iForest) stands out as a widely-used unsupervised anomaly detector valued for its exceptional runtime efficiency and performance on large-scale tasks. Despite its widespread adoption, a theoretical foundation explaining iForest's success remains unclear. This paper theoretically investigates the conditions and extent of iForest's effectiveness by analyzing its inductive bias through the formulation of depth functions and growth processes. Since directly analyzing the depth function proves intractable due to iForest's random splitting mechanism, we model the growth process of iForest as a random walk, enabling us to derive the expected depth function using transition probabilities. Our case studies reveal key inductive biases: iForest exhibits lower sensitivity to central anomalies while demonstrating greater parameter adaptability compared to $k$-Nearest Neighbor anomaly detectors. Our study provides theoretical understanding of the effectiveness of iForest and establishes a foundation for further theoretical exploration.
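The depth function mentioned in the abstract can be illustrated with a toy simulation. The following is a minimal sketch, not the paper's analysis: it assumes one-dimensional data, treats the query point as part of the sample when drawing split ranges, and uses hypothetical helper names (`isolation_depth`, `expected_depth`). It estimates by Monte Carlo how many uniformly random splits are needed to isolate a point, which is the quantity iForest turns into an anomaly score (shallower depth means more anomalous).

```python
import random


def isolation_depth(x, data, rng, max_depth=50):
    """Number of uniformly random splits needed to separate x from data (1-D toy)."""
    points = list(data)
    depth = 0
    while points and depth < max_depth:
        lo = min(min(points), x)
        hi = max(max(points), x)
        if lo == hi:
            # Only exact duplicates of x remain; no split can separate them.
            break
        split = rng.uniform(lo, hi)
        # Keep only the points that fall on the same side of the split as x.
        if x < split:
            points = [p for p in points if p < split]
        else:
            points = [p for p in points if p >= split]
        depth += 1
    return depth


def expected_depth(x, data, n_trees=2000, seed=0):
    """Monte Carlo estimate of the expected isolation depth of x."""
    rng = random.Random(seed)
    total = sum(isolation_depth(x, data, rng) for _ in range(n_trees))
    return total / n_trees


if __name__ == "__main__":
    # Two tight clusters around -1 and 1 (an assumed toy data set).
    left = [-1.05 + 0.01 * i for i in range(11)]
    right = [0.95 + 0.01 * i for i in range(11)]
    data = left + right
    for name, x in [("outer", 3.0), ("central", 0.0), ("inlier", 1.003)]:
        print(name, expected_depth(x, data))
```

In this toy setting a point lying between the two clusters always needs at least two splits to be isolated, whereas a point outside the data range can be cut off by a single split. This is one informal way to picture why a point surrounded by normal data may look less anomalous to iForest than an outlying one.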
Related papers
- Probing Deep into Temporal Profile Makes the Infrared Small Target Detector Much Better [63.567886330598945]
Infrared small target (IRST) detection is challenging in simultaneously achieving precise, universal, robust and efficient performance. Current learning-based methods attempt to leverage "more" information from both the spatial and the short-term temporal domains. We propose an efficient deep temporal probe network (DeepPro) that only performs calculations in the time dimension for IRST detection.
arXiv Detail & Related papers (2025-06-15T08:19:32Z)
- Long-term Causal Inference via Modeling Sequential Latent Confounding [49.64731441006396]
Long-term causal inference is an important but challenging problem across various scientific domains. We propose an approach based on the Conditional Additive Equi-Confounding Bias (CAECB) assumption. Our proposed assumption states a functional relationship between sequential confounding biases across temporal short-term outcomes.
arXiv Detail & Related papers (2025-02-26T09:56:56Z)
- Towards Understanding Extrapolation: a Causal Lens [53.15488984371969]
We provide a theoretical understanding of when extrapolation is possible and offer principled methods to achieve it. Under this formulation, we cast the extrapolation problem into a latent-variable identification problem. Our theory reveals the intricate interplay between the underlying manifold's smoothness and the shift properties.
arXiv Detail & Related papers (2025-01-15T21:29:29Z)
- A Central Limit Theorem for the permutation importance measure [0.44998333629984877]
We provide a formal proof of a Central Limit Theorem for RFPIM using U-Statistics theory. Our result aims at improving the theoretical understanding of RFPIM rather than conducting comprehensive hypothesis testing.
arXiv Detail & Related papers (2024-12-17T15:40:21Z)
- Bayesian Intervention Optimization for Causal Discovery [23.51328013481865]
Causal discovery is crucial for understanding complex systems and informing decisions.
Current methods, such as Bayesian and graph-theoretical approaches, do not prioritize decision-making.
We propose a novel Bayesian optimization-based method inspired by Bayes factors.
arXiv Detail & Related papers (2024-06-16T12:45:44Z)
- Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task.
LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced.
Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on the stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z)
- OptIForest: Optimal Isolation Forest for Anomaly Detection [19.38817835115542]
A category based on the isolation forest mechanism stands out due to its simplicity, effectiveness, and efficiency.
In this paper, we establish a theory on isolation efficiency to answer the question and determine the optimal branching factor for an isolation tree.
Based on the theoretical underpinning, we design a practical optimal isolation forest OptIForest incorporating clustering based learning to hash.
arXiv Detail & Related papers (2023-06-22T07:14:02Z)
- Counterfactual Reasoning for Out-of-distribution Multimodal Sentiment Analysis [56.84237932819403]
This paper aims to estimate and mitigate the bad effect of textual modality for strong OOD generalization.
Inspired by this, we devise a model-agnostic counterfactual framework for multimodal sentiment analysis.
arXiv Detail & Related papers (2022-07-24T03:57:40Z)
- FACT: High-Dimensional Random Forests Inference [4.941630596191806]
Quantifying the usefulness of individual features in random forests learning can greatly enhance its interpretability.
Existing studies have shown that some popularly used feature importance measures for random forests suffer from the bias issue.
We propose a framework of the self-normalized feature-residual correlation test (FACT) for evaluating the significance of a given feature.
arXiv Detail & Related papers (2022-07-04T19:05:08Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Interpretable Anomaly Detection with DIFFI: Depth-based Isolation Forest Feature Importance [4.769747792846005]
Anomaly Detection is an unsupervised learning task aimed at detecting anomalous behaviours with respect to historical data.
The Isolation Forest is one of the most commonly adopted algorithms in the field of Anomaly Detection.
This paper proposes methods to define feature importance scores at both global and local level for the Isolation Forest.
arXiv Detail & Related papers (2020-07-21T22:19:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.