Isolation forests: looking beyond tree depth
- URL: http://arxiv.org/abs/2111.11639v1
- Date: Tue, 23 Nov 2021 04:04:31 GMT
- Title: Isolation forests: looking beyond tree depth
- Authors: David Cortes
- Abstract summary: It will take fewer random cuts for an outlier to be left alone in a given subspace as compared to regular observations.
The original idea proposed an outlier score based on the tree depth (number of random cuts) required for isolation.
Experiments here show that using information about the size of the feature space taken and the number of points assigned to it can yield improved results in many situations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The isolation forest algorithm for outlier detection exploits a simple yet
effective observation: if taking some multivariate data and making uniformly
random cuts across the feature space recursively, it will take fewer such
random cuts for an outlier to be left alone in a given subspace as compared to
regular observations. The original idea proposed an outlier score based on the
tree depth (number of random cuts) required for isolation, but experiments here
show that using information about the size of the feature space taken and the
number of points assigned to it can yield improved results in many
situations without any modification to the tree structure, especially in the
presence of categorical features.
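The idea in the abstract can be illustrated with a minimal sketch (not the paper's implementation; the toy data, tree parameters, and variable names are illustrative). A single isolation tree applies uniformly random axis-aligned cuts; for each point we record both the classic depth-based signal (how many cuts until isolation) and the alternative signal the abstract describes, based on the size of the subspace a point ends up in and the number of points assigned to it, summarized here as a leaf log-density:

```python
import numpy as np

rng = np.random.default_rng(0)

def partition(X, idx, lo, hi, depth, out, max_depth=12):
    """One random isolation tree: recursively split the bounding box
    [lo, hi] with uniformly random axis-aligned cuts.  For each point,
    record (isolation depth, leaf log-density), where log-density is
    log(points in leaf) - log(leaf volume)."""
    if len(idx) <= 1 or depth == max_depth:
        log_vol = np.log(hi - lo).sum()
        dens = np.log(max(len(idx), 1)) - log_vol
        for i in idx:
            out[i].append((depth, dens))
        return
    d = rng.integers(X.shape[1])          # random feature
    t = rng.uniform(lo[d], hi[d])         # random threshold
    mask = X[idx, d] < t
    hi_left = hi.copy(); hi_left[d] = t   # left child keeps lo, shrinks hi
    lo_right = lo.copy(); lo_right[d] = t # right child shrinks lo, keeps hi
    partition(X, idx[mask], lo, hi_left, depth + 1, out)
    partition(X, idx[~mask], lo_right, hi, depth + 1, out)

# 200 inliers from a standard normal plus one far-away outlier.
X = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])
out = {i: [] for i in range(len(X))}
lo, hi = X.min(axis=0), X.max(axis=0)
for _ in range(50):                       # a small forest
    partition(X, np.arange(len(X)), lo, hi, 0, out)

mean_depth = {i: np.mean([d for d, _ in s]) for i, s in out.items()}
mean_dens = {i: np.mean([q for _, q in s]) for i, s in out.items()}
# The outlier (index 200) needs fewer cuts to isolate (lower mean depth)
# and lands in sparse, large-volume leaves (lower mean log-density).
```

Averaged over the forest, the outlier's mean isolation depth is well below the inliers', and its leaf log-density is lower still, since it occupies a large region of the feature space containing almost no points. The two signals need not agree in general, which is what leaves room for the scoring variants the paper studies.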
Related papers
- Diversity Conscious Refined Random Forest [0.0]
Random Forest (RF) is a widely used ensemble learning technique.
RF often relies on hundreds of trees and all input features, leading to high cost and model redundancy.
We propose a Refined Random Forest that grows trees only on informative features and enforces maximal diversity by clustering and retaining uncorrelated trees.
arXiv Detail & Related papers (2025-07-01T06:28:15Z)
- Exploring space efficiency in a tree-based linear model for extreme multi-label classification [11.18858602369985]
Extreme multi-label classification (XMC) aims to identify relevant subsets from numerous labels.
Among the various approaches for XMC, tree-based linear models are effective due to their superior efficiency and simplicity.
In this work, we conduct both theoretical and empirical analyses on the space to store a tree model under the assumption of sparse data.
arXiv Detail & Related papers (2024-10-12T15:02:40Z)
- Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
- Distribution and volume based scoring for Isolation Forests [0.0]
We make two contributions to the Isolation Forest method for anomaly and outlier detection.
The first is an information-theoretically motivated generalisation of the score function that is used to aggregate the scores across random tree estimators.
The second is an alternative scoring function at the level of the individual tree estimator, in which we replace the depth-based scoring of the Isolation Forest with one based on hyper-volumes associated to an isolation tree's leaf nodes.
arXiv Detail & Related papers (2023-09-20T16:27:10Z)
- Deep Isolation Forest for Anomaly Detection [16.581154394513025]
Isolation forest (iForest) has been emerging as arguably the most popular anomaly detector in recent years.
Our model achieves significant improvement over state-of-the-art isolation-based methods and deep detectors.
arXiv Detail & Related papers (2022-06-14T05:47:07Z)
- Revisiting randomized choices in isolation forests [0.0]
Isolation forest or "iForest" is an intuitive and widely used algorithm for anomaly detection.
This paper shows that "clustered" diverse outliers can be more easily identified by applying a non-uniformly-random choice of variables and/or thresholds.
arXiv Detail & Related papers (2021-10-26T04:08:49Z)
- Robustifying Algorithms of Learning Latent Trees with Vector Variables [92.18777020401484]
We present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG)
We robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product.
We derive the first known instance-dependent impossibility result for structure learning of latent trees.
arXiv Detail & Related papers (2021-06-02T01:37:52Z)
- Intersection Regularization for Extracting Semantic Attributes [72.53481390411173]
We consider the problem of supervised classification, such that the features that the network extracts match an unseen set of semantic attributes.
For example, when learning to classify images of birds into species, we would like to observe the emergence of features that zoologists use to classify birds.
We propose training a neural network with discrete top-level activations, which is followed by a multi-layered perceptron (MLP) and a parallel decision tree.
arXiv Detail & Related papers (2021-03-22T14:32:44Z)
- Spectral Top-Down Recovery of Latent Tree Models [13.681975313065477]
Spectral Top-Down Recovery (STDR) is a divide-and-conquer approach for inference of large latent tree models.
STDR's partitioning step is non-random. Instead, it is based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes.
We prove that STDR is statistically consistent, and bound the number of samples required to accurately recover the tree with high probability.
arXiv Detail & Related papers (2021-02-26T02:47:42Z)
- Rethinking Learnable Tree Filter for Generic Feature Transform [71.77463476808585]
Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation.
To relax the geometric constraint, we give the analysis by reformulating it as a Markov Random Field and introduce a learnable unary term.
For semantic segmentation, we achieve leading performance (82.1% mIoU) on the Cityscapes benchmark without bells-and-whistles.
arXiv Detail & Related papers (2020-12-07T07:16:47Z)
- Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation [75.93960390191262]
We exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes.
We propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution.
Our method, termed as Forest R-CNN, can serve as a plug-and-play module being applied to most object recognition models.
arXiv Detail & Related papers (2020-08-13T03:52:37Z)
- Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation [72.40827239394565]
We propose to compute features only at sparsely sampled locations.
We then densely reconstruct the feature map with an efficient procedure.
The presented network is experimentally shown to save substantial computation while maintaining accuracy over a variety of computer vision tasks.
arXiv Detail & Related papers (2020-03-19T15:36:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.