Enhancing Interpretability and Generalizability in Extended Isolation Forests
- URL: http://arxiv.org/abs/2310.05468v3
- Date: Wed, 09 Oct 2024 09:56:32 GMT
- Title: Enhancing Interpretability and Generalizability in Extended Isolation Forests
- Authors: Alessio Arcudi, Davide Frizzo, Chiara Masiero, Gian Antonio Susto,
- Abstract summary: Extended Isolation Forest Feature Importance (ExIFFI) is a method that explains predictions made by Extended Isolation Forest (EIF) models.
EIF+ is designed to enhance the model's ability to detect unseen anomalies through a revised splitting strategy.
ExIFFI outperforms other unsupervised interpretability methods on 8 of 11 real-world datasets.
- Score: 5.139809663513828
- License:
- Abstract: Anomaly Detection (AD) focuses on identifying unusual behaviors in complex datasets. Machine Learning (ML) algorithms and Decision Support Systems (DSSs) provide effective solutions for AD, but detecting anomalies alone may not be enough, especially in engineering, where diagnostics and maintenance are crucial. Users need clear explanations to support root cause analysis and build trust in the model. The unsupervised nature of AD, however, makes interpretability a challenge. This paper introduces Extended Isolation Forest Feature Importance (ExIFFI), a method that explains predictions made by Extended Isolation Forest (EIF) models, which split data using hyperplanes. ExIFFI provides explanations at both global and local levels by leveraging feature importance. We also present an improved version, Enhanced Extended Isolation Forest (EIF+), designed to enhance the model's ability to detect unseen anomalies through a revised splitting strategy. Using five synthetic and eleven real-world datasets, we conduct a comparative analysis, evaluating unsupervised AD methods with the Average Precision metric. EIF+ consistently outperforms EIF across all datasets when trained without anomalies, demonstrating better generalization. To assess ExIFFI's interpretability, we introduce the Area Under the Curve of Feature Selection (AUC\_FS), a novel metric using feature selection as a proxy task. ExIFFI outperforms other unsupervised interpretability methods on 8 of 11 real-world datasets and successfully identifies anomalous features in synthetic datasets. When trained only on inliers, ExIFFI also outperforms competing models on real-world data and accurately detects anomalous features in synthetic datasets. We provide open-source code to encourage further research and reproducibility.
Related papers
- Directly Handling Missing Data in Linear Discriminant Analysis for Enhancing Classification Accuracy and Interpretability [1.4840867281815378]
We introduce a novel and robust classification method, termed weighted missing Linear Discriminant Analysis (WLDA)
WLDA extends Linear Discriminant Analysis (LDA) to handle datasets with missing values without the need for imputation.
We conduct an in-depth theoretical analysis to establish the properties of WLDA and thoroughly evaluate its explainability.
arXiv Detail & Related papers (2024-06-30T14:21:32Z) - AcME-AD: Accelerated Model Explanations for Anomaly Detection [5.702288833888639]
AcME-AD is a model-agnostic, efficient solution for interoperability.
It offers local feature importance scores and a what-if analysis tool, shedding light on the factors contributing to each anomaly.
This paper elucidates AcME-AD's foundation, its benefits over existing methods, and validates its effectiveness with tests on both synthetic and real datasets.
arXiv Detail & Related papers (2024-03-02T16:11:58Z) - Filling the Missing: Exploring Generative AI for Enhanced Federated
Learning over Heterogeneous Mobile Edge Devices [72.61177465035031]
We propose a generative AI-empowered federated learning to address these challenges by leveraging the idea of FIlling the MIssing (FIMI) portion of local data.
Experiment results demonstrate that FIMI can save up to 50% of the device-side energy to achieve the target global test accuracy.
arXiv Detail & Related papers (2023-10-21T12:07:04Z) - Physics Inspired Hybrid Attention for SAR Target Recognition [61.01086031364307]
We propose a physics inspired hybrid attention (PIHA) mechanism and the once-for-all (OFA) evaluation protocol to address the issues.
PIHA leverages the high-level semantics of physical information to activate and guide the feature group aware of local semantics of target.
Our method outperforms other state-of-the-art approaches in 12 test scenarios with same ASC parameters.
arXiv Detail & Related papers (2023-09-27T14:39:41Z) - Consistency Regularization for Generalizable Source-free Domain
Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods ONLY assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER)
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Active Learning-based Isolation Forest (ALIF): Enhancing Anomaly
Detection in Decision Support Systems [2.922007656878633]
ALIF is a lightweight modification of the popular Isolation Forest that proved superior performances with respect to other state-of-art algorithms.
The proposed approach is particularly appealing in the presence of a Decision Support System (DSS), a case that is increasingly popular in real-world scenarios.
arXiv Detail & Related papers (2022-07-08T14:36:38Z) - DRFLM: Distributionally Robust Federated Learning with Inter-client
Noise via Local Mixup [58.894901088797376]
federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.