Tree-based Ensemble Learning for Out-of-distribution Detection
- URL: http://arxiv.org/abs/2405.03060v1
- Date: Sun, 5 May 2024 21:49:51 GMT
- Title: Tree-based Ensemble Learning for Out-of-distribution Detection
- Authors: Zhaiming Shen, Menglun Wang, Guang Cheng, Ming-Jun Lai, Lin Mu, Ruihao Huang, Qi Liu, Hao Zhu,
- Abstract summary: TOOD detection is a simple yet effective tree-based out-of-distribution detection mechanism.
Our approach is interpretable and robust for its tree-based nature.
- Score: 14.464948762955713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Being able to successfully determine whether the testing samples has similar distribution as the training samples is a fundamental question to address before we can safely deploy most of the machine learning models into practice. In this paper, we propose TOOD detection, a simple yet effective tree-based out-of-distribution (TOOD) detection mechanism to determine if a set of unseen samples will have similar distribution as of the training samples. The TOOD detection mechanism is based on computing pairwise hamming distance of testing samples' tree embeddings, which are obtained by fitting a tree-based ensemble model through in-distribution training samples. Our approach is interpretable and robust for its tree-based nature. Furthermore, our approach is efficient, flexible to various machine learning tasks, and can be easily generalized to unsupervised setting. Extensive experiments are conducted to show the proposed method outperforms other state-of-the-art out-of-distribution detection methods in distinguishing the in-distribution from out-of-distribution on various tabular, image, and text data.
Related papers
- Towards More Trustworthy Deep Code Models by Enabling Out-of-Distribution Detection [12.141246816152288]
We develop two types of SE-specific OOD detection models, unsupervised and weakly-supervised OOD detection for code.
Our proposed methods significantly outperform the baselines in detecting OOD samples from four different scenarios simultaneously and also positively impact a main code understanding task.
arXiv Detail & Related papers (2025-02-26T06:59:53Z) - Advancing Out-of-Distribution Detection via Local Neuroplasticity [60.53625435889467]
This paper presents a novel OOD detection method that leverages the unique local neuroplasticity property of Kolmogorov-Arnold Networks (KANs)
Our method compares the activation patterns of a trained KAN against its untrained counterpart to detect OOD samples.
We validate our approach on benchmarks from image and medical domains, demonstrating superior performance and robustness compared to state-of-the-art techniques.
arXiv Detail & Related papers (2025-02-20T11:13:41Z) - Out-of-Distribution Detection with a Single Unconditional Diffusion Model [54.15132801131365]
Out-of-distribution (OOD) detection is a critical task in machine learning that seeks to identify abnormal samples.
Traditionally, unsupervised methods utilize a deep generative model for OOD detection.
This paper explores whether a single model can perform OOD detection across diverse tasks.
arXiv Detail & Related papers (2024-05-20T08:54:03Z) - Toward a Realistic Benchmark for Out-of-Distribution Detection [3.8038269045375515]
We introduce a comprehensive benchmark for OOD detection based on ImageNet and Places365.
Several techniques can be used to determine which classes should be considered in-distribution, yielding benchmarks with varying properties.
arXiv Detail & Related papers (2024-04-16T11:29:43Z) - Detecting Out-of-Distribution Samples via Conditional Distribution
Entropy with Optimal Transport [20.421338676377587]
We argue that empirical probability distributions that incorporate geometric information from both training samples and test inputs can be highly beneficial for OOD detection.
Within the framework of optimal transport, we propose a novel score function known as the emphconditional distribution entropy to quantify the uncertainty of a test input being an OOD sample.
arXiv Detail & Related papers (2024-01-22T07:07:32Z) - Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z) - Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD
Training Data Estimate a Combination of the Same Core Quantities [104.02531442035483]
The goal of this paper is to recognize common objectives as well as to identify the implicit scoring functions of different OOD detection methods.
We show that binary discrimination between in- and (different) out-distributions is equivalent to several distinct formulations of the OOD detection problem.
We also show that the confidence loss which is used by Outlier Exposure has an implicit scoring function which differs in a non-trivial fashion from the theoretically optimal scoring function.
arXiv Detail & Related papers (2022-06-20T16:32:49Z) - Multiple Testing Framework for Out-of-Distribution Detection [27.248375922343616]
We study the problem of Out-of-Distribution (OOD) detection, that is, detecting whether a learning algorithm's output can be trusted at inference time.
We propose a definition for the notion of OOD that includes both the input distribution and the learning algorithm, which provides insights for the construction of powerful tests for OOD detection.
arXiv Detail & Related papers (2022-06-20T00:56:01Z) - WOOD: Wasserstein-based Out-of-Distribution Detection [6.163329453024915]
Training data for deep-neural-network-based classifiers are usually assumed to be sampled from the same distribution.
When part of the test samples are drawn from a distribution that is far away from that of the training samples, the trained neural network has a tendency to make high confidence predictions for these OOD samples.
We propose a Wasserstein-based out-of-distribution detection (WOOD) method to overcome these challenges.
arXiv Detail & Related papers (2021-12-13T02:35:15Z) - Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z) - An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different potential instance attribution agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z) - Learn what you can't learn: Regularized Ensembles for Transductive
Out-of-distribution Detection [76.39067237772286]
We show that current out-of-distribution (OOD) detection algorithms for neural networks produce unsatisfactory results in a variety of OOD detection scenarios.
This paper studies how such "hard" OOD scenarios can benefit from adjusting the detection method after observing a batch of the test data.
We propose a novel method that uses an artificial labeling scheme for the test data and regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch.
arXiv Detail & Related papers (2020-12-10T16:55:13Z) - Out-of-distribution detection for regression tasks: parameter versus
predictor entropy [2.026281591452464]
It is crucial to detect when an instance lies downright too far from the training samples for the machine learning model to be trusted.
For neural networks, one approach to this task consists of learning a diversity of predictors that all can explain the training data.
We propose a new way of estimating the entropy of a distribution on predictors based on nearest neighbors in function space.
arXiv Detail & Related papers (2020-10-24T21:41:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.