Dimensionality-Aware Outlier Detection: Theoretical and Experimental Analysis
- URL: http://arxiv.org/abs/2401.05453v2
- Date: Sun, 21 Apr 2024 03:57:31 GMT
- Title: Dimensionality-Aware Outlier Detection: Theoretical and Experimental Analysis
- Authors: Alastair Anderberg, James Bailey, Ricardo J. G. B. Campello, Michael E. Houle, Henrique O. Marques, Miloš Radovanović, Arthur Zimek
- Abstract summary: We present a nonparametric method for outlier detection that takes full account of local variations in dimensionality within the dataset.
We show that DAO significantly outperforms three popular and important benchmark outlier detection methods.
- Score: 9.962838991341874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a nonparametric method for outlier detection that takes full account of local variations in intrinsic dimensionality within the dataset. Using the theory of Local Intrinsic Dimensionality (LID), our 'dimensionality-aware' outlier detection method, DAO, is derived as an estimator of an asymptotic local expected density ratio involving the query point and a close neighbor drawn at random. The dimensionality-aware behavior of DAO is due to its use of local estimation of LID values in a theoretically-justified way. Through comprehensive experimentation on more than 800 synthetic and real datasets, we show that DAO significantly outperforms three popular and important benchmark outlier detection methods: Local Outlier Factor (LOF), Simplified LOF, and kNN.
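The abstract names the ingredients of DAO without reproducing the estimator itself. As a rough illustration of how those ingredients combine, the sketch below estimates LID with the standard maximum-likelihood (Hill-type) estimator over k-NN distances and plugs it into a Simplified-LOF-style density ratio as a dimensionality-aware exponent. This is a hedged sketch of the general idea only, not the authors' exact DAO formula; the function name, the numerical guards, and k = 20 are our choices.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def dao_like_score(X, k=20):
    """Dimensionality-aware outlier score (illustrative sketch, not the
    paper's exact DAO estimator). Larger scores = more outlying."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, idx = nn.kneighbors(X)
    dists, idx = dists[:, 1:], idx[:, 1:]            # drop the self-neighbor
    r_k = np.maximum(dists[:, -1], 1e-12)            # k-NN radius per point
    # MLE (Hill-type) LID estimate: -[ (1/k) sum_i log(r_i / r_k) ]^{-1}
    logs = np.log(np.maximum(dists, 1e-12) / r_k[:, None])
    m = np.minimum(logs.mean(axis=1), -1e-12)        # keep denominator negative
    lid = np.clip(-1.0 / m, 1.0, X.shape[1])         # clamp to [1, ambient dim]
    # Simplified-LOF-style radius ratio, raised to the neighbor's estimated LID.
    ratio = r_k[:, None] / r_k[idx]
    return (ratio ** lid[idx]).mean(axis=1)

# Usage: inliers near a low-dimensional structure plus scattered outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10)),  # ~2-D manifold in 10-D
               rng.normal(scale=4.0, size=(10, 10))])                 # 10 outliers
scores = dao_like_score(X, k=20)
print(scores[-10:].mean() > scores[:-10].mean())  # outliers should score higher
```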
Related papers
- Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning [50.84938730450622]
We propose TV score, a trajectory-based method that uses trajectory volatility for OOD detection in mathematical reasoning.
Our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios.
Our method can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T22:22:25Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
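The summary does not spell out the estimator, but the plain single-dataset inverse propensity score baseline that the paper's collaborative variant builds on is standard and easy to sketch. The code below shows only that baseline, with a logistic-regression propensity model; the names and the clipping threshold are our choices, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X, T, Y, clip=0.01):
    """Inverse-propensity-weighted estimate of the average treatment effect.
    X: covariates, T: binary treatment, Y: outcomes. Plain single-dataset
    baseline, not the paper's collaborative estimator."""
    e = LogisticRegression(max_iter=1000).fit(X, T).predict_proba(X)[:, 1]
    e = np.clip(e, clip, 1 - clip)  # avoid extreme weights
    return np.mean(T * Y / e - (1 - T) * Y / (1 - e))

# Usage on synthetic data with a known treatment effect of +2.0:
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # confounded treatment
Y = 2.0 * T + X[:, 0] + rng.normal(size=2000)     # true ATE = 2.0
print(ipw_ate(X, T, Y))  # should be approximately 2.0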
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Improving Efficiency of Iso-Surface Extraction on Implicit Neural Representations Using Uncertainty Propagation [32.329370320329005]
Implicit Neural Representations (INRs) are widely used for scientific data reduction and visualization.
Range analysis has shown promising results in improving the efficiency of geometric queries on INRs for 3D geometries.
We present an improved technique for range analysis by revisiting the arithmetic rules and analyzing the probability distribution of the network output within a spatial region.
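Range analysis for an INR typically means bounding the network's output over a spatial box so that whole regions can be culled during iso-surface extraction. One standard baseline for this is interval bound propagation through the MLP's layers; the sketch below shows that baseline only (the paper's contribution refines such bounds with a probabilistic analysis, which is not reproduced here).

```python
import numpy as np

def interval_mlp_bounds(weights, biases, lo, hi):
    """Propagate an axis-aligned input box [lo, hi] through a ReLU MLP,
    returning guaranteed output bounds (standard interval bound propagation;
    a baseline for range analysis, not the paper's refined technique)."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        c, r = (lo + hi) / 2.0, (hi - lo) / 2.0   # box center and radius
        c_out = W @ c + b
        r_out = np.abs(W) @ r                     # radius grows by |W|
        lo, hi = c_out - r_out, c_out + r_out
        if i < len(weights) - 1:                  # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

# Usage: if 0 lies outside [lo, hi] for a signed-distance INR over a cell,
# the level-0 iso-surface cannot pass through that cell and it can be skipped.
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(16, 3)), rng.normal(size=(1, 16))]
bs = [rng.normal(size=16), rng.normal(size=1)]
lo, hi = interval_mlp_bounds(Ws, bs, np.zeros(3), 0.1 * np.ones(3))
print(lo, hi)
```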
arXiv Detail & Related papers (2024-02-21T15:10:20Z)
- Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
arXiv Detail & Related papers (2023-02-14T17:02:35Z)
- Preserving local densities in low-dimensional embeddings [37.278617643507815]
State-of-the-art methods, such as tSNE and UMAP, excel in unveiling local structures hidden in high-dimensional data.
We show, however, that these methods fail to reconstruct local properties, such as relative differences in densities.
We suggest dtSNE, which approximately conserves local densities.
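The claim that embeddings distort relative densities can be checked directly: compare each point's k-NN radius in the original space with its radius in the embedding. If local densities were preserved up to a global rescaling, the two radii would be strongly rank-correlated. Below is a minimal diagnostic of this kind, using scikit-learn's tSNE; dtSNE itself is not in scikit-learn and is not shown.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

def knn_radius(X, k=15):
    # Radius of each point's k-NN ball: a proxy for inverse local density.
    d, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    return d[:, -1]

rng = np.random.default_rng(0)
# Two clusters with very different densities.
X = np.vstack([rng.normal(scale=0.2, size=(300, 10)),
               rng.normal(loc=5.0, scale=2.0, size=(300, 10))])
Z = TSNE(n_components=2, random_state=0).fit_transform(X)
rho, _ = spearmanr(knn_radius(X), knn_radius(Z))
print(f"rank correlation of local radii: {rho:.2f}")  # well below 1 => densities distorted
```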
arXiv Detail & Related papers (2023-01-31T16:11:54Z)
- Intrinsic Dimensionality Estimation within Tight Localities: A Theoretical and Experimental Analysis [0.0]
We propose a local ID estimation strategy that is stable even for 'tight' localities consisting of as few as 20 sample points.
Our experimental results show that our proposed estimation technique can achieve notably smaller variance, while maintaining comparable levels of bias, at much smaller sample sizes than state-of-the-art estimators.
arXiv Detail & Related papers (2022-09-29T00:00:11Z)
- LIDL: Local Intrinsic Dimension Estimation Using Approximate Likelihood [10.35315334180936]
We propose a novel approach to the problem: Local Intrinsic Dimension estimation using approximate Likelihood (LIDL).
Our method relies on an arbitrary density estimation method as its subroutine and hence tries to sidestep the dimensionality challenge.
We show that LIDL yields competitive results on the standard benchmarks for this problem and that it scales to thousands of dimensions.
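The mechanism can be sketched from the abstract: perturb the data with Gaussian noise at several scales δ, fit any density model to each perturbed dataset, and use the fact that log p_δ(x) ≈ (d − D) log δ + const for small δ, so regressing log-density against log δ recovers the local dimension d. The sketch below plugs in a Gaussian KDE as the density subroutine (the paper uses stronger density models such as normalizing flows); the scale choices and bandwidths are our assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def lidl_estimate(X, query, deltas=(0.05, 0.1, 0.2)):
    """LIDL-style local intrinsic dimension at `query` points.
    Fits a density model to noise-perturbed copies of X and regresses
    log-density on log(delta); slope ~= d - D, hence d = D + slope."""
    D = X.shape[1]
    rng = np.random.default_rng(0)
    log_p = []
    for d in deltas:
        Xd = X + rng.normal(scale=d, size=X.shape)   # perturb at scale delta
        kde = KernelDensity(bandwidth=d).fit(Xd)     # crude plug-in density model
        log_p.append(kde.score_samples(query))
    slope = np.polyfit(np.log(deltas), np.vstack(log_p), 1)[0]
    return D + slope

# Usage: a 2-D plane embedded in 10-D; expect values near 2, up to estimator error.
rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 2)) @ rng.normal(size=(2, 10))
print(lidl_estimate(X, X[:5]).round(1))
```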
arXiv Detail & Related papers (2022-06-29T19:47:46Z)
- Meta-Learning for Relative Density-Ratio Estimation [59.75321498170363]
Existing methods for (relative) density-ratio estimation (DRE) require many instances from both densities.
We propose a meta-learning method for relative DRE, which estimates the relative density-ratio from a few instances by using knowledge in related datasets.
We empirically demonstrate the effectiveness of the proposed method by using three problems: relative DRE, dataset comparison, and outlier detection.
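For readers unfamiliar with the "relative" qualifier: the α-relative density ratio is r_α(x) = p(x) / (α p(x) + (1 − α) q(x)), which, unlike the plain ratio p/q, is bounded above by 1/α and therefore easier to estimate. A small worked example with known Gaussian densities (the estimation method itself is not shown):

```python
import numpy as np
from scipy.stats import norm

def relative_ratio(px, qx, alpha=0.5):
    # alpha-relative density ratio: bounded above by 1/alpha.
    return px / (alpha * px + (1 - alpha) * qx)

x = np.linspace(-4, 8, 5)
px, qx = norm.pdf(x, loc=0), norm.pdf(x, loc=4)   # two known densities
print(px / qx)                  # plain ratio: explodes far from q's mass
print(relative_ratio(px, qx))   # relative ratio: capped at 1/alpha = 2
```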
arXiv Detail & Related papers (2021-07-02T02:13:45Z)
- Distributionally Robust Local Non-parametric Conditional Estimation [22.423052432220235]
We propose a new distributionally robust estimator that generates non-parametric local estimates.
We show that, although the underlying problem is generally intractable, the local estimator can be found efficiently via convex optimization.
Experiments with synthetic and MNIST datasets show the competitive performance of this new class of estimators.
arXiv Detail & Related papers (2020-10-12T00:11:17Z)
- Nonparametric Density Estimation from Markov Chains [68.8204255655161]
We introduce a new nonparametric density estimator inspired by Markov Chains that generalizes the well-known Kernel Density Estimator.
Our estimator offers several benefits over the usual ones and can be used directly as a foundation in density-based algorithms.
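The summary names the estimator being generalized but not the generalization itself. As a reference point only, the classical kernel density estimator that the Markov-chain construction extends is p̂(x) = (1/n) Σ_i K_h(x − x_i), for example:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# The classical KDE baseline; the paper's Markov-chain variant is not shown.
rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 1))
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(samples)
print(np.exp(kde.score_samples([[0.0]])))  # density near 1/sqrt(2*pi) ~ 0.4
```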
arXiv Detail & Related papers (2020-09-08T18:33:42Z)
- Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks [82.61546580149427]
Generative Adversarial Networks (GANs) have achieved great success in unsupervised learning.
This paper provides approximation and statistical guarantees of GANs for the estimation of data distributions with densities in a Hölder space.
arXiv Detail & Related papers (2020-02-10T16:47:57Z)