Fast kernel half-space depth for data with non-convex supports
- URL: http://arxiv.org/abs/2312.14136v1
- Date: Thu, 21 Dec 2023 18:55:22 GMT
- Title: Fast kernel half-space depth for data with non-convex supports
- Authors: Arturo Castellanos, Pavlo Mozharovskyi, Florence d'Alché-Buc, Hicham Janati
- Abstract summary: We extend the celebrated halfspace depth to tackle multimodal distributions.
The proposed depth can be computed using manifold gradients, making it faster than the halfspace depth by several orders of magnitude.
The performance of our depth is demonstrated through numerical simulations as well as applications such as anomaly detection on real data and homogeneity testing.
- Score: 5.725360029813277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data depth is a statistical function that generalizes order and quantiles to
the multivariate setting and beyond, with applications spanning descriptive
and visual statistics, anomaly detection, testing, etc. The celebrated
halfspace depth exploits data geometry via an optimization program to deliver
properties of invariance, robustness, and non-parametricity. Nevertheless, it
implicitly assumes convex data supports and requires exponential
computational cost. To tackle multimodal distributions, we extend the
halfspace depth to a Reproducing Kernel Hilbert Space (RKHS). We show that
the obtained depth is intuitive and establish its consistency with provable
concentration bounds that allow for homogeneity testing. The proposed depth
can be computed using manifold gradients, making it faster than the halfspace
depth by several orders of magnitude. Its performance is demonstrated through
numerical simulations as well as applications such as anomaly detection on
real data and homogeneity testing.
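The halfspace (Tukey) depth that the paper builds on assigns to a query point the smallest fraction of the sample contained in any closed halfspace whose boundary passes through that point; exact computation scales exponentially in the dimension. A minimal Monte-Carlo sketch over random directions (an illustrative baseline, not the paper's RKHS method; all names and parameters here are assumptions):

```python
import numpy as np

def approx_halfspace_depth(z, X, n_dirs=1000, seed=0):
    """Monte-Carlo approximation of the Tukey halfspace depth of z
    w.r.t. sample X: the minimum, over random unit directions u, of the
    fraction of points in the closed halfspace {x : <u, x> <= <u, z>}."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    U = rng.normal(size=(n_dirs, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # unit directions
    proj_X = X @ U.T           # (n, n_dirs) projections of the sample
    proj_z = z @ U.T           # (n_dirs,) projections of the query
    fracs = (proj_X <= proj_z).mean(axis=0)
    return fracs.min()

X = np.random.default_rng(1).normal(size=(500, 2))
d_center = approx_halfspace_depth(np.zeros(2), X)       # near 0.5
d_out = approx_halfspace_depth(np.array([3.0, 3.0]), X)  # near 0
```

Central points receive depth close to 1/2 while outlying points receive depth close to 0, which is the monotonicity the kernelized extension preserves on non-convex supports.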
Related papers
- Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - Adaptive Data Depth via Multi-Armed Bandits [6.29475963948119]
We develop an instance-adaptive algorithm for data depth computation.
We focus on simplicial depth, developed by Liu (1990), which has emerged as a promising notion of depth.
We show that we can reduce the complexity of identifying the deepest point in the data set from $O(n^d)$ to $\tilde{O}(n^{d-(d-1)\alpha/2})$, where $\tilde{O}$ suppresses logarithmic factors.
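Liu's simplicial depth of a point is the fraction of simplices spanned by sample points (triangles, in 2D) that contain it; the naive computation enumerates all triples, which is what makes instance-adaptive speedups attractive. A small illustrative 2D implementation (my own sketch of the naive baseline, not the bandit algorithm from the paper):

```python
import numpy as np
from itertools import combinations

def in_triangle(p, a, b, c):
    """Point-in-triangle test via signed areas; boundary counts as inside."""
    def sign(u, v, w):
        return (u[0] - w[0]) * (v[1] - w[1]) - (v[0] - w[0]) * (u[1] - w[1])
    s1, s2, s3 = sign(p, a, b), sign(p, b, c), sign(p, c, a)
    has_neg = (s1 < 0) or (s2 < 0) or (s3 < 0)
    has_pos = (s1 > 0) or (s2 > 0) or (s3 > 0)
    return not (has_neg and has_pos)

def simplicial_depth(z, X):
    """Naive 2D simplicial depth (Liu, 1990): the fraction of triangles
    spanned by sample triples that contain z. Enumerates all triples."""
    triples = list(combinations(range(len(X)), 3))
    hits = sum(in_triangle(z, X[i], X[j], X[k]) for i, j, k in triples)
    return hits / len(triples)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
center_depth = simplicial_depth((0.5, 0.5), X)  # every triangle contains it
outlier_depth = simplicial_depth((2.0, 2.0), X)  # no triangle contains it
```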
arXiv Detail & Related papers (2022-11-08T03:44:22Z) - Non-parametric Depth Distribution Modelling based Depth Inference for
Multi-view Stereo [43.415242967722804]
Recent cost volume pyramid based deep neural networks have unlocked the potential of efficiently leveraging high-resolution images for depth inference from multi-view stereo.
In general, those approaches assume that the depth of each pixel follows a unimodal distribution.
We propose constructing the cost volume by non-parametric depth distribution modeling to handle pixels with unimodal and multi-modal distributions.
arXiv Detail & Related papers (2022-05-08T05:13:04Z) - Super-resolution GANs of randomly-seeded fields [68.8204255655161]
We propose a novel super-resolution generative adversarial network (GAN) framework to estimate field quantities from random sparse sensors.
The algorithm exploits random sampling to provide incomplete views of the high-resolution underlying distributions.
The proposed technique is tested on synthetic databases of fluid flow simulations, ocean surface temperature distributions measurements, and particle image velocimetry data.
arXiv Detail & Related papers (2022-02-23T18:57:53Z) - Measuring dissimilarity with diffeomorphism invariance [94.02751799024684]
We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces.
We prove that DID enjoys properties which make it relevant for theoretical study and practical use.
arXiv Detail & Related papers (2022-02-11T13:51:30Z) - Eikonal depth: an optimal control approach to statistical depths [0.7614628596146599]
We propose a new type of globally defined statistical depth, based upon control theory and eikonal equations.
This depth is easy to interpret and compute, expressively captures multi-modal behavior, and extends naturally to data that is non-Euclidean.
arXiv Detail & Related papers (2022-01-14T01:57:48Z) - Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
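The "naive density" the abstract refers to counts sample points inside a fixed-radius Euclidean ball and normalizes by the ball's volume. A short sketch of that baseline (the radius `h` and the data are illustrative assumptions, and this is the fixed-bandwidth estimator the paper's adaptive kernel diffusion density improves on):

```python
import numpy as np
from math import pi, gamma

def ball_indicator_density(x, X, h):
    """Naive density estimate: the fraction of sample points within
    Euclidean distance h of x, normalized by the volume of the
    d-dimensional ball of radius h."""
    n, d = X.shape
    ball_vol = pi ** (d / 2) / gamma(d / 2 + 1) * h ** d
    inside = np.linalg.norm(X - x, axis=1) <= h
    return inside.mean() / ball_vol

# On a uniform sample over the unit square, the estimate at an interior
# point should be close to the true density of 1.
rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))
est = ball_indicator_density(np.array([0.5, 0.5]), X, h=0.2)
```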
arXiv Detail & Related papers (2021-10-11T09:00:33Z) - Featurized Density Ratio Estimation [82.40706152910292]
In our work, we propose to leverage an invertible generative model to map the two distributions into a common feature space prior to estimation.
This featurization brings the densities closer together in latent space, sidestepping pathological scenarios where the learned density ratios in input space can be arbitrarily inaccurate.
At the same time, the invertibility of our feature map guarantees that the ratios computed in feature space are equivalent to those in input space.
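The invariance claim can be checked directly: for an invertible map $f$, the Jacobian terms in the change-of-variables formula cancel in the ratio, so $p_X(x)/q_X(x)$ equals the ratio of the pushed-forward densities evaluated at $f(x)$. A one-dimensional sketch with Gaussians and an affine map (all densities and the map here are illustrative choices, not the paper's learned flow):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of a 1D Gaussian with mean mu and std sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# An invertible affine "feature map" f(x) = a*x + b.
a, b = 2.0, -1.0
f = lambda x: a * x + b

def push(pdf, z):
    """Push-forward density of Z = f(X) under the affine map:
    p_Z(z) = p_X((z - b) / a) / |a| by change of variables."""
    return pdf((z - b) / a) / abs(a)

p = lambda t: gauss_pdf(t, 0.0, 1.0)   # numerator density
q = lambda t: gauss_pdf(t, 0.5, 1.5)   # denominator density

x = 0.7
ratio_input = p(x) / q(x)
ratio_feature = push(p, f(x)) / push(q, f(x))
# The |a| Jacobian factors cancel, so the two ratios agree.
```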
arXiv Detail & Related papers (2021-07-05T18:30:26Z) - Depth-based pseudo-metrics between probability distributions [1.1470070927586016]
We propose two new pseudo-metrics between continuous probability measures based on data depth and its associated central regions.
In contrast to the Wasserstein distance, the proposed pseudo-metrics do not suffer from the curse of dimensionality.
The regions-based pseudo-metric appears to be robust w.r.t. both outliers and heavy tails.
arXiv Detail & Related papers (2021-03-23T17:33:18Z) - Balanced Depth Completion between Dense Depth Inference and Sparse Range
Measurements via KISS-GP [14.158132769768578]
Estimating a dense and accurate depth map is the key requirement for autonomous driving and robotics.
Recent advances in deep learning have allowed depth estimation in full resolution from a single image.
Despite this impressive result, many deep-learning-based monocular depth estimation algorithms fail to maintain accuracy, yielding meter-level estimation errors.
arXiv Detail & Related papers (2020-08-12T08:07:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.