Robust Topological Inference in the Presence of Outliers
- URL: http://arxiv.org/abs/2206.01795v1
- Date: Fri, 3 Jun 2022 19:45:43 GMT
- Title: Robust Topological Inference in the Presence of Outliers
- Authors: Siddharth Vishwanath, Bharath K. Sriperumbudur, Kenji Fukumizu and
Satoshi Kuriki
- Abstract summary: The distance function to a compact set plays a crucial role in the paradigm of topological data analysis.
Despite its stability to perturbations in the Hausdorff distance, persistent homology is highly sensitive to outliers.
We propose a $\textit{median-of-means}$ variant of the distance function ($\textsf{MoM Dist}$), and establish its statistical properties.
- Score: 18.6112824677157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The distance function to a compact set plays a crucial role in the paradigm
of topological data analysis. In particular, the sublevel sets of the distance
function are used in the computation of persistent homology -- a backbone of
the topological data analysis pipeline. Despite its stability to perturbations
in the Hausdorff distance, persistent homology is highly sensitive to outliers.
In this work, we develop a framework of statistical inference for persistent
homology in the presence of outliers. Drawing inspiration from recent
developments in robust statistics, we propose a $\textit{median-of-means}$
variant of the distance function ($\textsf{MoM Dist}$), and establish its
statistical properties. In particular, we show that, even in the presence of
outliers, the sublevel filtrations and weighted filtrations induced by
$\textsf{MoM Dist}$ are both consistent estimators of the true underlying
population counterpart, and their rates of convergence in the bottleneck metric
are controlled by the fraction of outliers in the data. Finally, we demonstrate
the advantages of the proposed methodology through simulations and
applications.
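As a rough illustration of the $\textsf{MoM Dist}$ construction described in the abstract, the sketch below (the function name, signature, and blocking scheme are our own assumptions, not the authors' implementation) partitions the sample into disjoint blocks, evaluates the ordinary distance-to-set function on each block, and returns the pointwise median over blocks; the resulting values could then be evaluated on a grid and passed to any sublevel-set persistence routine.

```python
import numpy as np

def mom_dist(queries, data, n_blocks, seed=None):
    """Median-of-means distance function (illustrative sketch only).

    queries : (m, d) array of points at which to evaluate the function.
    data    : (n, d) array of observed (possibly contaminated) points.
    n_blocks: number of disjoint blocks; if fewer than half the blocks
              contain outliers, the pointwise median is unaffected by them.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    blocks = np.array_split(idx, n_blocks)
    # Distance from every query point to the nearest point of each block.
    block_dists = np.stack([
        np.min(
            np.linalg.norm(queries[:, None, :] - data[b][None, :, :], axis=-1),
            axis=1,
        )
        for b in blocks
    ])
    # Pointwise median over blocks gives the MoM variant of the distance function.
    return np.median(block_dists, axis=0)
```

For example, evaluating `mom_dist` on a regular grid over the sampling window yields a filtration function whose sublevel sets can be fed to a cubical persistence computation; the choice of `n_blocks` trades off robustness to outliers against how closely the estimate tracks the ordinary distance function.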
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - Persistent Classification: A New Approach to Stability of Data and Adversarial Examples [6.469716438197741]
We study the differences between persistence metrics along interpolants of natural and adversarial points.
We show that adversarial examples have significantly lower persistence than natural examples for large neural networks.
We connect this lack of persistence with decision boundary geometry by measuring angles of interpolants with respect to decision boundaries.
arXiv Detail & Related papers (2024-04-11T18:13:42Z) - Learn2Extend: Extending sequences by retaining their statistical
properties with mixture models [7.15769102504304]
This paper addresses the challenge of extending general finite sequences of real numbers within a subinterval of the real line.
Our focus lies on preserving the gap distribution and pair correlation function of these point sets.
Leveraging advancements in deep learning applied to point processes, this paper explores the use of an auto-regressive $\textit{Sequence Extension Mixture Model}$.
arXiv Detail & Related papers (2023-12-03T21:05:50Z) - Non-isotropic Persistent Homology: Leveraging the Metric Dependency of
PH [5.70896453969985]
We show that information on the point cloud is lost when restricting persistent homology to a single distance function.
We numerically show that non-isotropic persistent homology can extract information on orientation, orientational variance, and scaling of randomly generated point clouds.
arXiv Detail & Related papers (2023-10-25T08:03:17Z) - Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariable, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z) - Online Statistical Inference for Nonlinear Stochastic Approximation with
Markovian Data [22.59079286063505]
We study the statistical inference of nonlinear stochastic approximation algorithms utilizing a single trajectory of Markovian data.
Our methodology has practical applications in various scenarios, such as Stochastic Gradient Descent (SGD) on autoregressive data and asynchronous Q-Learning.
arXiv Detail & Related papers (2023-02-15T14:31:11Z) - Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - $\beta$-Cores: Robust Large-Scale Bayesian Data Summarization in the
Presence of Outliers [14.918826474979587]
The quality of classic Bayesian inference depends critically on whether observations conform with the assumed data generating model.
We propose a variational inference method that, in a principled way, can simultaneously scale to large datasets and remain robust to outliers.
We illustrate the applicability of our approach in diverse simulated and real datasets, and various statistical models.
arXiv Detail & Related papers (2020-08-31T13:47:12Z) - $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a
Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure.
Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with $\textit{external uncertainty}$ in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.