Manifold learning with approximate nearest neighbors
- URL: http://arxiv.org/abs/2103.11773v1
- Date: Mon, 22 Feb 2021 12:04:23 GMT
- Title: Manifold learning with approximate nearest neighbors
- Authors: Fan Cheng, Rob J Hyndman, Anastasios Panagiotelis
- Abstract summary: We use a broad range of approximate nearest neighbor algorithms within manifold learning algorithms and evaluate their impact on embedding accuracy.
Via a thorough empirical investigation based on the benchmark MNIST dataset, it is shown that approximate nearest neighbors lead to substantial improvements in computational time.
This application demonstrates how the proposed methods can be used to visualize and identify anomalies and uncover underlying structure within high-dimensional data.
- Score: 1.8477401359673706
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Manifold learning algorithms are valuable tools for the analysis of
high-dimensional data, many of which include a step where nearest neighbors of
all observations are found. This can present a computational bottleneck when
the number of observations is large or when the observations lie in more
general metric spaces, such as statistical manifolds, which require all
pairwise distances between observations to be computed. We resolve this problem
by using a broad range of approximate nearest neighbor algorithms within
manifold learning algorithms and evaluating their impact on embedding accuracy.
We use approximate nearest neighbors for statistical manifolds by exploiting
the connection between Hellinger/Total variation distance for discrete
distributions and the L2/L1 norm. Via a thorough empirical investigation based
on the benchmark MNIST dataset, it is shown that approximate nearest neighbors
lead to substantial improvements in computational time with little to no loss
in the accuracy of the embedding produced by a manifold learning algorithm.
This result is robust to the use of different manifold learning algorithms, to
the use of different approximate nearest neighbor algorithms, and to the use of
different measures of embedding accuracy. The proposed method is applied to
learning a statistical manifold from data on distributions of electricity usage. This
application demonstrates how the proposed methods can be used to visualize and
identify anomalies and uncover underlying structure within high-dimensional
data in a way that is scalable to large datasets.
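The reduction described in the abstract — Hellinger distance for discrete distributions equals a constant times the L2 distance between square-rooted probability vectors — can be sketched as follows. This is a minimal illustration, not the authors' implementation: scikit-learn's exact `NearestNeighbors` index stands in for an approximate nearest neighbor library (e.g., Annoy or HNSW), and the distributions are randomly generated toy data rather than the paper's electricity-usage distributions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Toy data: 1000 discrete distributions over 32 bins (each row sums to 1).
X = rng.random((1000, 32))
X /= X.sum(axis=1, keepdims=True)

# Hellinger distance: H(p, q) = (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2,
# so square-rooting each row turns Hellinger neighbor search into
# ordinary Euclidean neighbor search on the transformed points.
X_sqrt = np.sqrt(X)

# Any Euclidean (approximate) NN index can now be queried directly;
# an ANN index would be swapped in here for large datasets.
nn = NearestNeighbors(n_neighbors=5, metric="euclidean").fit(X_sqrt)
dist, idx = nn.kneighbors(X_sqrt)

# Rescale the Euclidean distances back to Hellinger distances.
hellinger = dist / np.sqrt(2.0)
```

The same idea applies to total variation distance via the L1 norm on the untransformed rows, which is why L1-capable ANN indexes suffice there; the neighbor sets feed into any manifold learning algorithm that starts from a k-nearest-neighbor graph.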
Related papers
- Piecewise-Linear Manifolds for Deep Metric Learning [8.670873561640903]
Unsupervised deep metric learning focuses on learning a semantic representation space using only unlabeled data.
We propose to model the high-dimensional data manifold using a piecewise-linear approximation, with each low-dimensional linear piece approximating the data manifold in a small neighborhood of a point.
We empirically show that this similarity estimate correlates better with the ground truth than the similarity estimates of current state-of-the-art techniques.
arXiv Detail & Related papers (2024-03-22T06:22:20Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data; a minimal number of available labeled data points are then assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models [11.141688859736805]
We introduce and analyze several preconditioners, derive new convergence results, and propose novel methods for accurately approximating predictive variances.
In particular, we obtain a speed-up of an order of magnitude compared to Cholesky-based calculations.
All methods are implemented in a free C++ software library with high-level Python and R packages.
arXiv Detail & Related papers (2023-10-18T14:31:16Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Linearized Wasserstein dimensionality reduction with approximation
guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z) - Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z) - Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As the results show, the proposed strategies perform better than classification based on observed data alone and maintain high accuracy even as the missing-data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Sparse Algorithms for Markovian Gaussian Processes [18.999495374836584]
Sparse Markovian Gaussian processes combine the use of inducing variables with efficient Kalman filter-like recursions.
We derive a general site-based approach to approximate the non-Gaussian likelihood with local Gaussian terms, called sites.
Our approach results in a suite of novel sparse extensions to algorithms from both machine learning and signal processing, including variational inference, expectation propagation, and the classical nonlinear Kalman smoothers.
The derived methods are suited to spatio-temporal data, where the model has separate inducing points in both time and space.
arXiv Detail & Related papers (2021-03-19T09:50:53Z) - Scalable Distributed Approximation of Internal Measures for Clustering
Evaluation [5.144809478361603]
A widely used internal measure for clustering evaluation is the silhouette coefficient, whose computation requires a quadratic number of distance calculations.
We present the first scalable algorithm to compute a rigorous approximation of this measure for the evaluation of clusterings based on any metric distance.
We also prove that the algorithm can be adapted to obtain rigorous approximations of other internal measures of clustering quality, such as cohesion and separation.
arXiv Detail & Related papers (2020-03-03T10:28:14Z) - Statistical Outlier Identification in Multi-robot Visual SLAM using
Expectation Maximization [18.259478519717426]
This paper introduces a novel, distributed method for detecting inter-map loop closure outliers in simultaneous localization and mapping (SLAM).
The proposed algorithm does not rely on a good initialization and can handle more than two maps at a time.
arXiv Detail & Related papers (2020-02-07T06:34:44Z)