Hierarchical Nearest Neighbor Graph Embedding for Efficient
Dimensionality Reduction
- URL: http://arxiv.org/abs/2203.12997v1
- Date: Thu, 24 Mar 2022 11:41:16 GMT
- Title: Hierarchical Nearest Neighbor Graph Embedding for Efficient
Dimensionality Reduction
- Authors: M. Saquib Sarfraz, Marios Koulakis, Constantin Seibold, Rainer
Stiefelhagen
- Abstract summary: We introduce a novel method based on a hierarchy built on 1-nearest neighbor graphs in the original space.
The proposal is an optimization-free projection that is competitive with the latest versions of t-SNE and UMAP.
In the paper, we argue about the soundness of the proposed method and evaluate it on a diverse collection of datasets with sizes varying from 1K to 11M samples and dimensions from 28 to 16K.
- Score: 25.67957712837716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dimensionality reduction is crucial both for visualization and preprocessing
high dimensional data for machine learning. We introduce a novel method based
on a hierarchy built on 1-nearest neighbor graphs in the original space which
is used to preserve the grouping properties of the data distribution on
multiple levels. The core of the proposal is an optimization-free projection
that is competitive with the latest versions of t-SNE and UMAP in performance
and visualization quality while being an order of magnitude faster in run-time.
Furthermore, its interpretable mechanics, the ability to project new data, and
the natural separation of data clusters in visualizations make it a general
purpose unsupervised dimension reduction technique. In the paper, we argue
about the soundness of the proposed method and evaluate it on a diverse
collection of datasets with sizes varying from 1K to 11M samples and dimensions
from 28 to 16K. We perform comparisons with other state-of-the-art methods on
multiple metrics and target dimensions highlighting its efficiency and
performance. Code is available at https://github.com/koulakis/h-nne
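The core idea described in the abstract, building a hierarchy from connected components of successive 1-nearest-neighbor graphs, can be sketched as follows. This is a minimal illustration, not the authors' implementation (see the linked repository for that): it uses brute-force O(n^2) distances and hypothetical helper names, and it stops at the grouping hierarchy without the projection step.

```python
import numpy as np

def one_nn_partition(points):
    """Group points by the connected components of their 1-NN graph."""
    n = len(points)
    # Pairwise squared distances; exclude self-matches on the diagonal.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nn = d2.argmin(axis=1)
    # Union-find: merge each point with its nearest neighbor, so each
    # connected component of the (undirected) 1-NN graph becomes one group.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        parent[find(i)] = find(nn[i])
    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels

def build_hierarchy(points, max_levels=10):
    """Stack 1-NN partitions until everything merges into one group."""
    levels, current = [], np.asarray(points, dtype=float)
    for _ in range(max_levels):
        labels = one_nn_partition(current)
        levels.append(labels)
        k = labels.max() + 1
        if k <= 1:
            break
        # Next level: represent each group by its centroid and repeat.
        current = np.stack([current[labels == c].mean(0) for c in range(k)])
    return levels
```

Each level of `levels` assigns the groups of the previous level to coarser groups, yielding the multi-level grouping structure that the projection preserves. Because each point simply joins its single nearest neighbor, no objective is optimized, which is what makes the approach fast relative to t-SNE and UMAP.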
Related papers
- Fast and Scalable Semi-Supervised Learning for Multi-View Subspace Clustering [13.638434337947302]
FSSMSC is a novel method that addresses the high computational complexity common to existing approaches.
The method generates a consensus anchor graph across all views, representing each data point as a sparse linear combination of chosen landmarks.
The effectiveness and efficiency of FSSMSC are validated through extensive experiments on multiple benchmark datasets of varying scales.
arXiv Detail & Related papers (2024-08-11T06:54:00Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Joint Projection Learning and Tensor Decomposition Based Incomplete Multi-view Clustering [21.925066554821168]
We propose a novel Joint Projection and Decomposition Based method (JPLTD) for incomplete multi-view clustering.
JPLTD alleviates the influence of redundant features and noise in high-dimensional data.
Experiments on several benchmark datasets demonstrate that JPLTD outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-06T06:19:16Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- In search of the most efficient and memory-saving visualization of high dimensional data [0.0]
We argue that the visualization of multidimensional data is well approximated by a two-dimensional embedding of its nearest-neighbor graph.
Existing reduction methods are too slow and do not allow interactive manipulation.
We show that high-quality embeddings are produced with minimal time and memory complexity.
arXiv Detail & Related papers (2023-02-27T20:56:13Z)
- Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings [1.7188280334580195]
This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved.
The proposed algorithm has the same complexity as the original $t$-SNE to embed new items, and a lower one when considering the embedding of a dataset sliced into sub-pieces.
arXiv Detail & Related papers (2021-09-22T06:45:37Z)
- Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifolds, aimed at the evaluation of deep generative models.
Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence).
We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
arXiv Detail & Related papers (2021-06-08T00:30:43Z)
- Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning.
The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned.
Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments of data classification and clustering on eight real datasets show that Vec2vec is better than several classical dimensionality reduction methods in the statistical hypothesis test.
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Learning a Deep Part-based Representation by Preserving Data Distribution [21.13421736154956]
Unsupervised dimensionality reduction is a commonly used technique for high-dimensional data recognition problems.
In this paper, by preserving the data distribution, a deep part-based representation can be learned, and the novel algorithm is called Distribution Preserving Network Embedding.
The experimental results on the real-world data sets show that the proposed algorithm has good performance in terms of cluster accuracy and AMI.
arXiv Detail & Related papers (2020-09-17T12:49:36Z)
- NCVis: Noise Contrastive Approach for Scalable Visualization [79.44177623781043]
NCVis is a high-performance dimensionality reduction method built on a sound statistical basis of noise contrastive estimation.
We show that NCVis outperforms state-of-the-art techniques in terms of speed while preserving the representation quality of other methods.
arXiv Detail & Related papers (2020-01-30T15:43:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.