Hierarchical Nearest Neighbor Graph Embedding for Efficient
Dimensionality Reduction
- URL: http://arxiv.org/abs/2203.12997v1
- Date: Thu, 24 Mar 2022 11:41:16 GMT
- Title: Hierarchical Nearest Neighbor Graph Embedding for Efficient
Dimensionality Reduction
- Authors: M. Saquib Sarfraz, Marios Koulakis, Constantin Seibold, Rainer
Stiefelhagen
- Abstract summary: We introduce a novel method based on a hierarchy built on 1-nearest neighbor graphs in the original space.
The proposal is an optimization-free projection that is competitive with the latest versions of t-SNE and UMAP.
In the paper, we argue about the soundness of the proposed method and evaluate it on a diverse collection of datasets with sizes varying from 1K to 11M samples and dimensions from 28 to 16K.
- Score: 25.67957712837716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dimensionality reduction is crucial both for visualization and preprocessing
high dimensional data for machine learning. We introduce a novel method based
on a hierarchy built on 1-nearest neighbor graphs in the original space which
is used to preserve the grouping properties of the data distribution on
multiple levels. The core of the proposal is an optimization-free projection
that is competitive with the latest versions of t-SNE and UMAP in performance
and visualization quality while being an order of magnitude faster in run-time.
Furthermore, its interpretable mechanics, the ability to project new data, and
the natural separation of data clusters in visualizations make it a general
purpose unsupervised dimension reduction technique. In the paper, we argue
about the soundness of the proposed method and evaluate it on a diverse
collection of datasets with sizes varying from 1K to 11M samples and dimensions
from 28 to 16K. We perform comparisons with other state-of-the-art methods on
multiple metrics and target dimensions highlighting its efficiency and
performance. Code is available at https://github.com/koulakis/h-nne
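The core idea described in the abstract, building a hierarchy from connected components of successive 1-nearest-neighbor graphs, can be sketched as follows. This is a minimal illustration, not the authors' implementation (see the linked repository for that): it uses brute-force O(n^2) distances and hypothetical helper names, and it stops at the grouping hierarchy without the projection step.

```python
import numpy as np

def one_nn_partition(points):
    """Group points by the connected components of their 1-NN graph."""
    n = len(points)
    # Pairwise squared distances; exclude self-matches on the diagonal.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nn = d2.argmin(axis=1)
    # Union-find: merge each point with its nearest neighbor, so each
    # connected component of the (undirected) 1-NN graph becomes one group.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        parent[find(i)] = find(nn[i])
    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels

def build_hierarchy(points, max_levels=10):
    """Stack 1-NN partitions until everything merges into one group."""
    levels, current = [], np.asarray(points, dtype=float)
    for _ in range(max_levels):
        labels = one_nn_partition(current)
        levels.append(labels)
        k = labels.max() + 1
        if k <= 1:
            break
        # Next level: represent each group by its centroid and repeat.
        current = np.stack([current[labels == c].mean(0) for c in range(k)])
    return levels
```

Each level of `levels` assigns the groups of the previous level to coarser groups, yielding the multi-level grouping structure that the projection preserves. Because each point simply joins its single nearest neighbor, no objective is optimized, which is what makes the approach fast relative to t-SNE and UMAP.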
Related papers
- Fast and Scalable Semi-Supervised Learning for Multi-View Subspace Clustering [13.638434337947302]
FSSMSC is a novel method that addresses the high computational complexity common to existing approaches.
The method generates a consensus anchor graph across all views, representing each data point as a sparse linear combination of chosen landmarks.
The effectiveness and efficiency of FSSMSC are validated through extensive experiments on multiple benchmark datasets of varying scales.
arXiv Detail & Related papers (2024-08-11T06:54:00Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Joint Projection Learning and Tensor Decomposition Based Incomplete Multi-view Clustering [21.925066554821168]
We propose a novel Joint Projection and Decomposition Based method (JPLTD) for incomplete multi-view clustering.
JPLTD alleviates the influence of redundant features and noise in high-dimensional data.
Experiments on several benchmark datasets demonstrate that JPLTD outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-06T06:19:16Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- In search of the most efficient and memory-saving visualization of high dimensional data [0.0]
We argue that the visualization of multidimensional data is well approximated by a two-dimensional embedding of its nearest-neighbor graph.
Existing reduction methods are too slow and do not allow interactive manipulation.
We show that high-quality embeddings are produced with minimal time and memory complexity.
arXiv Detail & Related papers (2023-02-27T20:56:13Z)
- Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings [1.7188280334580195]
This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved.
The proposed algorithm has the same complexity as the original $t$-SNE to embed new items, and a lower one when considering the embedding of a dataset sliced into sub-pieces.
arXiv Detail & Related papers (2021-09-22T06:45:37Z)
- Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifolds, aimed at the evaluation of deep generative models.
Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence).
We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
arXiv Detail & Related papers (2021-06-08T00:30:43Z)
- Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning.
The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned.
Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments of data classification and clustering on eight real datasets show that Vec2vec is better than several classical dimensionality reduction methods in the statistical hypothesis test.
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Learning a Deep Part-based Representation by Preserving Data Distribution [21.13421736154956]
Unsupervised dimensionality reduction is a commonly used technique for high-dimensional data recognition problems.
In this paper, by preserving the data distribution, a deep part-based representation can be learned, and the novel algorithm is called Distribution Preserving Network Embedding.
The experimental results on the real-world data sets show that the proposed algorithm has good performance in terms of cluster accuracy and AMI.
arXiv Detail & Related papers (2020-09-17T12:49:36Z)
- NCVis: Noise Contrastive Approach for Scalable Visualization [79.44177623781043]
NCVis is a high-performance dimensionality reduction method built on a sound statistical basis of noise contrastive estimation.
We show that NCVis outperforms state-of-the-art techniques in terms of speed while preserving the representation quality of other methods.
arXiv Detail & Related papers (2020-01-30T15:43:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.