Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on
Multi-Core CPUs
- URL: http://arxiv.org/abs/2212.11506v1
- Date: Thu, 22 Dec 2022 06:38:40 GMT
- Title: Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on
Multi-Core CPUs
- Authors: Narendra Chaudhary, Alexander Pivovar, Pavel Yakovlev, Andrey
Gorshkov, Sanchit Misra
- Abstract summary: t-SNE remains one of the most popular embedding techniques for visualizing high-dimensional data.
BH t-SNE algorithm is inefficient on existing CPU implementations.
Acc-t-SNE is up to 261x and 4x faster than scikit-learn and the state-of-the-art BH t-SNE implementation from daal4py.
- Score: 59.18990342943095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: t-SNE remains one of the most popular embedding techniques for visualizing
high-dimensional data. Most standard packages of t-SNE, such as scikit-learn,
use the Barnes-Hut t-SNE (BH t-SNE) algorithm for large datasets. However,
existing CPU implementations of this algorithm are inefficient. In this work,
we accelerate the BH t-SNE on CPUs via cache optimizations, SIMD, parallelizing
sequential steps, and improving parallelization of multithreaded steps. Our
implementation (Acc-t-SNE) is up to 261x and 4x faster than scikit-learn and
the state-of-the-art BH t-SNE implementation from daal4py, respectively, on a
32-core Intel(R) Icelake cloud instance.
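To make the bottleneck concrete: the dense repulsive term of the t-SNE gradient is the part that the Barnes-Hut quadtree approximates in O(N log N) instead of O(N^2). The sketch below computes the exact O(N^2) gradient in NumPy for reference; it is illustrative only (not the authors' Acc-t-SNE code), and the function name and interface are hypothetical:

```python
import numpy as np

def tsne_gradient_exact(Y, P):
    """Exact O(N^2) gradient of the t-SNE KL objective.

    Y : (N, 2) low-dimensional embedding
    P : (N, N) symmetric high-dimensional affinities, zero diagonal

    BH t-SNE keeps the attractive term (sparse in practice) exact and
    approximates the dense repulsive term with a quadtree traversal.
    """
    diff = Y[:, None, :] - Y[None, :, :]             # (N, N, 2) pairwise differences
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))     # Student-t kernel weights
    np.fill_diagonal(W, 0.0)
    Z = W.sum()                                      # normalization constant
    # dC/dy_i = 4 * sum_j p_ij w_ij (y_i - y_j) - 4/Z * sum_j w_ij^2 (y_i - y_j)
    attract = 4.0 * np.einsum('ij,ijk->ik', P * W, diff)
    repulse = -4.0 * np.einsum('ij,ijk->ik', W * W / Z, diff)
    return attract + repulse
```

Because both kernels are symmetric and the pairwise differences are antisymmetric, the per-point gradients sum to zero; the paper's optimizations (SIMD, cache blocking, better multithreading) target exactly this kind of pairwise-force loop.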
Related papers
- Fast, Scalable, Warm-Start Semidefinite Programming with Spectral
Bundling and Sketching [53.91395791840179]
We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs.
USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.
arXiv Detail & Related papers (2023-12-19T02:27:22Z)
- PECANN: Parallel Efficient Clustering with Graph-Based Approximate
Nearest Neighbor Search [8.15681999722805]
This paper studies density-based clustering of point sets.
It unifies the different variants of density peaks clustering into a single framework, PECANN.
We implement five clustering algorithms with PECANN and evaluate them on synthetic and real-world datasets with up to 1.28 million points and up to 1024 dimensions on a 30-core machine with two-way hyper-threading.
arXiv Detail & Related papers (2023-12-06T22:43:50Z)
- A Computationally Efficient Sparsified Online Newton Method [48.78646010774149]
Sparsified Online Newton (SONew) is a memory-efficient second-order algorithm that yields a sparsified yet effective preconditioner.
We achieve up to 30% faster convergence, 3.4% relative improvement in validation, and 80% relative improvement in training loss.
arXiv Detail & Related papers (2023-11-16T18:44:22Z)
- Massively Parallel Continuous Local Search for Hybrid SAT Solving on
GPUs [5.245714076090567]
We propose FastFourierSAT, a highly parallel hybrid SAT solver based on gradient-driven continuous local search.
Our results show that FastFourierSAT computes the gradient 100+ times faster than previous prototypes implemented on CPU.
FastFourierSAT solves most instances and demonstrates promising performance on larger-size instances.
arXiv Detail & Related papers (2023-08-29T04:50:07Z)
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order
Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
- Tensor Slicing and Optimization for Multicore NPUs [2.670309629218727]
This paper proposes Tensor Slicing Optimization (TSO), a compiler optimization pass for Multicore NPUs.
Results show that TSO identifies the best tensor slicing that minimizes execution time for a set of CNN models.
arXiv Detail & Related papers (2023-04-06T12:03:03Z)
- Fast Parallel Bayesian Network Structure Learning [37.46185698921754]
We propose a fast solution named Fast-BNS on multi-core CPUs to enhance the efficiency of BN structure learning.
Fast-BNS is powered by a series of efficiency optimizations, including grouping the conditional independence (CI) tests of edges that share endpoints to reduce the number of unnecessary CI tests.
A comprehensive experimental study shows that the sequential version of Fast-BNS is up to 50 times faster than its counterpart.
arXiv Detail & Related papers (2022-12-08T13:17:02Z)
- Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful method for solving the large-scale multimedia retrieval problem.
We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) method to address the limitations of existing approaches.
Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z)
- Distributed Out-of-Memory NMF on CPU/GPU Architectures [1.0051474951635875]
We propose an efficient out-of-memory implementation of the Non-negative Matrix Factorization (NMF) algorithm for HPC systems.
Benchmark results show a significant 32x to 76x speedup with the new GPU implementation over the CPU-based NMFk.
arXiv Detail & Related papers (2022-02-19T03:49:21Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
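One standard random-projection idea in this line of work can be sketched with random Fourier features (Rahimi and Recht), which replace the dense N-by-N kernel matrix with explicit finite-dimensional features. This is an illustrative sketch of the general technique, not necessarily the solver described in the paper above; the function name and parameters are hypothetical:

```python
import numpy as np

def rff_features(X, num_features=256, gamma=1.0, seed=0):
    """Random Fourier features approximating the RBF kernel
    k(x, z) = exp(-gamma * ||x - z||^2).

    Returns Phi such that Phi @ Phi.T approximates the kernel matrix,
    so an O(N^2) kernel computation is replaced by linear algebra in
    num_features dimensions.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Spectral sampling: w ~ N(0, 2*gamma*I) matches the RBF spectrum
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)
```

The approximation error decays roughly as 1/sqrt(num_features), which is why such projections can be combined with iterative solvers and GPU-friendly dense linear algebra to handle very large point sets.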
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.