Related papers: Bringing UMAP Closer to the Speed of Light with GPU Acceleration

Bringing UMAP Closer to the Speed of Light with GPU Acceleration

URL: http://arxiv.org/abs/2008.00325v3
Date: Mon, 29 Mar 2021 09:15:12 GMT
Title: Bringing UMAP Closer to the Speed of Light with GPU Acceleration
Authors: Corey J. Nolet, Victor Lafargue, Edward Raff, Thejaswi Nanditale, Tim Oates, John Zedlewski, Joshua Patterson
Abstract summary: We show a number of techniques that can be used to make a faster and more faithful GPU version of UMAP. Many of these design choices/lessons are general purpose and may inform the conversion of other graph and manifold learning algorithms to use GPU.
Score: 28.64858826371568
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Uniform Manifold Approximation and Projection (UMAP) algorithm has become widely popular for its ease of use, quality of results, and support for exploratory, unsupervised, supervised, and semi-supervised learning. While many algorithms can be ported to a GPU in a simple and direct fashion, such efforts have resulted in inefficient and inaccurate versions of UMAP. We show a number of techniques that can be used to make a faster and more faithful GPU version of UMAP, and obtain speedups of up to 100x in practice. Many of these design choices/lessons are general purpose and may inform the conversion of other graph and manifold learning algorithms to use GPUs. Our implementation has been made publicly available as part of the open source RAPIDS cuML library (https://github.com/rapidsai/cuml).

Related papers

Ramp Up NTT in Record Time using GPU-Accelerated Algorithms and LLM-based Code Generation [11.120838175165986]
Homomorphic encryption (HE) is a core building block in privacy-preserving machine learning (PPML) Many GPU-accelerated cryptographic schemes have been proposed to improve the performance of HE. Given the powerful code generation capabilities of large language models (LLMs), we aim to explore their potential to automatically generate practical GPU-friendly algorithm code.
arXiv Detail & Related papers (2025-02-16T12:53:23Z)
JaxMARL: Multi-Agent RL Environments and Algorithms in JAX [105.343918678781]
We present JaxMARL, the first open-source, Python-based library that combines GPU-enabled efficiency with support for a large number of commonly used MARL environments. Our experiments show that, in terms of wall clock time, our JAX-based training pipeline is around 14 times faster than existing approaches. We also introduce and benchmark SMAX, a JAX-based approximate reimplementation of the popular StarCraft Multi-Agent Challenge.
arXiv Detail & Related papers (2023-11-16T18:58:43Z)
A modular software framework for the design and implementation of ptychography algorithms [55.41644538483948]
We present SciCom, a new ptychography software framework aiming at simulating ptychography datasets and testing state-of-the-art reconstruction algorithms. Despite its simplicity, the software leverages accelerated processing through the PyTorch interface. Results are shown on both synthetic and real datasets.
arXiv Detail & Related papers (2022-05-06T16:32:37Z)
GPU-accelerated Faster Mean Shift with euclidean distance metrics [1.3507758562554621]
Mean-shift algorithm is widely used to solve clustering problems. In previous research, we proposed a novel GPU-accelerated Faster Mean-shift algorithm. In this study, we extend and improve the previous algorithm to handle Euclidean distance metrics.
arXiv Detail & Related papers (2021-12-27T20:18:24Z)
Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers [65.60007071024629]
We show that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy. We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z)
Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms. We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
Semiring Primitives for Sparse Neighborhood Methods on the GPU [16.56995698312561]
We show that a sparse semiring primitive can be flexible enough to support a wide range of critical distance measures. This primitive is a foundational component for enabling many neighborhood-based information retrieval and machine learning algorithms to accept sparse input.
arXiv Detail & Related papers (2021-04-13T17:05:03Z)
Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections. Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
Out-of-Core GPU Gradient Boosting [0.0]
We show that much larger datasets can fit on a given GPU, without degrading model accuracy or training time. This is the first out-of-core GPU implementation of gradient boosting.
arXiv Detail & Related papers (2020-05-19T00:41:00Z)
Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms [1.3249453757295084]
We study training algorithms for deep learning on heterogeneous CPU+GPU architectures. Our two-fold objective -- maximize convergence rate and resource utilization simultaneously -- makes the problem challenging. We show that the implementation of these algorithms achieves both faster convergence and higher resource utilization than on several real datasets.
arXiv Detail & Related papers (2020-04-19T05:21:20Z)
MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle. Surprisingly, by making a small change to the low-performing solver, we derive the new solver MPLP++ that significantly outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.