GPU-Accelerated Optimizer-Aware Evaluation of Submodular Exemplar
Clustering
- URL: http://arxiv.org/abs/2101.08763v1
- Date: Thu, 21 Jan 2021 18:23:44 GMT
- Title: GPU-Accelerated Optimizer-Aware Evaluation of Submodular Exemplar
Clustering
- Authors: Philipp-Jan Honysz, Sebastian Buschjäger, Katharina Morik
- Abstract summary: The optimization of submodular functions constitutes a viable way to perform clustering.
Strong approximation guarantees and feasible optimization w.r.t. streaming data make this clustering approach favorable.
Exemplar-based clustering is one of the possible submodular functions, but suffers from high computational complexity.
Half-precision GPU computation led to large speedups of up to 452x compared to single-precision, single-thread CPU computations.
- Score: 5.897728689802829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The optimization of submodular functions constitutes a viable way to perform
clustering. Strong approximation guarantees and feasible optimization w.r.t.
streaming data make this clustering approach favorable. Technically, submodular
functions map subsets of data to real values, which indicate how
"representative" a specific subset is. Optimal sets might then be used to
partition the data space and to infer clusters. Exemplar-based clustering is
one of the possible submodular functions, but suffers from high computational
complexity. However, for practical applications, the particular real-time or
wall-clock run-time is decisive. In this work, we present a novel way to
evaluate this particular function on GPUs, which keeps the necessities of
optimizers in mind and reduces wall-clock run-time. To discuss our GPU
algorithm, we investigated both the impact of different run-time critical
problem properties, like data dimensionality and the number of data points in a
subset, and the influence of required floating-point precision. In reproducible
experiments, our GPU algorithm was able to achieve competitive speedups of up
to 72x depending on whether multi-threaded computation on CPUs was used for
comparison and the type of floating-point precision required. Half-precision
GPU computation led to large speedups of up to 452x compared to
single-precision, single-thread CPU computations.
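The exemplar-based objective evaluated in this paper can be made concrete with a small CPU-side sketch. The snippet below assumes the common formulation f(S) = L({e0}) - L(S ∪ {e0}) with a phantom exemplar e0 (here the zero vector) and squared Euclidean distances; function names and shapes are illustrative, not the paper's GPU implementation:

```python
import numpy as np

def loss(X, S):
    # L(S): mean squared distance from each data point in X to its
    # nearest exemplar in S.  X has shape (n, d), S has shape (|S|, d).
    d = ((X[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)  # (n, |S|)
    return d.min(axis=1).mean()

def exemplar_value(X, S):
    # Submodular exemplar-clustering objective:
    #   f(S) = L({e0}) - L(S ∪ {e0}),
    # where e0 is an auxiliary all-zeros "phantom" exemplar.  f is
    # monotone: adding exemplars can only reduce the loss, so f grows.
    e0 = np.zeros((1, X.shape[1]))
    return loss(X, e0) - loss(X, np.vstack([e0, S]))
```

A greedy optimizer would call `exemplar_value` once per candidate point and round, which is exactly the inner loop whose wall-clock cost the paper moves to the GPU.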
Related papers
- Implementation and Analysis of GPU Algorithms for Vecchia Approximation [0.8057006406834466]
Vecchia Approximation is widely used to reduce the computational complexity and can be calculated with embarrassingly parallel algorithms.
While multi-core software has been developed for Vecchia Approximation, software designed to run on graphics processing units (GPUs) is lacking.
We show that our new method outperforms the other two and then present it in the GpGpU R package.
arXiv Detail & Related papers (2024-07-03T01:24:44Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
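The NNGP random-feature construction used by RFAD is involved; as a hedged illustration of the underlying idea only, here is a random Fourier feature approximation of the simpler RBF kernel (Rahimi-Recht style). All names and parameters are illustrative:

```python
import numpy as np

def rff(X, n_features=2000, gamma=1.0, seed=0):
    # Random Fourier features approximating the RBF kernel
    #   k(x, y) = exp(-gamma * ||x - y||^2)
    # via Bochner's theorem: sample frequencies W ~ N(0, 2*gamma),
    # random phases b ~ U[0, 2*pi), and map x -> sqrt(2/D) cos(xW + b).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```

With features `Z = rff(X)`, the product `Z @ Z.T` approximates the exact n-by-n kernel matrix at a cost linear in the number of features, which is what makes such approximations GPU-friendly.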
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling [82.08631594071656]
Pooling layers are essential building blocks of Convolutional Neural Networks (CNNs)
We propose an adaptive and exponentially weighted pooling method named adaPool.
We demonstrate how adaPool improves the preservation of detail through a range of tasks including image and video classification and object detection.
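adaPool's full formulation combines two weighting schemes (an exponential maximum and a Dice-Sørensen-based term); as a rough sketch of the exponential-weighting idea alone, with illustrative names, a single pooling window could be reduced like this:

```python
import numpy as np

def exp_pool(window, beta=1.0):
    # Exponentially weighted pooling over a flat window of activations:
    # the weights are a softmax of the values themselves, so larger
    # activations contribute more.  beta -> infinity approaches max
    # pooling; beta -> 0 approaches average pooling.
    w = np.exp(beta * window)
    w /= w.sum()
    return (w * window).sum()
```

The interpolation between average and max pooling is what lets such layers retain detail that hard max pooling would discard.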
arXiv Detail & Related papers (2021-11-01T08:50:37Z)
- Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers [65.60007071024629]
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z)
- Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms [1.3249453757295084]
We study training algorithms for deep learning on heterogeneous CPU+GPU architectures.
Our two-fold objective -- maximize convergence rate and resource utilization simultaneously -- makes the problem challenging.
We show that the implementation of these algorithms achieves both faster convergence and higher resource utilization on several real datasets.
arXiv Detail & Related papers (2020-04-19T05:21:20Z)
- FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution [14.226301825772174]
We introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP)
It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information.
We achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single Nvidia Titan X (Maxwell) GPU card.
arXiv Detail & Related papers (2019-12-26T11:45:15Z)
- Efficient Video Semantic Segmentation with Labels Propagation and Refinement [138.55845680523908]
This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach.
We propose an Efficient Video Segmentation (EVS) pipeline that combines: (i) on the CPU, a very fast optical flow method that is used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next.
On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
arXiv Detail & Related papers (2019-12-26T11:45:15Z)
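The CPU-side label propagation in the EVS pipeline above can be sketched minimally: given a per-pixel backward flow field, each pixel in the next frame samples the label at its flow-displaced source location. This is an illustrative simplification (nearest-neighbor sampling, clamped borders), not the paper's implementation:

```python
import numpy as np

def propagate_labels(labels, flow):
    # Warp an (H, W) integer label map to the next frame using a
    # backward flow field flow[y, x] = (dy, dx): each target pixel
    # reads the label at its displaced source position, rounded to
    # the nearest pixel and clamped to the image bounds.
    H, W = labels.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, W - 1)
    return labels[src_y, src_x]
```

Because this warp is cheap, a pipeline can run it on the CPU for most frames and reserve the GPU for the expensive segmentation network on keyframes.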
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.