GPU-accelerated Faster Mean Shift with euclidean distance metrics
- URL: http://arxiv.org/abs/2112.13891v1
- Date: Mon, 27 Dec 2021 20:18:24 GMT
- Title: GPU-accelerated Faster Mean Shift with euclidean distance metrics
- Authors: Le You, Han Jiang, Jinyong Hu, Chorng Chang, Lingxi Chen, Xintong Cui,
Mengyang Zhao
- Abstract summary: Mean-shift algorithm is widely used to solve clustering problems.
In previous research, we proposed a novel GPU-accelerated Faster Mean-shift algorithm.
In this study, we extend and improve the previous algorithm to handle Euclidean distance metrics.
- Score: 1.3507758562554621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handling clustering problems are important in data statistics, pattern
recognition and image processing. The mean-shift algorithm, a common
unsupervised algorithms, is widely used to solve clustering problems. However,
the mean-shift algorithm is restricted by its huge computational resource cost.
In previous research[10], we proposed a novel GPU-accelerated Faster Mean-shift
algorithm, which greatly speed up the cosine-embedding clustering problem. In
this study, we extend and improve the previous algorithm to handle Euclidean
distance metrics. Different from conventional GPU-based mean-shift algorithms,
our algorithm adopts novel Seed Selection & Early Stopping approaches, which
greatly increase computing speed and reduce GPU memory consumption. In the
simulation testing, when processing a 200K points clustering problem, our
algorithm achieved around 3 times speedup compared to the state-of-the-art
GPU-based mean-shift algorithms with optimized GPU memory consumption.
Moreover, in this study, we implemented a plug-and-play model for faster
mean-shift algorithm, which can be easily deployed. (Plug-and-play model is
available: https://github.com/masqm/Faster-Mean-Shift-Euc)
Related papers
- Implementation and Analysis of GPU Algorithms for Vecchia Approximation [0.8057006406834466]
Vecchia Approximation is widely used to reduce the computational complexity and can be calculated with embarrassingly parallel algorithms.
While multi-core software has been developed for Vecchia Approximation, software designed to run on graphics processing units ( GPU) is lacking.
We show that our new method outperforms the other two and then present it in the GpGpU R package.
arXiv Detail & Related papers (2024-07-03T01:24:44Z) - Fast, Scalable, Warm-Start Semidefinite Programming with Spectral
Bundling and Sketching [53.91395791840179]
We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs.
USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.
arXiv Detail & Related papers (2023-12-19T02:27:22Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
arXiv Detail & Related papers (2023-06-11T07:28:35Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (emphi.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(frac1sqrtnT + frac1(k/d)2 T)$ with the communication cost of $O(k log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Review of Serial and Parallel Min-Cut/Max-Flow Algorithms for Computer
Vision [6.574107319036238]
Hochbaum pseudoflow algorithm is fastest serial algorithm, Boykov-Kolmogorov algorithm is most memory efficient.
Existing parallel min-cut/max-flow algorithms can significantly outperform serial algorithms on large problems but suffers from added overhead on small to medium problems.
arXiv Detail & Related papers (2022-02-01T14:06:27Z) - RAMA: A Rapid Multicut Algorithm on GPU [23.281726932718232]
We propose a highly parallel primal-dual algorithm for the multicut (a.k.a.magnitude correlation clustering) problem.
Our algorithm produces primal solutions and dual lower bounds that estimate the distance to optimum.
We can solve very large scale benchmark problems with up to $mathcalO(108)$ variables in a few seconds with small primal-dual gaps.
arXiv Detail & Related papers (2021-09-04T10:33:59Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve the complexity of $mathcalO(epsilon-1.5)$, which outperforms all existing algorithms by the order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - Faster Mean-shift: GPU-accelerated clustering for cosine embedding-based
cell segmentation and tracking [12.60841328582138]
We propose a novel Faster Mean-shift algorithm, which tackles the computational bottleneck of embedding based cell segmentation and tracking.
The proposed Faster Mean-shift algorithm achieved 7-10 times speedup compared to the state-of-the-art embedding based cell instance segmentation and tracking algorithm.
Our Faster Mean-shift algorithm also achieved the highest computational speed compared to other GPU benchmarks with optimized memory consumption.
arXiv Detail & Related papers (2020-07-28T14:52:51Z) - Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z) - Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms [1.3249453757295084]
We study training algorithms for deep learning on heterogeneous CPU+GPU architectures.
Our two-fold objective -- maximize convergence rate and resource utilization simultaneously -- makes the problem challenging.
We show that the implementation of these algorithms achieves both faster convergence and higher resource utilization than on several real datasets.
arXiv Detail & Related papers (2020-04-19T05:21:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.