Faster Mean-shift: GPU-accelerated clustering for cosine embedding-based
cell segmentation and tracking
- URL: http://arxiv.org/abs/2007.14283v2
- Date: Tue, 20 Apr 2021 02:37:04 GMT
- Title: Faster Mean-shift: GPU-accelerated clustering for cosine embedding-based
cell segmentation and tracking
- Authors: Mengyang Zhao, Aadarsh Jha, Quan Liu, Bryan A. Millis, Anita
Mahadevan-Jansen, Le Lu, Bennett A. Landman, Matthew J. Tyska and Yuankai Huo
- Abstract summary: We propose a novel Faster Mean-shift algorithm, which tackles the computational bottleneck of embedding based cell segmentation and tracking.
The proposed Faster Mean-shift algorithm achieved 7-10 times speedup compared to the state-of-the-art embedding based cell instance segmentation and tracking algorithm.
Our Faster Mean-shift algorithm also achieved the highest computational speed compared to other GPU benchmarks with optimized memory consumption.
- Score: 12.60841328582138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, single-stage embedding based deep learning algorithms have gained
increasing attention in cell segmentation and tracking. Compared with the
traditional "segment-then-associate" two-stage approach, a single-stage
algorithm not only simultaneously achieves consistent instance cell
segmentation and tracking but also gains superior performance when
distinguishing ambiguous pixels on boundaries and overlaps. However, the
deployment of an embedding based algorithm is restricted by slow inference
speed (e.g., around 1-2 mins per frame). In this study, we propose a novel
Faster Mean-shift algorithm, which tackles the computational bottleneck of
embedding based cell segmentation and tracking. Different from previous
GPU-accelerated fast mean-shift algorithms, a new online seed optimization
policy (OSOP) is introduced to adaptively determine the minimal number of
seeds, accelerate computation, and save GPU memory. With both embedding
simulation and empirical validation via the four cohorts from the ISBI cell
tracking challenge, the proposed Faster Mean-shift algorithm achieved 7-10
times speedup compared to the state-of-the-art embedding based cell instance
segmentation and tracking algorithm. Our Faster Mean-shift algorithm also
achieved the highest computational speed compared to other GPU benchmarks with
optimized memory consumption. The Faster Mean-shift is a plug-and-play model,
which can be employed on other pixel embedding based clustering inference for
medical image analysis. (Plug-and-play model is publicly available:
https://github.com/masqm/Faster-Mean-Shift)
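The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not reproduce the online seed optimization policy (OSOP): it assumes a flat kernel over cosine distance and a fixed-stride subset of pixels as seeds. The function names, `bandwidth`, and `merge_dist` are hypothetical choices made for illustration, and NumPy on the CPU stands in for the GPU matrix products.

```python
import numpy as np


def _unit(v):
    """Scale each row to unit length so dot products are cosine similarities."""
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def cosine_mean_shift(embeddings, seeds, bandwidth=0.2, max_iter=50, tol=1e-6):
    """Shift every seed toward the mean direction of the embeddings near it.

    embeddings: (n, d) pixel embeddings; seeds: (s, d) starting points.
    bandwidth is a cosine-distance radius (hypothetical default).
    """
    x = _unit(np.asarray(embeddings, dtype=float))
    modes = _unit(np.asarray(seeds, dtype=float))
    for _ in range(max_iter):
        sim = modes @ x.T                    # (s, n) cosine similarities
        window = (1.0 - sim) < bandwidth     # flat kernel: neighbours only
        shifted = window.astype(float) @ x   # sum of neighbours per seed
        empty = ~window.any(axis=1)          # seeds with no neighbours stay put
        shifted[empty] = modes[empty]
        shifted = _unit(shifted)
        step = np.max(1.0 - np.sum(shifted * modes, axis=1))
        modes = shifted
        if step < tol:                       # every seed has stopped moving
            break
    return modes


def merge_and_label(embeddings, modes, merge_dist=0.1):
    """Greedily merge nearby modes, then give each pixel its closest center."""
    centers = []
    for m in modes:
        if all(1.0 - float(m @ c) >= merge_dist for c in centers):
            centers.append(m)
    centers = np.stack(centers)
    x = _unit(np.asarray(embeddings, dtype=float))
    labels = np.argmax(x @ centers.T, axis=1)
    return labels, centers


# Toy data: two synthetic "cells" whose embeddings point in orthogonal directions.
rng = np.random.default_rng(0)
a = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((100, 3))
b = np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((100, 3))
pixels = np.vstack([a, b])
seeds = pixels[::12]   # fixed-stride subset; OSOP would pick the count adaptively
modes = cosine_mean_shift(pixels, seeds)
labels, centers = merge_and_label(pixels, modes)
print(len(centers), labels[:3], labels[-3:])
```

Running the shift from a small set of seeds rather than from every pixel is what keeps the similarity matrix small (seeds by pixels instead of pixels by pixels); how many seeds are enough is the question the paper's OSOP addresses and this sketch leaves fixed.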
Related papers
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [59.835032408496545]
We propose a tile-based strategy that partitions the contrastive loss calculation into arbitrary small blocks.
We also introduce a multi-level tiling strategy to leverage the hierarchical structure of distributed systems.
Compared to SOTA memory-efficient solutions, it achieves a two-order-of-magnitude reduction in memory while maintaining comparable speed.
arXiv Detail & Related papers (2024-10-22T17:59:30Z)
- Implementation and Analysis of GPU Algorithms for Vecchia Approximation [0.8057006406834466]
Vecchia Approximation is widely used to reduce the computational complexity and can be calculated with embarrassingly parallel algorithms.
While multi-core software has been developed for Vecchia Approximation, software designed to run on graphics processing units (GPU) is lacking.
We show that our new method outperforms the other two and then present it in the GpGpU R package.
arXiv Detail & Related papers (2024-07-03T01:24:44Z)
- Efficient stereo matching on embedded GPUs with zero-means cross correlation [8.446808526407738]
We propose a novel acceleration approach for the zero-means normalized cross correlation (ZNCC) matching cost calculation algorithm on a Jetson Tx2 embedded GPU.
In our method for accelerating ZNCC, target images are scanned in a zigzag fashion to efficiently reuse one pixel's computation for its neighboring pixels.
Our system shows a real-time processing speed of 32 fps on a Jetson Tx2 GPU for 1,280x384 pixel images with a maximum disparity of 128.
arXiv Detail & Related papers (2022-12-01T13:03:38Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with the communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by 0.25 times.
arXiv Detail & Related papers (2022-07-13T02:44:05Z)
- GPU-accelerated Faster Mean Shift with euclidean distance metrics [1.3507758562554621]
Mean-shift algorithm is widely used to solve clustering problems.
In previous research, we proposed a novel GPU-accelerated Faster Mean-shift algorithm.
In this study, we extend and improve the previous algorithm to handle Euclidean distance metrics.
arXiv Detail & Related papers (2021-12-27T20:18:24Z)
- Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE) based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
arXiv Detail & Related papers (2021-06-03T17:05:40Z)
- Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps [55.94785248905853]
We propose a novel guided interactive segmentation (GIS) algorithm for video objects to improve the segmentation accuracy and reduce the interaction time.
We develop the intersection-aware propagation module to propagate segmentation results to neighboring frames.
Experimental results demonstrate that the proposed algorithm provides more accurate segmentation results at a faster speed than conventional algorithms.
arXiv Detail & Related papers (2021-04-21T07:08:57Z)
- MeanShift++: Extremely Fast Mode-Seeking With Applications to Segmentation and Object Tracking [40.662116703422846]
MeanShift is a popular mode-seeking clustering algorithm used in a wide range of applications in machine learning.
We propose MeanShift++, which uses a grid-based approach to speed up the mean shift step.
The runtime is linear in the number of points and exponential in dimension, which makes MeanShift++ ideal on low-dimensional applications.
arXiv Detail & Related papers (2021-04-01T07:14:11Z)
- Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms [1.3249453757295084]
We study training algorithms for deep learning on heterogeneous CPU+GPU architectures.
Our two-fold objective -- maximize convergence rate and resource utilization simultaneously -- makes the problem challenging.
We show that the implementation of these algorithms achieves both faster convergence and higher resource utilization than on several real datasets.
arXiv Detail & Related papers (2020-04-19T05:21:20Z)
- Efficient Video Semantic Segmentation with Labels Propagation and Refinement [138.55845680523908]
This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach.
We propose an Efficient Video Segmentation (EVS) pipeline that combines: (i) On the CPU, a very fast optical flow method that is used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next.
On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
arXiv Detail & Related papers (2019-12-26T11:45:15Z)