Related papers: Parallel 3DPIFCM Algorithm for Noisy Brain MRI Images

URL: http://arxiv.org/abs/2002.01981v1
Date: Wed, 5 Feb 2020 20:30:29 GMT
Title: Parallel 3DPIFCM Algorithm for Noisy Brain MRI Images
Authors: Arie Agranonik, Maya Herman, Mark Last
Abstract summary: In this paper we implement 3DPIFCM, the algorithm we developed in [1], in a parallel environment using CUDA on a GPU. Our results show that the parallel version of the algorithm performs up to 27x faster than the original sequential version and 68x faster than the GAIFCM algorithm.
Score: 3.3946853660795884
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper we implemented 3DPIFCM, the algorithm we developed in [1], in a parallel environment using CUDA on a GPU. In our previous work we introduced 3DPIFCM, which segments images in noisy conditions and uses particle swarm optimization to find the optimal algorithm parameters to account for noise. This algorithm achieved state-of-the-art segmentation accuracy when compared to FCM (Fuzzy C-Means), IFCMPSO (Improved Fuzzy C-Means with Particle Swarm Optimization), and GAIFCM (Genetic Algorithm Improved Fuzzy C-Means) on noisy MRI images of an adult brain. When using a genetic algorithm or PSO (Particle Swarm Optimization) on a single machine for optimization, we observed execution times that were too long for practical clinical use. Therefore, in the current paper our goal was to speed up the execution of 3DPIFCM by taking out parts of the algorithm and executing them as kernels on a GPU. The algorithm was implemented using the CUDA [13] framework from NVIDIA, and experiments were performed on a server containing 64GB RAM, 8 cores and a TITAN X GPU with 3072 SP cores and 12GB of GPU memory. Our results show that the parallel version of the algorithm performs up to 27x faster than the original sequential version and 68x faster than the GAIFCM algorithm. We show that the speedup of the parallel version increases as we increase the size of the image, due to better utilization of the cores in the GPU. We also show a speedup of up to 5x in our Brainweb experiment compared to other generic variants such as IFCMPSO and GAIFCM.
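
To make the FCM baseline named in the abstract concrete, here is a minimal sketch of standard Fuzzy C-Means on 1D intensities, in plain Python. It is not the authors' 3DPIFCM or their CUDA code. The function names, the toy `voxels` list, and the fuzzifier `m=2.0` are assumptions chosen for illustration. Each voxel's membership row depends only on that voxel and the current centers, which is the kind of independent per-element work that maps naturally onto GPU kernels.

```python
def fcm_memberships(values, centers, m=2.0, eps=1e-12):
    """Standard FCM update: membership of each value in each cluster."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in values:  # each iteration is independent of all other voxels
        # eps avoids division by zero when a value coincides with a center
        dists = [abs(x - c) + eps for c in centers]
        row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
               for d_j in dists]
        memberships.append(row)
    return memberships


def fcm_centers(values, memberships, n_clusters, m=2.0):
    """Standard FCM update: centers as membership-weighted means."""
    centers = []
    for j in range(n_clusters):
        weights = [row[j] ** m for row in memberships]
        weighted_sum = sum(w * x for w, x in zip(weights, values))
        centers.append(weighted_sum / sum(weights))
    return centers


def fcm(values, centers, m=2.0, iterations=20):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        u = fcm_memberships(values, centers, m)
        centers = fcm_centers(values, u, len(centers), m)
    return centers, fcm_memberships(values, centers, m)


# Hypothetical toy data: two groups of normalized intensities.
voxels = [0.10, 0.12, 0.15, 0.80, 0.85, 0.90]
centers, u = fcm(voxels, centers=[0.0, 1.0])
```

The returned `u` holds one row per voxel, and each row sums to 1 across the clusters. Noise-aware variants such as those listed in the abstract modify this basic scheme, and the details are in the papers themselves.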

Related papers

Speedy MASt3R [68.47052557089631]
MASt3R redefines image matching as a 3D task by leveraging DUSt3R and introducing a fast reciprocal matching scheme. Speedy MASt3R achieves a 54% reduction in inference time (198 ms to 91 ms per image pair) without sacrificing accuracy. This advancement enables real-time 3D understanding, benefiting applications like mixed reality navigation and large-scale 3D scene reconstruction.
arXiv Detail & Related papers (2025-03-13T03:56:22Z)
A GPU Implementation of Multi-Guiding Spark Fireworks Algorithm for Efficient Black-Box Neural Network Optimization [2.9608128305931825]
This paper presents a GPU-accelerated version of the Multi-Guiding Spark Fireworks Algorithm (MGFWA). We demonstrate its superior performance in terms of both speed and solution quality. The proposed implementation offers a promising approach to accelerate swarm intelligence algorithms.
arXiv Detail & Related papers (2025-01-07T17:09:07Z)
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt [65.25603275491544]
We present 3DGS-LM, a new method that accelerates the reconstruction of 3D Gaussian Splatting (3DGS). Our method is 30% faster than the original 3DGS while obtaining the same reconstruction quality.
arXiv Detail & Related papers (2024-09-19T16:31:44Z)
INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient. We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture. We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications. We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
Fast and High-Quality Image Denoising via Malleable Convolutions [72.18723834537494]
We present Malleable Convolution (MalleConv), as an efficient variant of dynamic convolution. Unlike previous works, MalleConv generates a much smaller set of spatially-varying kernels from input. We also build an efficient denoising network using MalleConv, coined as MalleNet.
arXiv Detail & Related papers (2022-01-02T18:35:20Z)
GPU-accelerated Faster Mean Shift with euclidean distance metrics [1.3507758562554621]
The mean-shift algorithm is widely used to solve clustering problems. In previous research, we proposed a novel GPU-accelerated Faster Mean-shift algorithm. In this study, we extend and improve the previous algorithm to handle Euclidean distance metrics.
arXiv Detail & Related papers (2021-12-27T20:18:24Z)
GPU optimization of the 3D Scale-invariant Feature Transform Algorithm and a Novel BRIEF-inspired 3D Fast Descriptor [5.1537294207900715]
This work details a highly efficient implementation of the 3D scale-invariant feature transform (SIFT) algorithm, for the purpose of machine learning from large sets of medical image data. The primary operations of the 3D SIFT code are implemented on a graphics processing unit (GPU), including convolution, sub-sampling, and 4D peak detection from scale-space pyramids. The performance improvements are quantified in keypoint detection and image-to-image matching experiments, using 3D MRI human brain volumes of different people.
arXiv Detail & Related papers (2021-12-19T20:56:40Z)
RAMA: A Rapid Multicut Algorithm on GPU [23.281726932718232]
We propose a highly parallel primal-dual algorithm for the multicut (a.k.a. magnitude correlation clustering) problem. Our algorithm produces primal solutions and dual lower bounds that estimate the distance to optimum. We can solve very large scale benchmark problems with up to $\mathcal{O}(10^8)$ variables in a few seconds with small primal-dual gaps.
arXiv Detail & Related papers (2021-09-04T10:33:59Z)
Efficient and Generic 1D Dilated Convolution Layer for Deep Learning [52.899995651639436]
We introduce our efficient implementation of a generic 1D convolution layer covering a wide range of parameters. It is optimized for x86 CPU architectures, in particular, for architectures containing Intel AVX-512 and AVX-512 BFloat16 instructions. We demonstrate the performance of our optimized 1D convolution layer by utilizing it in the end-to-end neural network training with real genomics datasets.
arXiv Detail & Related papers (2021-04-16T09:54:30Z)
Increased performance in DDM analysis by calculating structure functions through Fourier transform in time [0.0]
We present an algorithm to efficiently process a set of images according to the Differential Dynamic Microscopy analysis scheme. The new implementation computes the DDM analysis faster, thanks to an additional Fourier transform in time instead of performing differences of signals. Without GPU hardware acceleration and for the same set of images, we found that the new algorithm is 300 times faster than the old one, with both running only on the CPU.
arXiv Detail & Related papers (2020-12-02T21:12:45Z)
Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections. Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
3DPIFCM Novel Algorithm for Segmentation of Noisy Brain MRI Images [3.3946853660795884]
3DPIFCM is an extension of the well-known IFCM (Improved Fuzzy C-Means) algorithm. It performs fuzzy segmentation and introduces a fitness function that is affected by the proximity of the voxels. The 3DPIFCM algorithm uses PSO (Particle Swarm Optimization) to optimize the fitness function.
arXiv Detail & Related papers (2020-02-05T20:48:51Z)
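
The entry above says 3DPIFCM uses PSO to optimize a fitness function. As a rough illustration of that idea, here is a minimal generic PSO in plain Python. The `toy_fitness` function, the parameter bounds, and the coefficient values (`inertia`, `c_personal`, `c_global`) are assumptions for illustration only. The actual 3DPIFCM fitness function and settings are defined in the paper and are not reproduced here.

```python
import random


def pso_minimize(fitness, bounds, n_particles=20, iterations=50,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `fitness` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [fitness(p) for p in pos]        # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far
    history = [g_val]
    for _ in range(iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the allowed range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
        history.append(g_val)
    return g_pos, g_val, history


def toy_fitness(params):
    """Hypothetical stand-in objective with its minimum at (0.3, 0.7)."""
    a, b = params
    return (a - 0.3) ** 2 + (b - 0.7) ** 2


best_params, best_value, history = pso_minimize(
    toy_fitness, [(0.0, 1.0), (0.0, 1.0)])
```

In a segmentation setting, each fitness evaluation would presumably run a full clustering pass, which is why the cost of the inner algorithm dominates and why speeding it up on a GPU matters.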

This list is automatically generated from the titles and abstracts of the papers in this site.