GPU optimization of the 3D Scale-invariant Feature Transform Algorithm
and a Novel BRIEF-inspired 3D Fast Descriptor
- URL: http://arxiv.org/abs/2112.10258v1
- Date: Sun, 19 Dec 2021 20:56:40 GMT
- Title: GPU optimization of the 3D Scale-invariant Feature Transform Algorithm
and a Novel BRIEF-inspired 3D Fast Descriptor
- Authors: Jean-Baptiste Carluer, Laurent Chauvin, Jie Luo, William M. Wells III,
Ines Machado, Rola Harmouche, Matthew Toews
- Abstract summary: This work details a highly efficient implementation of the 3D scale-invariant feature transform (SIFT) algorithm, for the purpose of machine learning from large sets of medical image data.
The primary operations of the 3D SIFT code are implemented on a graphics processing unit (GPU), including convolution, sub-sampling, and 4D peak detection from scale-space pyramids.
The performance improvements are quantified in keypoint detection and image-to-image matching experiments, using 3D MRI human brain volumes of different people.
- Score: 5.1537294207900715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work details a highly efficient implementation of the 3D scale-invariant
feature transform (SIFT) algorithm, for the purpose of machine learning from
large sets of volumetric medical image data. The primary operations of the 3D
SIFT code are implemented on a graphics processing unit (GPU), including
convolution, sub-sampling, and 4D peak detection from scale-space pyramids. The
performance improvements are quantified in keypoint detection and
image-to-image matching experiments, using 3D MRI human brain volumes of
different people. Computationally efficient 3D keypoint descriptors are
proposed based on the Binary Robust Independent Elementary Feature (BRIEF)
code, including a novel descriptor we call Ranked Robust Independent Elementary
Features (RRIEF), and compared to the original 3D SIFT-Rank
method (Toews et al., 2013). The GPU implementation affords a speedup of
approximately 7X beyond an optimised CPU implementation, where computation time
is reduced from 1.4 seconds to 0.2 seconds for 3D volumes of size (145, 174,
145) voxels with approximately 3000 keypoints. Notable speedups include the
convolution operation (20X), 4D peak detection (3X), sub-sampling (3X), and
difference-of-Gaussian pyramid construction (2X). Efficient descriptors offer a
speedup of 2X and a memory savings of 6X compared to standard SIFT-Rank
descriptors, at a cost of reduced numbers of keypoint correspondences,
revealing a trade-off between computational efficiency and algorithmic
performance. The speedups gained by our implementation will allow for a more
efficient analysis on larger data sets. Our optimized GPU implementation of the
3D SIFT-Rank extractor is available at
https://github.com/CarluerJB/3D_SIFT_CUDA.
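The abstract mentions two ideas that are easy to illustrate: BRIEF-style binary descriptors (with a ranked RRIEF variant) and 4D peak detection in a scale-space pyramid. The abstract gives no implementation details, so the following is a minimal sketch only; the function names, the random sampling scheme, and the rank-transform formulation are assumptions for illustration, not the authors' actual code (see the linked repository for that).

```python
# Illustrative sketch (not the paper's implementation) of BRIEF-style binary
# descriptors, a rank-transformed variant, and 4D local-peak detection.
import numpy as np
from scipy.ndimage import maximum_filter

def sample_pairs(n_pairs, radius, rng):
    """Random 3D point pairs inside a cube around the keypoint (BRIEF-style)."""
    return rng.integers(-radius, radius + 1, size=(n_pairs, 2, 3))

def brief3d(volume, center, pairs):
    """Binary code: bit i is 1 if intensity at pair i's first point exceeds the second."""
    c = np.asarray(center)
    a = volume[tuple((c + pairs[:, 0]).T)]
    b = volume[tuple((c + pairs[:, 1]).T)]
    return (a > b).astype(np.uint8)

def rrief3d(volume, center, points):
    """Ranked code: replace sampled intensities with their ranks (SIFT-Rank-style).

    This is one plausible reading of a "ranked" BRIEF variant, shown only to
    illustrate the rank transform; the paper defines RRIEF itself.
    """
    c = np.asarray(center)
    vals = volume[tuple((c + points).T)]
    return vals.argsort().argsort().astype(np.uint16)

def peaks4d(dog):
    """Positive local maxima of a DoG stack of shape (scale, z, y, x).

    A voxel is a peak if it equals the maximum over its 3x3x3x3 neighborhood,
    i.e. across both space and adjacent scales.
    """
    m = maximum_filter(dog, size=3, mode="constant", cval=-np.inf)
    return np.argwhere((dog == m) & (dog > 0))
```

Both descriptor variants depend only on the ordering of sampled intensities, so they are invariant to monotonic intensity changes, which is convenient when matching MRI volumes acquired with different scanners or protocols.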
Related papers
- Efficient and Distributed Large-Scale 3D Map Registration using Tomographic Features [10.740403545402508]
A robust, resource-efficient, distributed, and minimally parameterized 3D map matching and merging algorithm is proposed.
The suggested algorithm utilizes tomographic features from 2D projections of horizontal cross-sections of gravity-aligned local maps, and matches these projection slices at all possible height differences.
arXiv Detail & Related papers (2024-06-27T18:03:06Z)
- MicroDreamer: Efficient 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction [37.07128043394227]
This paper introduces score-based iterative reconstruction (SIR), an efficient and general algorithm mimicking a differentiable 3D reconstruction process to reduce the number of function evaluations (NFEs).
We present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks.
arXiv Detail & Related papers (2024-04-30T12:56:14Z)
- Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting [2.878831747437321]
3D-GS is a new rendering approach that outperforms the neural radiance field (NeRF) in terms of both speed and image quality.
We propose a computational reduction technique that quickly identifies unnecessary 3D Gaussians in real-time for rendering the current view.
For the Mip-NeRF360 dataset, the proposed technique excludes 63% of 3D Gaussians on average before the 2D image projection, which reduces the overall rendering time by almost 38.3% without sacrificing peak signal-to-noise ratio (PSNR).
The proposed accelerator also achieves a speedup of 10.7x compared to a GPU.
arXiv Detail & Related papers (2024-02-21T14:16:49Z)
- Splatter Image: Ultra-Fast Single-View 3D Reconstruction [67.96212093828179]
Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images.
We learn a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS.
On several synthetic, real, multi-category and large-scale benchmark datasets, we achieve better results in terms of PSNR, LPIPS, and other metrics while training and evaluating much faster than prior works.
arXiv Detail & Related papers (2023-12-20T16:14:58Z)
- Instant3D: Instant Text-to-3D Generation [101.25562463919795]
We propose a novel framework for fast text-to-3D generation, dubbed Instant3D.
Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network.
arXiv Detail & Related papers (2023-11-14T18:59:59Z)
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
- Spatiotemporal Modeling Encounters 3D Medical Image Analysis: Slice-Shift UNet with Multi-View Fusion [0.0]
We propose a new 2D-based model dubbed Slice SHift UNet which encodes three-dimensional features at 2D CNN's complexity.
More precisely, multi-view features are collaboratively learned by performing 2D convolutions along the three planes of a volume.
The effectiveness of our approach is validated on the Multi-Modality Abdominal Multi-Organ Segmentation (AMOS) and Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) datasets.
arXiv Detail & Related papers (2023-07-24T14:53:23Z)
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
- Fast-SNARF: A Fast Deformer for Articulated Neural Fields [92.68788512596254]
We propose a new articulation module for neural fields, Fast-SNARF, which finds accurate correspondences between canonical space and posed space.
Fast-SNARF is a drop-in replacement for our previous work, SNARF, while significantly improving its computational efficiency.
Because learning of deformation maps is a crucial component in many 3D human avatar methods, we believe that this work represents a significant step towards the practical creation of 3D virtual humans.
arXiv Detail & Related papers (2022-11-28T17:55:34Z)
- Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z)
- Depthwise Spatio-Temporal STFT Convolutional Neural Networks for Human Action Recognition [42.400429835080416]
Conventional 3D convolutional neural networks (CNNs) are computationally expensive, memory intensive, and prone to overfitting; most importantly, their feature learning capabilities need improvement.
We propose a new class of convolutional blocks that can serve as an alternative to 3D convolutional layers and their variants in 3D CNNs.
Our evaluation on seven action recognition datasets, including Something-Something v1 and v2, Jester, Diving-48, Kinetics-400, UCF 101, and HMDB 51, demonstrates that STFT-block-based 3D CNNs achieve on-par or even better performance compared to the state-of-the-art.
arXiv Detail & Related papers (2020-07-22T12:26:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.