TorchSparse: Efficient Point Cloud Inference Engine
- URL: http://arxiv.org/abs/2204.10319v1
- Date: Thu, 21 Apr 2022 17:58:30 GMT
- Title: TorchSparse: Efficient Point Cloud Inference Engine
- Authors: Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, Song Han
- Abstract summary: We introduce TorchSparse, a high-performance point cloud inference engine.
TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement.
It achieves 1.6x and 1.5x measured end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv, respectively.
- Score: 24.541195361633523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning on point clouds has received increased attention thanks to its
wide applications in AR/VR and autonomous driving. These applications require
low latency and high accuracy to provide real-time user experience and ensure
user safety. Unlike conventional dense workloads, the sparse and irregular
nature of point clouds poses severe challenges to running sparse CNNs
efficiently on the general-purpose hardware. Furthermore, existing sparse
acceleration techniques for 2D images do not translate to 3D point clouds. In
this paper, we introduce TorchSparse, a high-performance point cloud inference
engine that accelerates the sparse convolution computation on GPUs. TorchSparse
directly optimizes the two bottlenecks of sparse convolution: irregular
computation and data movement. It applies adaptive matrix multiplication
grouping to trade computation for better regularity, achieving 1.4-1.5x speedup
for matrix multiplication. It also optimizes the data movement by adopting
vectorized, quantized and fused locality-aware memory access, reducing the
memory movement cost by 2.7x. Evaluated on seven representative models across
three benchmark datasets, TorchSparse achieves 1.6x and 1.5x measured
end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv,
respectively.
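The two optimizations described in the abstract can be made concrete with a small sketch. Below is a minimal, purely illustrative PyTorch version of the gather-GEMM-scatter dataflow of sparse convolution with a simple size-based GEMM grouping heuristic. It is not TorchSparse's implementation (which performs the grouping adaptively and runs fused, vectorized, quantized gather/scatter in CUDA); the function name grouped_sparse_conv, the pad_ratio threshold, and the argument layout are assumptions made for illustration only.

    import torch

    def grouped_sparse_conv(features, kernel_maps, weights, num_out, pad_ratio=1.25):
        """Gather-GEMM-scatter sparse convolution with size-based GEMM grouping (sketch).

        features:    (N_in, C_in) input point features.
        kernel_maps: list of (in_idx, out_idx) LongTensor pairs, one per kernel offset;
                     their lengths differ per offset, which causes irregular workloads.
        weights:     (K, C_in, C_out) one weight matrix per kernel offset.
        num_out:     number of output points.
        pad_ratio:   offsets whose workloads differ by less than this ratio are padded
                     to a common length and executed as one batched GEMM.
        """
        c_out = weights.shape[-1]
        out = features.new_zeros(num_out, c_out)

        # Sort kernel offsets by workload size so similar-sized GEMMs end up adjacent.
        order = sorted(range(len(kernel_maps)), key=lambda k: kernel_maps[k][0].numel())

        def run_group(group):
            if not group:
                return
            # Pad every member to the largest workload in the group, then replace many
            # small GEMMs with a single batched GEMM: the redundant FLOPs spent on the
            # padding are traded for regular, well-utilized matrix multiplication.
            max_m = max(kernel_maps[k][0].numel() for k in group)
            gathered = features.new_zeros(len(group), max_m, features.shape[1])
            for i, k in enumerate(group):
                in_idx, _ = kernel_maps[k]
                gathered[i, : in_idx.numel()] = features[in_idx]            # gather
            products = torch.bmm(gathered, weights[list(group)])            # batched GEMM
            for i, k in enumerate(group):
                in_idx, out_idx = kernel_maps[k]
                out.index_add_(0, out_idx, products[i, : in_idx.numel()])   # scatter-add

        group = []
        for k in order:
            m = kernel_maps[k][0].numel()
            if group and m > pad_ratio * max(kernel_maps[group[0]][0].numel(), 1):
                run_group(group)
                group = [k]
            else:
                group.append(k)
        run_group(group)
        return out

The explicit gather and scatter loops in this sketch are precisely the data-movement steps that TorchSparse further accelerates with vectorized, quantized, and fused locality-aware memory access.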
Related papers
- TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs [20.4238781638402]
Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation systems.
Existing GPU libraries offer two dataflow types for sparse convolution.
We introduce TorchSparse++, a new GPU library that achieves the best of both worlds.
arXiv Detail & Related papers (2023-10-25T21:02:38Z)
- SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications [98.90623605283564]
We introduce a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations with linear element-wise multiplications.
We build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.
Our small variant achieves 78.5% top-1 ImageNet-1K accuracy with only 0.8 ms latency on iPhone 14, making it more accurate and 2x faster than MobileViT-v2.
arXiv Detail & Related papers (2023-03-27T17:59:58Z)
- Using a Waffle Iron for Automotive Point Cloud Semantic Segmentation [66.6890991207065]
Sparse 3D convolutions have become the de facto tools for constructing deep neural networks.
We propose an alternative method that reaches the level of state-of-the-art methods without requiring sparse convolutions.
We show that this level of performance is achievable by relying on tools a priori unfit for large-scale, high-performing 3D perception.
arXiv Detail & Related papers (2023-01-24T16:10:08Z)
- FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer [30.596658616831945]
Transformers, as an alternative to CNNs, have proven effective in many modalities.
This paper presents FlatFormer to close this latency gap by trading spatial proximity for better computational regularity.
arXiv Detail & Related papers (2023-01-20T18:59:57Z)
- Efficient Quantized Sparse Matrix Operations on Tensor Cores [21.963041375857117]
We propose Magicube, a high-performance sparse-matrix library for low-precision integers on Tensor Cores.
We show that Magicube achieves on average 1.44x (up to 2.37x) speedup over the vendor-optimized library for sparse kernels, and 1.43x speedup over the state-of-the-art with comparable accuracy for end-to-end Transformer inference.
arXiv Detail & Related papers (2022-09-14T23:52:13Z)
- BEVDetNet: Bird's Eye View LiDAR Point Cloud based Real-time 3D Object Detection for Autonomous Driving [6.389322215324224]
We propose a novel semantic segmentation architecture as a single unified model for object center detection using key points, box predictions and orientation prediction.
The proposed architecture can be trivially extended to include semantic segmentation classes like road without any additional computation.
The model is 5x faster than other top-accuracy models, with a minimal accuracy degradation of 2% in Average Precision at IoU=0.5 on the KITTI dataset.
arXiv Detail & Related papers (2021-04-21T22:06:39Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z)
- Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds [2.924868086534434]
This paper introduces a novel approach for 3D point cloud semantic segmentation that exploits multiple projections of the point cloud.
Our Multi-Projection Fusion framework analyzes spherical and bird's-eye view projections using two separate highly-efficient 2D fully convolutional models.
arXiv Detail & Related papers (2020-11-03T19:40:43Z)
- RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.