TorchSparse: Efficient Point Cloud Inference Engine
- URL: http://arxiv.org/abs/2204.10319v1
- Date: Thu, 21 Apr 2022 17:58:30 GMT
- Title: TorchSparse: Efficient Point Cloud Inference Engine
- Authors: Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, Song Han
- Abstract summary: We introduce TorchSparse, a high-performance point cloud inference engine.
TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement.
It achieves 1.6x and 1.5x measured end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv, respectively.
- Score: 24.541195361633523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning on point clouds has received increased attention thanks to its
wide applications in AR/VR and autonomous driving. These applications require
low latency and high accuracy to provide real-time user experience and ensure
user safety. Unlike conventional dense workloads, the sparse and irregular
nature of point clouds poses severe challenges to running sparse CNNs
efficiently on the general-purpose hardware. Furthermore, existing sparse
acceleration techniques for 2D images do not translate to 3D point clouds. In
this paper, we introduce TorchSparse, a high-performance point cloud inference
engine that accelerates the sparse convolution computation on GPUs. TorchSparse
directly optimizes the two bottlenecks of sparse convolution: irregular
computation and data movement. It applies adaptive matrix multiplication
grouping to trade computation for better regularity, achieving 1.4-1.5x speedup
for matrix multiplication. It also optimizes the data movement by adopting
vectorized, quantized and fused locality-aware memory access, reducing the
memory movement cost by 2.7x. Evaluated on seven representative models across
three benchmark datasets, TorchSparse achieves 1.6x and 1.5x measured
end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv,
respectively.
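The two optimizations described in the abstract can be made concrete with a small sketch. Below is a minimal, purely illustrative PyTorch version of the gather-GEMM-scatter dataflow of sparse convolution with a simple size-based GEMM grouping heuristic. It is not TorchSparse's implementation (which performs the grouping adaptively and runs fused, vectorized, quantized gather/scatter in CUDA); the function name grouped_sparse_conv, the pad_ratio threshold, and the argument layout are assumptions made for illustration only.

    import torch

    def grouped_sparse_conv(features, kernel_maps, weights, num_out, pad_ratio=1.25):
        """Gather-GEMM-scatter sparse convolution with size-based GEMM grouping (sketch).

        features:    (N_in, C_in) input point features.
        kernel_maps: list of (in_idx, out_idx) LongTensor pairs, one per kernel offset;
                     their lengths differ per offset, which causes irregular workloads.
        weights:     (K, C_in, C_out) one weight matrix per kernel offset.
        num_out:     number of output points.
        pad_ratio:   offsets whose workloads differ by less than this ratio are padded
                     to a common length and executed as one batched GEMM.
        """
        c_out = weights.shape[-1]
        out = features.new_zeros(num_out, c_out)

        # Sort kernel offsets by workload size so similar-sized GEMMs end up adjacent.
        order = sorted(range(len(kernel_maps)), key=lambda k: kernel_maps[k][0].numel())

        def run_group(group):
            if not group:
                return
            # Pad every member to the largest workload in the group, then replace many
            # small GEMMs with a single batched GEMM: the redundant FLOPs spent on the
            # padding are traded for regular, well-utilized matrix multiplication.
            max_m = max(kernel_maps[k][0].numel() for k in group)
            gathered = features.new_zeros(len(group), max_m, features.shape[1])
            for i, k in enumerate(group):
                in_idx, _ = kernel_maps[k]
                gathered[i, : in_idx.numel()] = features[in_idx]            # gather
            products = torch.bmm(gathered, weights[list(group)])            # batched GEMM
            for i, k in enumerate(group):
                in_idx, out_idx = kernel_maps[k]
                out.index_add_(0, out_idx, products[i, : in_idx.numel()])   # scatter-add

        group = []
        for k in order:
            m = kernel_maps[k][0].numel()
            if group and m > pad_ratio * max(kernel_maps[group[0]][0].numel(), 1):
                run_group(group)
                group = [k]
            else:
                group.append(k)
        run_group(group)
        return out

The explicit gather and scatter loops in this sketch are precisely the data-movement steps that TorchSparse further accelerates with vectorized, quantized, and fused locality-aware memory access.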
Related papers
- TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs [20.4238781638402]
Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation systems.
Existing GPU libraries offer two dataflow types for sparse convolution.
We introduce TorchSparse++, a new GPU library that achieves the best of both worlds.
arXiv Detail & Related papers (2023-10-25T21:02:38Z)
- SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications [98.90623605283564]
We introduce a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations with linear element-wise multiplications.
We build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.
Our small variant achieves 78.5% top-1 ImageNet-1K accuracy with only 0.8 ms latency on iPhone 14, making it more accurate and 2x faster than MobileViT-v2.
arXiv Detail & Related papers (2023-03-27T17:59:58Z)
- Using a Waffle Iron for Automotive Point Cloud Semantic Segmentation [66.6890991207065]
Sparse 3D convolutions have become the de facto tools for constructing deep neural networks.
We propose an alternative method that reaches the level of state-of-the-art methods without requiring sparse convolutions.
We show that this level of performance is achievable by relying on tools a priori unfit for large-scale, high-performing 3D perception.
arXiv Detail & Related papers (2023-01-24T16:10:08Z)
- FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer [30.596658616831945]
Transformers, as an alternative to CNNs, have proven effective in many modalities.
This paper presents FlatFormer to close this latency gap by trading spatial proximity for better computational regularity.
arXiv Detail & Related papers (2023-01-20T18:59:57Z)
- Efficient Quantized Sparse Matrix Operations on Tensor Cores [21.963041375857117]
We propose Magicube, a high-performance sparse-matrix library for low-precision integers on Tensor Cores.
We show that Magicube achieves on average 1.44x (up to 2.37x) speedup over the vendor-optimized library for sparse kernels, and 1.43x speedup over the state-of-the-art with comparable accuracy for end-to-end Transformer inference.
arXiv Detail & Related papers (2022-09-14T23:52:13Z)
- BEVDetNet: Bird's Eye View LiDAR Point Cloud based Real-time 3D Object Detection for Autonomous Driving [6.389322215324224]
We propose a novel semantic segmentation architecture as a single unified model for object center detection using key points, box predictions and orientation prediction.
The proposed architecture can be trivially extended to include semantic segmentation classes like road without any additional computation.
The model is 5x faster than other top-accuracy models, with a minimal accuracy degradation of 2% in Average Precision at IoU=0.5 on the KITTI dataset.
arXiv Detail & Related papers (2021-04-21T22:06:39Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z)
- Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds [2.924868086534434]
This paper introduces a novel approach for 3D point cloud semantic segmentation that exploits multiple projections of the point cloud.
Our Multi-Projection Fusion framework analyzes spherical and bird's-eye view projections using two separate highly-efficient 2D fully convolutional models.
arXiv Detail & Related papers (2020-11-03T19:40:43Z)
- RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.