Related papers: Accelerating Sparse Convolutions in Voxel-Based Point Cloud Networks

Accelerating Sparse Convolutions in Voxel-Based Point Cloud Networks

URL: http://arxiv.org/abs/2511.20834v1
Date: Tue, 25 Nov 2025 20:34:37 GMT
Title: Accelerating Sparse Convolutions in Voxel-Based Point Cloud Networks
Authors: Dionysios Adamopoulos, Anastasia Poulopoulou, Georgios Goumas, Christina Giannoula,
Abstract summary: Sparse Convolution powers 3D point cloud networks widely used in autonomous driving and AR/VR.<n>SpC builds a kernel map that stores mappings between input voxel coordinates, output coordinates, and weight offsets, then uses this map to compute feature vectors for output coordinates.<n>Our work identifies three key properties of voxel coordinates: they are integer-valued, bounded within a limited spatial range, and geometrically continuous-neighboring voxels on the same object surface are highly likely to exist at small spatial offsets from each other.<n>We design Spira, the first voxel-property-aware
Score: 0.34304285205574886
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sparse Convolution (SpC) powers 3D point cloud networks widely used in autonomous driving and AR/VR. SpC builds a kernel map that stores mappings between input voxel coordinates, output coordinates, and weight offsets, then uses this map to compute feature vectors for output coordinates. Our work identifies three key properties of voxel coordinates: they are integer-valued, bounded within a limited spatial range, and geometrically continuous-neighboring voxels on the same object surface are highly likely to exist at small spatial offsets from each other. Prior SpC engines do not fully exploit these properties and suffer from high pre-processing and post-processing overheads during kernel map construction. To address this, we design Spira, the first voxel-property-aware SpC engine for GPUs. Spira proposes: (i) a high-performance one-shot search algorithm that builds the kernel map with no preprocessing and high memory locality, (ii) an effective packed-native processing scheme that accesses packed voxel coordinates at low cost, (iii) a flexible dual-dataflow execution mechanism that efficiently computes output feature vectors by adapting to layer characteristics, and (iv) a network-wide parallelization strategy that builds kernel maps for all SpC layers concurrently at network start. Our evaluation shows that Spira significantly outperforms prior SpC engines by 1.71x on average and up to 2.31x for end-to-end inference, and by 2.13x on average and up to 3.32x for layer-wise execution across diverse layer configurations.

Related papers

HARP-NeXt: High-Speed and Accurate Range-Point Fusion Network for 3D LiDAR Semantic Segmentation [39.58684038370709]
LiDAR semantic segmentation is crucial for autonomous vehicles and mobile robots.<n>Previous state-of-the-art methods often face a trade-off between accuracy and speed.<n>We introduce HARP-NeXt, a high-speed and accurate LiDAR semantic segmentation network.
arXiv Detail & Related papers (2025-10-08T10:46:07Z)
Ev-Edge: Efficient Execution of Event-based Vision Algorithms on Commodity Edge Platforms [10.104371980353973]
Ev-Edge is a framework that contains three key optimizations to boost the performance of event-based vision systems on edge platforms. On several state-of-art networks for a range of autonomous navigation tasks, Ev-Edge achieves 1.28x-2.05x improvements in latency and 1.23x-2.15x in energy.
arXiv Detail & Related papers (2024-03-23T04:44:55Z)
Minuet: Accelerating 3D Sparse Convolutions on GPUs [9.54287796030519]
Sparse Convolution (SC) is widely used for processing 3D point clouds that are inherently sparse. In this work, we analyze the shortcomings of prior state-of-the-art SC engines, and propose Minuet, a novel memory-efficient SC engine tailored for modern GPUs. Our evaluations show that Minuet significantly outperforms prior SC engines by on average $1.74times$ (up to $2.22times$) for end-to-end point cloud network executions.
arXiv Detail & Related papers (2023-12-01T05:09:02Z)
PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction [72.75478398447396]
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively. Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system. We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
arXiv Detail & Related papers (2023-08-31T17:57:17Z)
Voxel or Pillar: Exploring Efficient Point Cloud Representation for 3D Object Detection [49.324070632356296]
We develop a sparse voxel-pillar encoder that encodes point clouds into voxel and pillar features through 3D and 2D sparse convolutions respectively. Our efficient, fully sparse method can be seamlessly integrated into both dense and sparse detectors.
arXiv Detail & Related papers (2023-04-06T05:00:58Z)
DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets [95.84755169585492]
We present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception. Our model achieves state-of-the-art performance with a broad range of 3D perception tasks.
arXiv Detail & Related papers (2023-01-15T09:31:58Z)
CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D. Our proposed method first generates some high-quality 3D proposals by leveraging the class-aware local group strategy on the object surface voxels. To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z)
Composite Convolution: a Flexible Operator for Deep Learning on 3D Point Clouds [4.104847990024176]
We introduce the composite layer, a flexible and general alternative to the existing convolutional operators that process 3D point clouds. Compared to mainstream point-convolutional layers such as ConvPoint and KPConv, our composite layer guarantees greater flexibility in network design and provides an additional form of regularization. Our experiments on synthetic and real-world datasets show that, in both classification, segmentation, and anomaly detection, our CompositeNets outperform ConvPoint, which uses the same sequential architecture, and achieve similar results as KPConv, which has a deeper, residual architecture.
arXiv Detail & Related papers (2022-09-23T18:32:18Z)
PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection [100.60209139039472]
We propose the PointVoxel Region based Convolution Neural Networks (PVRCNNs) for accurate 3D detection from point clouds. Our proposed PV-RCNNs significantly outperform previous state-of-the-art 3D detection methods on both the Open dataset and the highly-competitive KITTI benchmark.
arXiv Detail & Related papers (2021-01-31T14:51:49Z)
Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection [99.16162624992424]
We devise a simple but effective voxel-based framework, named Voxel R-CNN. By taking full advantage of voxel features in a two stage approach, our method achieves comparable detection accuracy with state-of-the-art point-based models. Our results show that Voxel R-CNN delivers a higher detection accuracy while maintaining a realtime frame processing rate, emphi.e, at a speed of 25 FPS on an NVIDIA 2080 Ti GPU.
arXiv Detail & Related papers (2020-12-31T17:02:46Z)
Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes. The proposed method achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.