Efficient 3D Recognition with Event-driven Spike Sparse Convolution
- URL: http://arxiv.org/abs/2412.07360v2
- Date: Tue, 04 Feb 2025 02:52:37 GMT
- Title: Efficient 3D Recognition with Event-driven Spike Sparse Convolution
- Authors: Xuerui Qiu, Man Yao, Jieyuan Zhang, Yuhong Chou, Ning Qiao, Shibo Zhou, Bo Xu, Guoqi Li
- Abstract summary: Spiking Neural Networks (SNNs) provide an energy-efficient way to extract 3D spatio-temporal features.
We introduce the Spike Voxel Coding (SVC) scheme, which encodes the 3D point clouds into a sparse spike train space.
We propose a Spike Sparse Convolution (SSC) model for efficiently extracting 3D sparse point cloud features.
- Score: 15.20476631850388
- Abstract: Spiking Neural Networks (SNNs) provide an energy-efficient way to extract 3D spatio-temporal features. Point clouds are sparse 3D spatial data, which suggests that SNNs should be well-suited for processing them. However, when applied to point clouds, SNNs often exhibit limited performance and a narrow range of application scenarios. We attribute this to inappropriate preprocessing and feature extraction methods. To address this issue, we first introduce the Spike Voxel Coding (SVC) scheme, which encodes 3D point clouds into a sparse spike train space, reducing storage requirements and saving time on point cloud preprocessing. Then, we propose a Spike Sparse Convolution (SSC) model for efficiently extracting 3D sparse point cloud features. Combining SVC and SSC, we design an efficient 3D SNN backbone (E-3DSNN) that is friendly to neuromorphic hardware. For instance, SSC can be implemented on neuromorphic chips with only a minor modification to the addressing function of vanilla spike convolution. Experiments on the ModelNet40, KITTI, and Semantic KITTI datasets demonstrate that E-3DSNN achieves state-of-the-art (SOTA) results with remarkable efficiency. Notably, our E-3DSNN (1.87M parameters) obtains 91.7% top-1 accuracy on ModelNet40, surpassing the current best SNN baseline (14.3M parameters) by 3.0%. To the best of our knowledge, it is the first directly trained 3D SNN backbone that can simultaneously handle various 3D computer vision tasks (e.g., classification, detection, and segmentation) with an event-driven nature. Code is available: https://github.com/bollossom/E-3DSNN/.
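As a hedged illustration of how SVC and SSC fit together, here is a minimal single-channel sketch. It assumes a simple floor quantization for the voxel coding and a precomputed kernel-offset list for the convolution; the function names and details are ours, not the authors' released code. The property it demonstrates is the one the abstract emphasizes: with binary spike inputs, sparse convolution reduces to addressing and accumulating kernel weights, with no multiplications.

```python
import numpy as np

def spike_voxel_coding(points, voxel_size=0.05):
    """Quantize an (N, 3) point cloud to integer voxel coordinates; the
    occupied voxels form a sparse set of binary spike sites."""
    coords = np.unique(np.floor(points / voxel_size).astype(np.int64), axis=0)
    spikes = np.ones(len(coords), dtype=np.int8)   # one binary spike per voxel
    return coords, spikes

def spike_sparse_conv(coords, spikes, weights, offsets):
    """Event-driven sparse convolution: with binary spike inputs, each output
    is a sum of kernel weights addressed by active neighbors, so the whole
    operation is accumulate-only."""
    index = {tuple(c): i for i, c in enumerate(coords)}
    out = np.zeros(len(coords))
    for i, c in enumerate(coords):                 # outputs only at input sites
        for k, off in enumerate(offsets):
            j = index.get(tuple(c + off))
            if j is not None and spikes[j]:
                out[i] += weights[k]               # accumulate, never multiply
    return out

# Usage on a random cloud with a 3x3x3 kernel:
pts = np.random.rand(1000, 3)
coords, spikes = spike_voxel_coding(pts)
offsets = np.array([(dx, dy, dz) for dx in (-1, 0, 1)
                    for dy in (-1, 0, 1) for dz in (-1, 0, 1)])
feats = spike_sparse_conv(coords, spikes, np.random.randn(27), offsets)
```

Because only occupied voxels are visited and each contribution is a pure weight lookup, this style of kernel maps naturally onto event-driven hardware, which is presumably what the abstract means by modifying only the addressing function.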
Related papers
- Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training [17.193023656793464]
The ambition of brain-inspired Spiking Neural Networks (SNNs) is to become a low-power alternative to traditional Artificial Neural Networks (ANNs).
This work addresses two major challenges in realizing this vision: the performance gap between SNNs and ANNs, and the high training costs of SNNs.
We identify intrinsic flaws in spiking neurons caused by binary firing mechanisms and propose a Spike Firing Approximation (SFA) method using integer training and spike-driven inference.
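A hedged reading of that recipe, with assumed names and an assumed clip-and-round quantizer rather than the paper's actual code: train with small integer activation counts, then expand each count into an equivalent binary spike train for spike-driven inference.

```python
import torch

def integer_activation(x, T=4):
    """Training-time surrogate: clip-and-round to an integer count in [0, T]."""
    return torch.clamp(torch.round(x), 0, T)

def to_spike_train(a, T=4):
    """Inference-time expansion: an integer count a in [0, T] becomes a binary
    spike train of length T whose sum over time equals a."""
    t = torch.arange(T).view(-1, *([1] * a.dim()))
    return (t < a.unsqueeze(0)).to(torch.int8)   # shape (T, *a.shape)

x = torch.rand(3) * 4
a = integer_activation(x)
s = to_spike_train(a)
assert torch.equal(s.sum(0).float(), a)          # spike counts match integers
```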
arXiv Detail & Related papers (2024-11-25T03:05:41Z)
- LION: Linear Group RNN for 3D Object Detection in Point Clouds [85.97541374148508]
We propose a window-based framework built on LInear grOup RNN for accurate 3D object detection, called LION.
We introduce a 3D spatial feature descriptor and integrate it into the linear group RNN operators to enhance their spatial features.
To further address the challenge in highly sparse point clouds, we propose a 3D voxel generation strategy to densify foreground features.
arXiv Detail & Related papers (2024-07-25T17:50:32Z)
- ANN vs SNN: A case study for Neural Decoding in Implantable Brain-Machine Interfaces [0.7904805552920349]
In this work, we compare different neural networks (NN) for motor decoding in terms of accuracy and implementation cost.
We further show that combining traditional signal processing techniques with machine learning ones delivers surprisingly good performance even with simple NNs.
arXiv Detail & Related papers (2023-12-26T05:40:39Z)
- Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding [40.68012530554327]
We introduce a pretrained 3D backbone, called SST, for 3D indoor scene understanding.
We design a 3D Swin transformer as our backbone network, which enables efficient self-attention on sparse voxels with linear memory complexity.
A series of extensive ablation studies further validate the scalability, generality, and superior performance enabled by our approach.
arXiv Detail & Related papers (2023-04-14T02:49:08Z)
- MLGCN: An Ultra Efficient Graph Convolution Neural Model For 3D Point Cloud Analysis [4.947552172739438]
We introduce a novel Multi-level Graph Convolution Neural (MLGCN) model, which uses Graph Neural Networks (GNN) blocks to extract features from 3D point clouds at specific locality levels.
Our approach produces comparable results to those of state-of-the-art models while requiring up to a thousand times fewer floating-point operations (FLOPs) and having significantly reduced storage requirements.
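As a rough sketch of the kind of GNN block the summary describes (the layout, aggregation rule, and names here are assumptions for illustration, not the MLGCN architecture): build a k-nearest-neighbor graph over the points at a given locality level, mean-aggregate each neighborhood, and mix channels with a linear layer.

```python
import torch
import torch.nn as nn

class KNNGraphConv(nn.Module):
    """Illustrative kNN graph-convolution block: mean-aggregate each point's
    k nearest neighbors (self included, at distance zero), then mix channels."""
    def __init__(self, c_in, c_out, k=8):
        super().__init__()
        self.k, self.lin = k, nn.Linear(c_in, c_out)

    def forward(self, x, pts):
        # x: (N, C) point features, pts: (N, 3) coordinates
        idx = torch.cdist(pts, pts).topk(self.k, largest=False).indices
        return torch.relu(self.lin(x[idx].mean(dim=1)))

x, pts = torch.randn(1024, 32), torch.rand(1024, 3)
y = KNNGraphConv(32, 64)(x, pts)                 # (1024, 64)
```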
arXiv Detail & Related papers (2023-03-31T00:15:22Z)
- Using a Waffle Iron for Automotive Point Cloud Semantic Segmentation [66.6890991207065]
Sparse 3D convolutions have become the de facto tool for constructing deep neural networks for 3D perception.
We propose an alternative method that reaches the level of state-of-the-art methods without requiring sparse convolutions.
We show that such a level of performance is achievable by relying on tools a priori unfit for large-scale, high-performing 3D perception.
arXiv Detail & Related papers (2023-01-24T16:10:08Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which achieves high performance with low latency.
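For context on the non-differentiability mentioned above, the sketch below shows the standard surrogate-gradient workaround (a generic, well-known technique, not the DSR method itself): the forward pass fires a hard binary spike, while the backward pass substitutes a rectangular window around the threshold.

```python
import torch

class SpikeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v, threshold=1.0):
        ctx.save_for_backward(v)
        ctx.threshold = threshold
        return (v >= threshold).float()          # binary spike, zero gradient a.e.

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        # Rectangular surrogate: pass gradient only near the threshold.
        mask = (v - ctx.threshold).abs() < 0.5
        return grad_out * mask.float(), None     # no gradient for the threshold

v = torch.randn(8, requires_grad=True)
SpikeSTE.apply(v).sum().backward()               # v.grad is now populated
```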
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Focal Sparse Convolutional Networks for 3D Object Detection [121.45950754511021]
We introduce two new modules to enhance the capability of Sparse CNNs.
They are focal sparse convolution (Focals Conv) and its multi-modal variant, focal sparse convolution with fusion.
For the first time, we show that spatially learnable sparsity in sparse convolution is essential for sophisticated 3D object detection.
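One plausible reading of "spatially learnable sparsity", offered purely as a hypothetical sketch rather than the Focals Conv implementation: a learned per-voxel importance score decides which sites keep submanifold behavior (outputs only at input positions) and which are allowed to dilate into their neighborhood.

```python
import numpy as np

def select_output_sites(coords, importance, tau=0.5):
    """coords: (N, 3) integer voxel coordinates; importance: (N,) learned
    scores. Unimportant sites emit outputs only at themselves; important
    sites also emit outputs at their 6-neighborhood."""
    sites = {tuple(c) for c in coords}
    for c, s in zip(coords, importance):
        if s > tau:                              # learned, spatially varying sparsity
            for off in [(1,0,0), (-1,0,0), (0,1,0),
                        (0,-1,0), (0,0,1), (0,0,-1)]:
                sites.add(tuple(c + np.array(off)))
    return np.array(sorted(sites))
```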
arXiv Detail & Related papers (2022-04-26T17:34:10Z)
- 3D CNNs with Adaptive Temporal Feature Resolutions [83.43776851586351]
The Similarity Guided Sampling (SGS) module can be plugged into any existing 3D CNN architecture.
SGS empowers 3D CNNs by learning the similarity of temporal features and grouping similar features together.
Our evaluations show that the proposed module improves the state-of-the-art by reducing the computational cost (GFLOPs) by half while preserving or even improving the accuracy.
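A hypothetical sketch of the grouping idea, using a fixed cosine-similarity threshold over adjacent frames (the actual SGS module learns the similarity; names here are ours):

```python
import torch
import torch.nn.functional as F

def group_similar_frames(feats, thresh=0.9):
    """feats: (T, C) per-frame features; returns (T', C) with T' <= T by
    averaging runs of adjacent, mutually similar frames into one slot."""
    groups, current = [], [feats[0]]
    for t in range(1, feats.size(0)):
        if F.cosine_similarity(feats[t], current[-1], dim=0) > thresh:
            current.append(feats[t])                     # extend the group
        else:
            groups.append(torch.stack(current).mean(0))  # close the group
            current = [feats[t]]
    groups.append(torch.stack(current).mean(0))
    return torch.stack(groups)

frames = torch.randn(16, 128)
pooled = group_similar_frames(frames)                    # (T', 128), T' <= 16
```

Shrinking the temporal axis this way is what reduces GFLOPs in all downstream layers while keeping redundant frames represented.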
arXiv Detail & Related papers (2020-11-17T14:34:05Z)
- PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection [76.30585706811993]
We present a novel, high-performance 3D object detection framework named PointVoxel-RCNN (PV-RCNN).
Our proposed method deeply integrates both 3D voxel Convolutional Neural Network (CNN) and PointNet-based set abstraction.
It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of the PointNet-based networks.
arXiv Detail & Related papers (2019-12-31T06:34:10Z)