Related papers: LION: Linear Group RNN for 3D Object Detection in Point Clouds

URL: http://arxiv.org/abs/2407.18232v1
Date: Thu, 25 Jul 2024 17:50:32 GMT
Title: LION: Linear Group RNN for 3D Object Detection in Point Clouds
Authors: Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai
Abstract summary: We propose a window-based framework built on LInear grOup RNN for accurate 3D object detection, called LION. We introduce a 3D spatial feature descriptor and integrate it into the linear group RNN operators to enhance their spatial features. To further address the challenge in highly sparse point clouds, we propose a 3D voxel generation strategy to densify foreground features.
Score: 85.97541374148508
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The benefit of transformers in large-scale 3D point cloud perception tasks, such as 3D object detection, is limited by their quadratic computation cost when modeling long-range relationships. In contrast, linear RNNs have low computational complexity and are suitable for long-range modeling. Toward this goal, we propose a simple and effective window-based framework built on LInear grOup RNN (i.e., performing linear RNN on grouped features) for accurate 3D object detection, called LION. Its key property is that it allows sufficient feature interaction in a much larger group than transformer-based methods. However, effectively applying linear group RNN to 3D object detection in highly sparse point clouds is not trivial due to its limitations in spatial modeling. To tackle this problem, we simply introduce a 3D spatial feature descriptor and integrate it into the linear group RNN operators to enhance their spatial features rather than blindly increasing the number of scanning orders for voxel features. To further address the challenge of highly sparse point clouds, we propose a 3D voxel generation strategy to densify foreground features, exploiting the auto-regressive nature of linear group RNN. Extensive experiments verify the effectiveness of the proposed components and the generalization of our LION to different linear group RNN operators, including Mamba, RWKV, and RetNet. Furthermore, it is worth mentioning that our LION-Mamba achieves state-of-the-art results on the Waymo, nuScenes, Argoverse V2, and ONCE datasets. Last but not least, our method supports a variety of advanced linear RNN operators (e.g., RetNet, RWKV, Mamba, xLSTM and TTT) on the small but popular KITTI dataset for a quick experience with our linear RNN-based framework.
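
The idea of "performing linear RNN on grouped features" from the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed scalar `decay`, and the flat list-of-vectors input are assumptions made for illustration, and real operators such as Mamba, RWKV, or RetNet use learned, often input-dependent, gating. The sketch only shows why the cost grows linearly with the number of voxel features, since each group is processed in a single pass.

```python
from typing import List

Vector = List[float]


def linear_rnn(seq: List[Vector], decay: float = 0.9) -> List[Vector]:
    """Run the recurrence h_t = decay * h_{t-1} + x_t over one sequence.

    `decay` is a hypothetical fixed scalar; it stands in for the learned
    state transition of a real linear RNN operator.
    """
    if not seq:
        return []
    hidden = [0.0] * len(seq[0])
    outputs = []
    for x in seq:
        # One update per element, so the cost is linear in len(seq).
        hidden = [decay * h + xi for h, xi in zip(hidden, x)]
        outputs.append(hidden)
    return outputs


def linear_group_rnn(
    features: List[Vector], group_size: int, decay: float = 0.9
) -> List[Vector]:
    """Split features (assumed already sorted in some scan order) into
    consecutive groups and run the recurrence independently in each group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    out: List[Vector] = []
    for start in range(0, len(features), group_size):
        out.extend(linear_rnn(features[start:start + group_size], decay))
    return out


# Four one-dimensional voxel features, processed in groups of two.
voxel_features = [[1.0], [1.0], [1.0], [1.0]]
result = linear_group_rnn(voxel_features, group_size=2, decay=0.5)
```

Because the hidden state is reset at each group boundary, features interact only within their own group. A self-attention layer over a group of K features would need on the order of K^2 pairwise interactions, while this recurrence needs K updates, which is why much larger groups become affordable.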

Related papers

Enhancing Steering Estimation with Semantic-Aware GNNs [41.89219383258699]
Hybrid architectures combine 3D neural network models with recurrent neural networks (RNNs) for temporal modeling. We evaluate four hybrid 3D models, all of which outperform the 2D-only baseline. We validate our approach on the KITTI dataset, achieving a 71% improvement over 2D-only models.
arXiv Detail & Related papers (2025-03-21T13:58:08Z)
Efficient 3D Recognition with Event-driven Spike Sparse Convolution [15.20476631850388]
Spiking Neural Networks (SNNs) provide an energy-efficient way to extract 3D-temporal features. We introduce the Spike Voxel Coding (SVC) scheme, which encodes the 3D point clouds into a sparse spike train space. We propose a Spike Sparse Convolution (SSC) model for efficiently extracting 3D sparse point cloud features.
arXiv Detail & Related papers (2024-12-10T09:55:15Z)
Nearest Neighbors Meet Deep Neural Networks for Point Cloud Analysis [14.844183458784235]
We present an alternative to enhance existing deep neural networks without redesigning or extra parameters, termed Spatial-Neighbor Adapter (SN-Adapter). Building on any trained 3D network, we utilize its learned encoding capability to extract features of the training dataset and summarize them as spatial knowledge. For a test point cloud, the SN-Adapter retrieves k nearest neighbors (k-NN) from the pre-constructed spatial prototypes and linearly interpolates the k-NN prediction with that of the original 3D network.
arXiv Detail & Related papers (2023-03-01T17:57:09Z)
Pillar R-CNN for Point Cloud 3D Object Detection [4.169126928311421]
We devise a conceptually simple yet effective two-stage 3D detection architecture, named Pillar R-CNN. Our Pillar R-CNN performs favorably against state-of-the-art 3D detectors on the large-scale Waymo Open Dataset. It should be highlighted that further exploration into BEV perception for applications involving autonomous driving is now possible thanks to the effective and elegant Pillar R-CNN architecture.
arXiv Detail & Related papers (2023-02-26T12:07:25Z)
Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing [74.31472195046099]
We exploit a low-rank tensor-train deep neural network (TT-DNN) to build an end-to-end deep learning pipeline, namely LR-TT-DNN. A hybrid model combining LR-TT-DNN with a convolutional neural network (CNN) is set up to boost the performance. Our empirical evidence demonstrates that the LR-TT-DNN and CNN+(LR-TT-DNN) models with fewer model parameters can outperform the TT-DNN and CNN+(TT-DNN) counterparts.
arXiv Detail & Related papers (2022-03-11T15:55:34Z)
LiDAR R-CNN: An Efficient and Universal 3D Object Detector [20.17906188581305]
LiDAR-based 3D detection in point clouds is essential in the perception system of autonomous driving. We present LiDAR R-CNN, a second-stage detector that can generally improve any existing 3D detector. In particular, based on one variant of PointPillars, our method could achieve new state-of-the-art results with minor cost.
arXiv Detail & Related papers (2021-03-29T03:01:21Z)
Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection [99.16162624992424]
We devise a simple but effective voxel-based framework, named Voxel R-CNN. By taking full advantage of voxel features in a two-stage approach, our method achieves detection accuracy comparable to state-of-the-art point-based models. Our results show that Voxel R-CNN delivers a higher detection accuracy while maintaining a real-time frame processing rate, i.e., at a speed of 25 FPS on an NVIDIA 2080 Ti GPU.
arXiv Detail & Related papers (2020-12-31T17:02:46Z)
LGNN: A Context-aware Line Segment Detector [53.424521592941936]
We present a novel real-time line segment detection scheme called Line Graph Neural Network (LGNN). Our LGNN employs a deep convolutional neural network (DCNN) for proposing line segments directly, with a graph neural network (GNN) module for reasoning about their connectivities. Compared with the state-of-the-art, LGNN achieves near real-time performance without compromising accuracy.
arXiv Detail & Related papers (2020-08-13T13:23:18Z)
Local Grid Rendering Networks for 3D Object Detection in Point Clouds [98.02655863113154]
CNNs are powerful but it would be computationally costly to directly apply convolutions on point data after voxelizing the entire point clouds to a dense regular 3D grid. We propose a novel and principled Local Grid Rendering (LGR) operation to render the small neighborhood of a subset of input points into a low-resolution 3D grid independently. We validate LGR-Net for 3D object detection on the challenging ScanNet and SUN RGB-D datasets.
arXiv Detail & Related papers (2020-07-04T13:57:43Z)
PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection [76.30585706811993]
We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN). Our proposed method deeply integrates both 3D voxel Convolutional Neural Network (CNN) and PointNet-based set abstraction. It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of the PointNet-based networks.
arXiv Detail & Related papers (2019-12-31T06:34:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.