Related papers: FALO: Fast and Accurate LiDAR 3D Object Detection on Resource-Constrained Devices

URL: http://arxiv.org/abs/2506.04499v1
Date: Wed, 04 Jun 2025 22:46:28 GMT
Title: FALO: Fast and Accurate LiDAR 3D Object Detection on Resource-Constrained Devices
Authors: Shizhong Han, Hsin-Pai Cheng, Hong Cai, Jihad Masri, Soyeb Nagori, Fatih Porikli
Abstract summary: Existing LiDAR 3D object detection methods rely on sparse convolutions and/or transformers, which can be challenging to run on resource-constrained edge devices. We propose FALO, a hardware-friendly approach to LiDAR 3D detection, which offers both state-of-the-art (SOTA) detection accuracy and fast inference speed.
Score: 38.61635285386612
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing LiDAR 3D object detection methods predominantly rely on sparse convolutions and/or transformers, which can be challenging to run on resource-constrained edge devices due to irregular memory access patterns and high computational costs. In this paper, we propose FALO, a hardware-friendly approach to LiDAR 3D detection, which offers both state-of-the-art (SOTA) detection accuracy and fast inference speed. More specifically, given the 3D point cloud and after voxelization, FALO first arranges sparse 3D voxels into a 1D sequence based on their coordinates and proximity. The sequence is then processed by our proposed ConvDotMix blocks, consisting of large-kernel convolutions, Hadamard products, and linear layers. ConvDotMix provides sufficient mixing capability in both spatial and embedding dimensions, and introduces higher-order nonlinear interaction among spatial features. Furthermore, when going through the ConvDotMix layers, we introduce implicit grouping, which balances the tensor dimensions for more efficient inference and takes into account the growing receptive field. All these operations are friendly to run on resource-constrained platforms, and the proposed FALO can be readily deployed on compact, embedded devices. Our extensive evaluation on LiDAR 3D detection benchmarks such as nuScenes and Waymo shows that FALO achieves competitive performance. Meanwhile, FALO is 1.6~9.8x faster than the latest SOTA on mobile Graphics Processing Unit (GPU) and mobile Neural Processing Unit (NPU).
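The pipeline described in the abstract (serialize sparse voxels into a 1D sequence, then mix them with large-kernel convolutions, Hadamard products, and linear layers) can be illustrated with a rough NumPy sketch. This is not the authors' implementation. The lexicographic sort order, the two-branch layout, and all names (`serialize_voxels`, `depthwise_conv1d`, `conv_dot_mix_sketch`, `w_a`, `w_b`, `w_out`) are assumptions made for illustration; the abstract does not specify the exact ordering rule or block wiring, and implicit grouping is omitted.

```python
import numpy as np


def serialize_voxels(coords):
    """Return an index order that arranges sparse voxels as a 1D sequence.

    coords: (N, 3) integer array of (x, y, z) voxel indices.
    Assumption: a plain sort by z, then y, then x. The paper orders voxels
    by coordinates and proximity, but its exact rule is not given here.
    """
    # np.lexsort treats the LAST key as the primary sort key.
    return np.lexsort((coords[:, 0], coords[:, 1], coords[:, 2]))


def depthwise_conv1d(x, kernel):
    """Per-channel 1D convolution along the sequence with zero 'same' padding.

    x: (L, C) sequence features, kernel: (K, C) with odd K.
    """
    length = x.shape[0]
    k_size = kernel.shape[0]
    pad = k_size // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for k in range(k_size):
        out += padded[k:k + length] * kernel[k]
    return out


def conv_dot_mix_sketch(x, kernel, w_a, w_b, w_out):
    """Hypothetical block: large-kernel conv, Hadamard product, linear layers."""
    spatial = depthwise_conv1d(x, kernel)  # mixing along the voxel sequence
    a = spatial @ w_a                      # linear layer on the conv branch
    b = x @ w_b                            # linear layer on the identity branch
    gated = a * b                          # Hadamard product (second-order term)
    return gated @ w_out                   # output projection over channels


# Small demo with random data (values are arbitrary).
rng = np.random.default_rng(0)
coords = rng.integers(0, 8, size=(5, 3))
feats = rng.standard_normal((5, 4))
order = serialize_voxels(coords)
sequence = feats[order]
kernel = rng.standard_normal((7, 4))       # kernel size 7 stands in for "large"
w_a, w_b, w_out = (rng.standard_normal((4, 4)) for _ in range(3))
out = conv_dot_mix_sketch(sequence, kernel, w_a, w_b, w_out)
print(out.shape)  # (5, 4)
```

Every operation in the sketch is a dense array operation over a fixed-length sequence, which reflects the abstract's stated motivation of avoiding the irregular memory access of sparse convolutions.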

Related papers

CMF-IoU: Multi-Stage Cross-Modal Fusion 3D Object Detection with IoU Joint Prediction [29.7092783661859]
Multi-modal methods based on camera and LiDAR sensors have garnered significant attention in the field of 3D detection. We introduce a multi-stage cross-modal fusion 3D detection framework, termed CMF-IoU, to address the challenge of aligning 3D spatial and 2D semantic information.
arXiv Detail & Related papers (2025-08-18T13:32:07Z)
S3MOT: Monocular 3D Object Tracking with Selective State Space Model [3.5047603107971397]
Multi-object tracking in 3D space is essential for advancing robotics and computer applications. It remains a significant challenge in monocular setups due to the difficulty of mining 3D associations from 2D video streams. We present three innovative techniques to enhance the fusion of heterogeneous cues for monocular 3D MOT.
arXiv Detail & Related papers (2025-04-25T04:45:35Z)
FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection [11.962073589763676]
Existing 3D detectors significantly improve accuracy by adopting a two-stage paradigm. The sparsity of point clouds, especially for faraway points, makes it difficult for the LiDAR-only refinement module to accurately recognize and locate objects. We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoI). FusionRCNN improves the strong SECOND baseline by 6.14% mAP and outperforms competing two-stage approaches.
arXiv Detail & Related papers (2022-09-22T02:07:25Z)
Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed Ret3D. At the core of Ret3D is the use of novel intra-frame and inter-frame relation modules. With negligible extra overhead, Ret3D achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
PillarGrid: Deep Learning-based Cooperative Perception for 3D Object Detection from Onboard-Roadside LiDAR [15.195933965761645]
We propose PillarGrid, a novel cooperative perception method fusing information from multiple 3D LiDARs. PillarGrid consists of four main phases: 1) cooperative preprocessing of point clouds, 2) pillar-wise voxelization and feature extraction, 3) grid-wise deep fusion of features from multiple sensors, and 4) convolutional neural network (CNN)-based augmented 3D object detection. Extensive experimentation shows that PillarGrid outperforms the SOTA single-LiDAR-based 3D object detection methods with respect to both accuracy and range by a large margin.
arXiv Detail & Related papers (2022-03-12T02:28:41Z)
Dense Voxel Fusion for 3D Object Detection [10.717415797194896]
Dense Voxel Fusion (DVF) is a sequential fusion method that generates multi-scale dense voxel feature representations. We train directly with ground truth 2D bounding box labels, avoiding noisy, detector-specific 2D predictions. We show that our proposed multi-modal training strategy results in better generalization compared to training using erroneous 2D predictions.
arXiv Detail & Related papers (2022-03-02T04:51:31Z)
Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases. Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds. We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z)
Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud [79.39041453836793]
We develop a novel single-stage 3D detector for point clouds in an anchor-free manner. We overcome this by converting the voxel-based sparse 3D feature volumes into the sparse 2D feature maps. We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression.
arXiv Detail & Related papers (2021-08-08T13:42:13Z)
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation [81.02742110604161]
State-of-the-art methods for large-scale driving-scene LiDAR segmentation often project the point clouds to 2D space and then process them via 2D convolution. We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern. Our method achieves 1st place on the SemanticKITTI leaderboard and outperforms existing methods on nuScenes by a noticeable margin of about 4%.
arXiv Detail & Related papers (2020-11-19T18:53:11Z)
Generative Sparse Detection Networks for 3D Single-shot Object Detection [43.91336826079574]
3D object detection has been widely studied due to its potential applicability to many promising areas such as robotics and augmented reality. Yet, the sparse nature of the 3D data poses unique challenges to this task. We propose Generative Sparse Detection Network (GSDN), a fully-convolutional single-shot sparse detection network.
arXiv Detail & Related papers (2020-06-22T15:54:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.