Related papers: Lightweight LiDAR-Camera 3D Dynamic Object Detection and Multi-Class Trajectory Prediction

URL: http://arxiv.org/abs/2504.13647v1
Date: Fri, 18 Apr 2025 11:59:34 GMT
Title: Lightweight LiDAR-Camera 3D Dynamic Object Detection and Multi-Class Trajectory Prediction
Authors: Yushen He, Lei Zhao, Tianchen Deng, Zipeng Fang, Weidong Chen
Abstract summary: Service mobile robots are often required to avoid dynamic objects while performing their tasks. We present a lightweight multi-modal framework for 3D object detection and trajectory prediction. Our system integrates LiDAR and camera inputs to achieve real-time perception of pedestrians, vehicles, and riders in 3D space.
Score: 7.415417400188903
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Service mobile robots are often required to avoid dynamic objects while performing their tasks, but they usually have only limited computational resources. We therefore present a lightweight multi-modal framework for 3D object detection and trajectory prediction. Our system synergistically integrates LiDAR and camera inputs to achieve real-time perception of pedestrians, vehicles, and riders in 3D space. The framework proposes two novel modules: 1) a Cross-Modal Deformable Transformer (CMDT) for object detection with high accuracy and an acceptable amount of computation, and 2) a Reference Trajectory-based Multi-Class Transformer (RTMCT) for efficient and diverse trajectory prediction of multi-class objects with flexible trajectory lengths. Evaluations on the CODa benchmark demonstrate superior performance over existing methods across detection (+2.03% in mAP) and trajectory prediction (-0.408m in minADE5 of pedestrians) metrics. Remarkably, the system exhibits exceptional deployability: when implemented on a wheelchair robot with an entry-level NVIDIA 3060 GPU, it achieves real-time inference at 13.2 fps. To facilitate reproducibility and practical deployment, we release the related code of the method at https://github.com/TossherO/3D_Perception and its ROS inference version at https://github.com/TossherO/ros_packages.

Related papers

Street Gaussians without 3D Object Tracker [86.62329193275916]
Existing methods rely on labor-intensive manual labeling of object poses to reconstruct dynamic objects in canonical space. We propose a stable object tracking module by leveraging associations from 2D deep trackers within a 3D object fusion strategy. We address inevitable tracking errors by further introducing a motion learning strategy in an implicit feature space that autonomously corrects trajectory errors and recovers missed detections.
arXiv Detail & Related papers (2024-12-07T05:49:42Z)
3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking [15.330384668966806]
State-of-the-art 3D multi-object tracking (MOT) approaches typically rely on non-learned model-based algorithms such as the Kalman filter. We propose 3DMOTFormer, a learned geometry-based 3D MOT framework building upon the transformer architecture. Our approach achieves 71.2% and 68.2% AMOTA on the nuScenes validation and test splits, respectively.
arXiv Detail & Related papers (2023-08-12T19:19:58Z)
FocalFormer3D : Focusing on Hard Instance for 3D Object Detection [97.56185033488168]
False negatives (FN) in 3D object detection can lead to potentially dangerous situations in autonomous driving. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner. We instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects.
arXiv Detail & Related papers (2023-08-08T20:06:12Z)
TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses [51.60422927416087]
3D multi-object tracking (MOT) is vital for many applications including autonomous driving vehicles and service robots. We present TrajectoryFormer, a novel point-cloud-based 3D MOT framework.
arXiv Detail & Related papers (2023-06-09T13:31:50Z)
ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box [81.45219802386444]
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects across video frames. We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes. In 3D scenarios, it is much easier for the tracker to predict object velocities in the world coordinate.
arXiv Detail & Related papers (2023-03-27T15:35:21Z)
Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection [20.161887223481994]
We propose a long-sequence modeling framework, named StreamPETR, for multi-view 3D object detection. StreamPETR achieves significant performance improvements only with negligible cost, compared to the single-frame baseline. The lightweight version realizes 45.0% mAP and 31.7 FPS, outperforming the state-of-the-art method (SOLOFusion) by 2.3% mAP and 1.8x faster FPS.
arXiv Detail & Related papers (2023-03-21T15:19:20Z)
CAMO-MOT: Combined Appearance-Motion Optimization for 3D Multi-Object Tracking with Camera-LiDAR Fusion [34.42289908350286]
3D multi-object tracking (MOT) ensures consistency during continuous dynamic detection. It can be challenging to accurately track the irregular motion of objects for LiDAR-based methods. We propose a novel camera-LiDAR fusion 3D MOT framework based on the Combined Appearance-Motion Optimization (CAMO-MOT).
arXiv Detail & Related papers (2022-09-06T14:41:38Z)
AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection [17.526914782562528]
We propose AutoAlignV2, a faster and stronger multi-modal 3D detection framework, built on top of AutoAlign. Our best model reaches 72.4 NDS on nuScenes test leaderboard, achieving new state-of-the-art results.
arXiv Detail & Related papers (2022-07-21T06:17:23Z)
2nd Place Solution for Waymo Open Dataset Challenge - Real-time 2D Object Detection [26.086623067939605]
In this report, we introduce a real-time method to detect 2D objects in images. We leverage TensorRT to optimize the inference time of our detection pipeline. Our framework achieves a latency of 45.8ms/frame on an Nvidia Tesla V100 GPU.
arXiv Detail & Related papers (2021-06-16T11:32:03Z)
Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem. We employ a Neural Message Passing network for data association that is fully trainable. We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving. We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.