ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and
Spatio-Temporal Affinities for 3D Multi-Object Tracking
- URL: http://arxiv.org/abs/2310.02532v1
- Date: Wed, 4 Oct 2023 02:17:59 GMT
- Title: ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and
Spatio-Temporal Affinities for 3D Multi-Object Tracking
- Authors: Tara Sadjadpour, Rares Ambrus, Jeannette Bohg
- Abstract summary: 3D multi-object tracking (MOT) is essential for an autonomous mobile agent to safely navigate a scene.
We aim to develop a 3D MOT framework that fuses camera and LiDAR sensor information.
- Score: 26.976216624424385
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: 3D multi-object tracking (MOT) is essential for an autonomous mobile agent to
safely navigate a scene. In order to maximize the perception capabilities of
the autonomous agent, we aim to develop a 3D MOT framework that fuses camera
and LiDAR sensor information. Building on our prior LiDAR-only work, ShaSTA,
which models shape and spatio-temporal affinities for 3D MOT, we propose a
novel camera-LiDAR fusion approach for learning affinities. At its core, this
work proposes a fusion technique that generates a rich sensory signal
incorporating information about depth and distant objects to enhance affinity
estimation for improved data association, track lifecycle management,
false-positive elimination, false-negative propagation, and track confidence
score refinement. Our main contributions include a novel fusion approach for
combining camera and LiDAR sensory signals to learn affinities, and a
first-of-its-kind multimodal sequential track confidence refinement technique
that fuses 2D and 3D detections. Additionally, we perform an ablative analysis
on each fusion step to demonstrate the added benefits of incorporating the
camera sensor, particular for small, distant objects that tend to suffer from
the depth-sensing limits and sparsity of LiDAR sensors. In sum, our technique
achieves state-of-the-art performance on the nuScenes benchmark amongst
multimodal 3D MOT algorithms using CenterPoint detections.
Related papers
- Progressive Multi-Modal Fusion for Robust 3D Object Detection [12.048303829428452]
Existing methods perform sensor fusion in a single view by projecting features from both modalities either in Bird's Eye View (BEV) or Perspective View (PV)
We propose ProFusion3D, a progressive fusion framework that combines features in both BEV and PV at both intermediate and object query levels.
Our architecture hierarchically fuses local and global features, enhancing the robustness of 3D object detection.
arXiv Detail & Related papers (2024-10-09T22:57:47Z) - Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection [38.809645060899065]
Camera and LiDAR serve as informative sensors for accurate and robust autonomous driving systems.
These sensors often exhibit heterogeneous natures, resulting in distributional modality gaps.
We introduce a dynamic adjustment technology aimed at aligning modal distributions and learning effective modality representations.
arXiv Detail & Related papers (2024-07-22T02:42:15Z) - Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z) - HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for
Autonomous Driving [95.42203932627102]
3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians.
Our method efficiently makes use of these complementary signals, in a semi-supervised fashion and outperforms existing methods with a large margin.
Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages.
arXiv Detail & Related papers (2022-12-15T11:15:14Z) - DeepFusion: A Robust and Modular 3D Object Detector for Lidars, Cameras
and Radars [2.2166853714891057]
We propose a modular multi-modal architecture to fuse lidars, cameras and radars in different combinations for 3D object detection.
Specialized feature extractors take advantage of each modality and can be exchanged easily, making the approach simple and flexible.
Experimental results for lidar-camera, lidar-camera-radar and camera-radar fusion show the flexibility and effectiveness of our fusion approach.
arXiv Detail & Related papers (2022-09-26T14:33:30Z) - MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth
Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z) - Bridging the View Disparity of Radar and Camera Features for Multi-modal
Fusion 3D Object Detection [6.959556180268547]
This paper focuses on how to utilize millimeter-wave (MMW) radar and camera sensor fusion for 3D object detection.
A novel method which realizes the feature-level fusion under bird-eye view (BEV) for a better feature representation is proposed.
arXiv Detail & Related papers (2022-08-25T13:21:37Z) - Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object
Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z) - DeepFusionMOT: A 3D Multi-Object Tracking Framework Based on
Camera-LiDAR Fusion with Deep Association [8.34219107351442]
This paper proposes a robust camera-LiDAR fusion-based MOT method that achieves a good trade-off between accuracy and speed.
Our proposed method presents obvious advantages over the state-of-the-art MOT methods in terms of both tracking accuracy and processing speed.
arXiv Detail & Related papers (2022-02-24T13:36:29Z) - EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object
Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion(CB-Fusion) module.
The proposed CB-Fusion module boosts the plentiful semantic information of point features with the image features in a cascade bi-directional interaction fusion manner.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z) - Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that can exploit both LIDAR as well as cameras to perform very accurate localization.
We design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution.
arXiv Detail & Related papers (2020-12-20T18:43:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.