Related papers: A Comparative Study of 3D Person Detection: Sensor Modalities and Robustness in Diverse Indoor and Outdoor Environments

URL: http://arxiv.org/abs/2602.05538v2
Date: Fri, 06 Feb 2026 06:20:25 GMT
Title: A Comparative Study of 3D Person Detection: Sensor Modalities and Robustness in Diverse Indoor and Outdoor Environments
Authors: Malaz Tamim, Andrea Matic-Flierl, Karsten Roscher
Abstract summary: This work presents a systematic evaluation of 3D person detection using camera-only, LiDAR-only, and camera-LiDAR fusion. We compare three representative models - BEVDepth (camera), PointPillars (LiDAR), and DAL (camera-LiDAR fusion). Our results show that the fusion-based approach consistently outperforms single-modality models, particularly in challenging scenarios.
Score: 5.89179309980335
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Accurate 3D person detection is critical for safety in applications such as robotics, industrial monitoring, and surveillance. This work presents a systematic evaluation of 3D person detection using camera-only, LiDAR-only, and camera-LiDAR fusion. While most existing research focuses on autonomous driving, we explore detection performance and robustness in diverse indoor and outdoor scenes using the JRDB dataset. We compare three representative models - BEVDepth (camera), PointPillars (LiDAR), and DAL (camera-LiDAR fusion) - and analyze their behavior under varying occlusion and distance levels. Our results show that the fusion-based approach consistently outperforms single-modality models, particularly in challenging scenarios. We further investigate robustness against sensor corruptions and misalignments, revealing that while DAL offers improved resilience, it remains sensitive to sensor misalignment and certain LiDAR-based corruptions. In contrast, the camera-based BEVDepth model showed the lowest performance and was most affected by occlusion, distance, and noise. Our findings highlight the importance of utilizing sensor fusion for enhanced 3D person detection, while also underscoring the need for ongoing research to address the vulnerabilities inherent in these systems.

Related papers

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [68.99784784185019]
Poor lighting or adverse weather conditions degrade camera performance. Radar suffers from noise and positional ambiguity. We propose RobuRCDet, a robust object detection model in BEV.
arXiv Detail & Related papers (2025-02-18T17:17:38Z)
MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception [9.575044300747061]
Multi-sensor fusion models play a crucial role in autonomous driving perception, particularly in tasks like 3D object detection and HD map construction. These models provide essential and comprehensive static environmental information for autonomous driving systems. While camera-LiDAR fusion methods have shown promising results, they often depend on complete sensor inputs. This reliance can lead to low robustness and potential failures when sensors are corrupted or missing, raising significant safety concerns. To tackle this challenge, we introduce the Multi-Sensor Corruption Benchmark (MSC-Bench), the first comprehensive benchmark aimed at evaluating the robustness of multi-sensor autonomous driving perception models against various sensor corruption
arXiv Detail & Related papers (2025-01-02T03:38:46Z)
Better Monocular 3D Detectors with LiDAR from the Past [64.6759926054061]
Camera-based 3D detectors often suffer inferior performance compared to LiDAR-based counterparts due to inherent depth ambiguities in images. In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data. We show consistent and significant performance gain across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.
arXiv Detail & Related papers (2024-04-08T01:38:43Z)
CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking [40.630532348405595]
Camera-RADAR 3D Detection and Tracking (CR3DT) is a camera-RADAR fusion model for 3D object detection and Multi-Object Tracking (MOT). Building upon the foundations of the State-of-the-Art (SotA) camera-only BEVDet architecture, CR3DT demonstrates substantial improvements in both detection and tracking capabilities.
arXiv Detail & Related papers (2024-03-22T16:06:05Z)
Robustness-Aware 3D Object Detection in Autonomous Driving: A Review and Outlook [19.539295469044813]
This study emphasizes the importance of robustness, alongside accuracy and latency, in evaluating perception systems under practical scenarios. Our work presents an extensive survey of camera-only, LiDAR-only, and multi-modal 3D object detection algorithms, thoroughly evaluating their trade-off between accuracy, latency, and robustness. Among these, multi-modal 3D detection approaches exhibit superior robustness, and a novel taxonomy is introduced to reorganize the literature for enhanced clarity.
arXiv Detail & Related papers (2024-01-12T12:35:45Z)
Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection. With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions [58.306694836881235]
We present Robo3D, the first comprehensive benchmark heading toward probing the robustness of 3D detectors and segmentors under out-of-distribution scenarios. We consider eight corruption types stemming from severe weather conditions, external disturbances, and internal sensor failure. We propose a density-insensitive training framework along with a simple flexible voxelization strategy to enhance the model resiliency.
arXiv Detail & Related papers (2023-03-30T17:59:17Z)
Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving [44.753797839280516]
Existing 3D detectors lack robustness to real-world corruptions caused by adverse weather, sensor noise, etc. We benchmark 27 types of common corruptions for both LiDAR and camera inputs, considering real-world driving scenarios. We conduct large-scale experiments on 24 diverse 3D object detection models to evaluate their robustness.
arXiv Detail & Related papers (2023-03-20T11:45:54Z)
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR. Fusing these two modalities can significantly boost the performance of 3D perception models. We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
Sensor Adversarial Traits: Analyzing Robustness of 3D Object Detection Sensor Fusion Models [16.823829387723524]
We analyze the robustness of a high-performance, open-source sensor fusion model architecture to adversarial attacks. We find that despite the use of a LiDAR sensor, the model is vulnerable to our purposefully crafted image-based adversarial attacks.
arXiv Detail & Related papers (2021-09-13T23:38:42Z)
Domain and Modality Gaps for LiDAR-based Person Detection on Mobile Robots [91.01747068273666]
This paper studies existing LiDAR-based person detectors with a particular focus on mobile robot scenarios. Experiments revolve around the domain gap between driving and mobile robot scenarios, as well as the modality gap between 3D and 2D LiDAR sensors. Results provide practical insights into LiDAR-based person detection and facilitate informed decisions for relevant mobile robot designs and applications.
arXiv Detail & Related papers (2021-06-21T16:35:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.