Learning better representations for crowded pedestrians in offboard LiDAR-camera 3D tracking-by-detection
- URL: http://arxiv.org/abs/2505.16029v1
- Date: Wed, 21 May 2025 21:18:26 GMT
- Title: Learning better representations for crowded pedestrians in offboard LiDAR-camera 3D tracking-by-detection
- Authors: Shichao Li, Peiliang Li, Qing Lian, Peng Yun, Xiaozhi Chen
- Abstract summary: We build an offboard auto-labeling system that reconstructs pedestrian trajectories from LiDAR point cloud and multi-view images. Our approach significantly improves the 3D pedestrian tracking performance towards higher auto-labeling efficiency.
- Score: 14.56852056332248
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Perceiving pedestrians in highly crowded urban environments is a difficult long-tail problem for learning-based autonomous perception. Speeding up 3D ground-truth generation for such scenes is performance-critical yet challenging: the captured pedestrian point clouds are sparse, and suitable benchmarks for this kind of system-design study are lacking. To tackle these challenges, we first collect a new multi-view LiDAR-camera 3D multiple-object-tracking benchmark of highly crowded pedestrians for in-depth analysis. We then build an offboard auto-labeling system that reconstructs pedestrian trajectories from LiDAR point clouds and multi-view images. To improve generalization to crowded scenes and performance on small objects, we propose to learn high-resolution representations that are density-aware and relationship-aware. Extensive experiments validate that our approach significantly improves 3D pedestrian tracking performance towards higher auto-labeling efficiency. The code will be publicly available at this HTTP URL.
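The auto-labeling code is not yet released (the abstract's link is a placeholder), but the tracking-by-detection paradigm named in the title is standard and easy to illustrate. The following is a minimal Python sketch of per-frame track-detection association via the Hungarian algorithm on 3D center distance; it is a generic illustration, not the authors' system, and the function names and gating threshold are assumptions.

```python
# Minimal, illustrative 3D tracking-by-detection association (NOT the paper's
# method): match predicted track centers to detected centers frame by frame.
import numpy as np
from scipy.optimize import linear_sum_assignment

MAX_MATCH_DIST = 1.0  # meters; assumed gating threshold for pedestrians


def associate(track_centers: np.ndarray, det_centers: np.ndarray):
    """Match (N, 3) track centers to (M, 3) detection centers.

    Returns (matches, unmatched_tracks, unmatched_dets); matches holds
    (track_idx, det_idx) pairs whose center distance passes the gate.
    """
    if len(track_centers) == 0 or len(det_centers) == 0:
        return [], list(range(len(track_centers))), list(range(len(det_centers)))
    # Pairwise Euclidean distance matrix between tracks and detections.
    cost = np.linalg.norm(track_centers[:, None, :] - det_centers[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)  # optimal bipartite matching
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < MAX_MATCH_DIST]
    matched_r = {r for r, _ in matches}
    matched_c = {c for _, c in matches}
    unmatched_tracks = [r for r in range(len(track_centers)) if r not in matched_r]
    unmatched_dets = [c for c in range(len(det_centers)) if c not in matched_c]
    return matches, unmatched_tracks, unmatched_dets
```

Because an offboard labeler sees the whole sequence, association like this can be run both forward and backward in time and the resulting trajectories smoothed, which is one reason offboard systems can exceed online-tracker quality.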
Related papers
- Street Gaussians without 3D Object Tracker [86.62329193275916]
Existing methods rely on labor-intensive manual labeling of object poses to reconstruct dynamic objects in canonical space. We propose a stable object tracking module by leveraging associations from 2D deep trackers within a 3D object fusion strategy. We address inevitable tracking errors by further introducing a motion learning strategy in an implicit feature space that autonomously corrects trajectory errors and recovers missed detections.
arXiv Detail & Related papers (2024-12-07T05:49:42Z)
- On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR [4.606106768645647]
3D LiDAR point cloud data is crucial for scene perception in computer vision, robotics, and autonomous driving.
We present DurLAR, the first high-fidelity 128-channel 3D LiDAR dataset featuring panoramic ambient (near infrared) and reflectivity imagery.
To improve the segmentation accuracy, we introduce Range-Aware Pointwise Distance Distribution (RAPiD) features and the associated RAPiD-Seg architecture.
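The summary above names the RAPiD feature but not its formulation, so as a hedged illustration only: a pointwise distance-distribution feature can be built by histogramming each point's distances to its k nearest neighbors, giving a local-density descriptor that is straightforward to normalize by sensor range. Parameter values and names below are assumptions, not the paper's.

```python
# Hedged sketch of a pointwise distance-distribution feature in the spirit
# of RAPiD (the published formulation differs in details).
import numpy as np
from scipy.spatial import cKDTree


def distance_distribution_features(points: np.ndarray, k: int = 8,
                                   bins: int = 4, max_dist: float = 0.5) -> np.ndarray:
    """points: (N, 3) LiDAR xyz with N > k. Returns (N, bins) features."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)  # k+1: nearest neighbor is the point itself
    neighbor_dists = dists[:, 1:]           # drop the zero self-distance
    edges = np.linspace(0.0, max_dist, bins + 1)
    feats = np.stack([np.histogram(row, bins=edges)[0] for row in neighbor_dists])
    return feats / k  # normalize counts so each histogram sums to at most 1
```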
arXiv Detail & Related papers (2024-11-01T14:01:54Z)
- LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes [73.65115834242866]
Photorealistic simulation plays a crucial role in applications such as autonomous driving.
However, reconstruction quality suffers on street scenes due to collinear camera motion and sparser sampling at higher speeds.
We present several insights that allow better utilization of Lidar data to improve NeRF quality on street scenes.
arXiv Detail & Related papers (2024-05-01T23:07:12Z)
- Neural Rendering based Urban Scene Reconstruction for Autonomous Driving [8.007494499012624]
Dense 3D reconstruction has many applications in automated driving, including automated annotation validation. We propose a multimodal 3D scene reconstruction framework that combines neural implicit surfaces and radiance fields.
We demonstrate qualitative and quantitative results on challenging automotive scenes.
arXiv Detail & Related papers (2024-02-09T23:20:23Z)
- Semantic and Articulated Pedestrian Sensing Onboard a Moving Vehicle [0.0]
It is difficult to perform 3D reconstruction from video gathered on a moving vehicle due to its large forward motion. Recently, Light Detection And Ranging (LiDAR) sensors have become popular for estimating depth directly, without the need to perform 3D reconstruction.
We hypothesize that benchmarks targeted at articulated human sensing from LiDAR data could bring about increased research in human sensing and prediction in traffic.
arXiv Detail & Related papers (2023-09-12T15:24:26Z)
- Unsupervised Multi-view Pedestrian Detection [12.882317991955228]
We propose an Unsupervised Multi-view Pedestrian Detection approach (UMPD) that eliminates the need for annotations when learning a multi-view pedestrian detector via 2D-3D mapping. Its SIS component extracts unsupervised representations of multi-view images, which are converted into 2D pedestrian masks used as pseudo-labels. Its GVD component encodes multi-view 2D images into a 3D volume to predict voxel-wise density and color via 2D-to-3D geometric projection, trained by 3D-to-2D mapping.
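As a rough illustration of the 2D-to-3D geometric projection such volumetric detectors rely on (a hedged sketch, not UMPD's actual GVD code; the pinhole model and conventions are assumptions):

```python
# Hedged sketch of lifting multi-view image features into a voxel volume:
# project each voxel center into every camera and average the sampled features.
import numpy as np


def lift_images_to_volume(feat_maps, intrinsics, extrinsics, voxel_centers):
    """feat_maps: list of (H, W, C) arrays; intrinsics: list of (3, 3);
    extrinsics: list of (4, 4) world-to-camera; voxel_centers: (V, 3).
    Returns (V, C) per-voxel features averaged over the views that see them."""
    V, C = voxel_centers.shape[0], feat_maps[0].shape[-1]
    accum, hits = np.zeros((V, C)), np.zeros(V)
    homog = np.concatenate([voxel_centers, np.ones((V, 1))], axis=1)  # (V, 4)
    for feat, K, T in zip(feat_maps, intrinsics, extrinsics):
        cam = (T @ homog.T).T[:, :3]                       # world -> camera
        in_front = cam[:, 2] > 0.1                         # cull voxels behind camera
        pix = (K @ cam.T).T
        pix = pix[:, :2] / np.maximum(pix[:, 2:3], 1e-6)   # perspective divide
        u, v = pix[:, 0].astype(int), pix[:, 1].astype(int)
        H, W = feat.shape[:2]
        valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        accum[valid] += feat[v[valid], u[valid]]
        hits[valid] += 1
    return accum / np.maximum(hits[:, None], 1)            # mean over observing views
```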
arXiv Detail & Related papers (2023-05-21T13:27:02Z)
- HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving [95.42203932627102]
3D human pose estimation is an emerging technology that can enable autonomous vehicles to perceive and understand the subtle and complex behaviors of pedestrians. Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages. Our method makes efficient use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin.
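Pixel-aligned point features are the point-wise counterpart of the voxel lifting sketched above: each LiDAR point is projected into the image, the feature at that pixel is sampled (nearest-neighbor here for brevity; HUM3DIL's actual sampling and fusion may differ), and concatenated with the raw geometry before refinement. A hedged sketch with assumed names and conventions:

```python
# Hedged sketch of pixel-aligned multi-modal point features (illustrative only).
import numpy as np


def pixel_aligned_point_features(points, image_feat, K, T_world_to_cam):
    """points: (N, 3); image_feat: (H, W, C); K: (3, 3); T: (4, 4) world-to-cam.
    Returns (M, 3 + C) features for the M points projecting inside the image."""
    N = points.shape[0]
    homog = np.concatenate([points, np.ones((N, 1))], axis=1)
    cam = (T_world_to_cam @ homog.T).T[:, :3]              # world -> camera
    in_front = cam[:, 2] > 0.1
    pix = (K @ cam.T).T
    pix = pix[:, :2] / np.maximum(pix[:, 2:3], 1e-6)       # perspective divide
    u, v = pix[:, 0].astype(int), pix[:, 1].astype(int)
    H, W, _ = image_feat.shape
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    # Concatenate geometry (xyz) with sampled appearance features per point.
    return np.concatenate([points[valid], image_feat[v[valid], u[valid]]], axis=1)
```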
arXiv Detail & Related papers (2022-12-15T11:15:14Z)
- Scalable and Real-time Multi-Camera Vehicle Detection, Re-Identification, and Tracking [58.95210121654722]
We propose a real-time city-scale multi-camera vehicle tracking system that handles real-world, low-resolution CCTV instead of idealized and curated video streams.
Our method is ranked among the top five performers on the public leaderboard.
arXiv Detail & Related papers (2022-04-15T12:47:01Z)
- STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes [78.95447086305381]
Accurately detecting and tracking pedestrians in 3D space is challenging due to large variations in rotations, poses and scales.
Existing benchmarks either only provide 2D annotations, or have limited 3D annotations with low-density pedestrian distribution.
We introduce a large-scale multimodal dataset, STCrowd, to better evaluate pedestrian perception algorithms in crowded scenarios.
arXiv Detail & Related papers (2022-04-03T08:26:07Z)
- Disentangling and Vectorization: A 3D Visual Perception Approach for Autonomous Driving Based on Surround-View Fisheye Cameras [3.485767750936058]
A Multidimensional Vector is proposed to encode the usable information generated across different dimensions and stages. Experiments on real fisheye images demonstrate that our solution achieves state-of-the-art accuracy while running in real time.
arXiv Detail & Related papers (2021-07-19T13:24:21Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
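For context on the 65.6% figure: AMOTA averages a recall-normalized MOTA (MOTAR) over evenly spaced recall thresholds. Restated here from memory of the AB3DMOT/nuScenes definition, so treat it as a hedged reference rather than the benchmark's normative text:

```latex
% AMOTA over L recall thresholds, with P ground-truth positives and
% IDS/FP/FN counted at recall r (hedged restatement, quoted from memory).
\[
\mathrm{AMOTA} = \frac{1}{L} \sum_{r \in \{\frac{1}{L}, \frac{2}{L}, \dots, 1\}} \mathrm{MOTAR}_r,
\qquad
\mathrm{MOTAR}_r = \max\!\left(0,\; 1 - \frac{\mathrm{IDS}_r + \mathrm{FP}_r + \mathrm{FN}_r - (1 - r)\,P}{r\,P}\right)
\]
```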
arXiv Detail & Related papers (2021-04-23T17:59:28Z)