STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded
Scenes
- URL: http://arxiv.org/abs/2204.01026v1
- Date: Sun, 3 Apr 2022 08:26:07 GMT
- Title: STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded
Scenes
- Authors: Peishan Cong and Xinge Zhu and Feng Qiao and Yiming Ren and Xidong
Peng and Yuenan Hou and Lan Xu and Ruigang Yang and Dinesh Manocha and Yuexin
Ma
- Abstract summary: Accurately detecting and tracking pedestrians in 3D space is challenging due to large variations in rotations, poses and scales.
Existing benchmarks either only provide 2D annotations, or have limited 3D annotations with low-density pedestrian distribution.
We introduce a large-scale multimodal dataset, STCrowd, to better evaluate pedestrian perception algorithms in crowded scenarios.
- Score: 78.95447086305381
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurately detecting and tracking pedestrians in 3D space is challenging due
to large variations in rotations, poses and scales. The situation becomes even
worse for dense crowds with severe occlusions. However, existing benchmarks
either only provide 2D annotations, or have limited 3D annotations with
low-density pedestrian distribution, making it difficult to build a reliable
pedestrian perception system especially in crowded scenes. To better evaluate
pedestrian perception algorithms in crowded scenarios, we introduce a
large-scale multimodal dataset, STCrowd. Specifically, in STCrowd, there are a
total of 219K pedestrian instances and 20 persons per frame on average, with
various levels of occlusion. We provide synchronized LiDAR point clouds and
camera images as well as their corresponding 3D labels and joint IDs. STCrowd
can be used for various tasks, including LiDAR-only, image-only, and
sensor-fusion based pedestrian detection and tracking. We provide baselines for
most of the tasks. In addition, considering the property of sparse global
distribution and density-varying local distribution of pedestrians, we further
propose a novel method, Density-aware Hierarchical heatmap Aggregation (DHA),
to enhance pedestrian perception in crowded scenes. Extensive experiments show
that our new method achieves state-of-the-art performance for pedestrian
detection on various datasets.
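The abstract names Density-aware Hierarchical heatmap Aggregation (DHA) but gives no further detail here. Purely as an illustration of the general idea, blending multi-scale pedestrian-center heatmaps with weights driven by local crowd density, here is a minimal NumPy sketch; the function name, the density proxy, and the weighting scheme are all assumptions for illustration, not the paper's method.

```python
import numpy as np
from scipy.ndimage import uniform_filter, zoom

def aggregate_heatmaps(heatmaps, density_window=15):
    """Toy density-aware blending of multi-scale center heatmaps.

    heatmaps: list of 2D arrays ordered coarse -> fine. This is a
    hypothetical stand-in for DHA, not the paper's implementation.
    """
    h, w = heatmaps[-1].shape
    # Upsample every map to the finest resolution.
    maps = [zoom(m, (h / m.shape[0], w / m.shape[1]), order=1) for m in heatmaps]

    # Proxy for local crowd density: smoothed response of the finest map.
    density = uniform_filter(maps[-1], size=density_window)
    density = (density - density.min()) / (np.ptp(density) + 1e-8)

    # Favor fine-scale responses where density is high (close instances
    # need separating) and coarse-scale responses where it is low.
    levels = np.linspace(0.0, 1.0, num=len(maps))  # 0 = coarsest scale
    weights = [np.exp(-4.0 * np.abs(density - lv)) for lv in levels]
    total = np.sum(weights, axis=0)
    return sum(wt * m for wt, m in zip(weights, maps)) / total
```

The only point of the sketch is the shape of such an operator: fine-scale evidence dominates in dense regions and coarse-scale evidence in sparse ones, matching the sparse-global, density-varying-local distribution the abstract describes.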
Related papers
- RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception [98.76525636842177]
RoScenes is the largest multi-view roadside perception dataset.
Our dataset contains 21.13M 3D annotations within 64,000 $m^2$.
arXiv Detail & Related papers (2024-05-16T08:06:52Z)
- HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z)
- Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving [26.03800936700545]
We propose to regulate intermediate dense 3D features with the help of volume rendering (a minimal compositing sketch follows this entry).
Experimental results on the Occ3D and nuScenes datasets demonstrate that Vampire facilitates fine-grained and appropriate extraction of dense 3D features.
arXiv Detail & Related papers (2023-12-19T04:09:05Z)
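For readers unfamiliar with the volume-rendering operation Vampire uses as supervision, the sketch below shows standard emission-absorption compositing of per-sample densities and colors along a single ray. This is textbook compositing with illustrative names, not code from the paper.

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Render one ray by emission-absorption compositing.

    sigmas: (N,) non-negative densities at N samples along the ray.
    colors: (N, 3) per-sample colors.
    deltas: (N,) spacing between consecutive samples.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)        # per-sample opacity
    trans = np.cumprod(1.0 - alphas + 1e-10)       # transmittance after each sample
    trans = np.concatenate([[1.0], trans[:-1]])    # light reaching sample i
    weights = alphas * trans                       # each sample's contribution
    return (weights[:, None] * colors).sum(axis=0)
```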
- Unsupervised Multi-view Pedestrian Detection [12.882317991955228]
We propose an Unsupervised Multi-view Pedestrian Detection approach (UMPD) that eliminates the need for annotations by learning a multi-view pedestrian detector via 2D-3D mapping.
SIS is proposed to extract unsupervised representations of multi-view images, which are converted into 2D pedestrian masks as pseudo labels.
GVD encodes multi-view 2D images into a 3D volume to predict voxel-wise density and color via 2D-to-3D geometric projection, trained by 3D-to-2D mapping (a minimal projection sketch follows this entry).
arXiv Detail & Related papers (2023-05-21T13:27:02Z)
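The 2D-to-3D geometric projection behind GVD-style lifting is a standard pinhole operation: project each voxel center into a camera and gather the image features that land there. Below is a minimal single-view sketch; all names are hypothetical and not taken from the UMPD code.

```python
import numpy as np

def lift_features_to_voxels(feat, K, Rt, voxel_centers):
    """Gather 2D features at the projections of 3D voxel centers.

    feat:          (H, W, C) feature map from one camera.
    K:             (3, 3) pinhole intrinsics.
    Rt:            (3, 4) world-to-camera extrinsics [R | t].
    voxel_centers: (V, 3) voxel centers in world coordinates.
    Voxels behind the camera or outside the image receive zeros;
    nearest-neighbour sampling keeps the sketch short.
    """
    H, W, C = feat.shape
    ones = np.ones((len(voxel_centers), 1))
    cam = np.concatenate([voxel_centers, ones], axis=1) @ Rt.T  # camera frame
    z = np.maximum(cam[:, 2], 1e-6)
    pix = cam @ K.T                                  # homogeneous pixels
    u = (pix[:, 0] / z).astype(int)
    v = (pix[:, 1] / z).astype(int)
    valid = (cam[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    out = np.zeros((len(voxel_centers), C), dtype=feat.dtype)
    out[valid] = feat[v[valid], u[valid]]            # nearest-neighbour gather
    return out
```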
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose SurroundOcc, a method that predicts 3D occupancy from multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z)
- 3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization [6.929027496437192]
The proposed 3DROM method substantially outperforms state-of-the-art deep-learning-based methods for multi-view pedestrian detection.
arXiv Detail & Related papers (2022-07-22T06:15:20Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View [93.23947591795897]
In this paper, we strive to tackle the challenges and automatically understand the crowd from the visual data collected from drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle crowd density estimation in extremely dark environments, we introduce synthetic data generated by the game Grand Theft Auto V (GTAV).
arXiv Detail & Related papers (2020-09-29T01:48:24Z)
- SemanticVoxels: Sequential Fusion for 3D Pedestrian Detection using LiDAR Point Cloud and Semantic Segmentation [4.350338899049983]
We propose a generalization of PointPainting that can apply fusion at different levels (a minimal painting sketch follows this entry).
We show that SemanticVoxels achieves state-of-the-art performance in both 3D and bird's eye view pedestrian detection benchmarks.
arXiv Detail & Related papers (2020-09-25T14:52:32Z)
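SemanticVoxels generalizes PointPainting, whose core step projects each LiDAR point into the image and appends the semantic-segmentation scores found at that pixel to the point's features. A minimal sketch of that painting step under a pinhole model follows; names are illustrative, not the paper's code.

```python
import numpy as np

def paint_points(points, seg_scores, K, Rt):
    """PointPainting-style fusion: append per-pixel class scores to points.

    points:     (N, 3) LiDAR points in world coordinates.
    seg_scores: (H, W, C) softmax scores from an image segmenter.
    K, Rt:      pinhole intrinsics (3, 3) and extrinsics (3, 4).
    Points that do not project into the image keep zero scores.
    """
    H, W, C = seg_scores.shape
    ones = np.ones((len(points), 1))
    cam = np.concatenate([points, ones], axis=1) @ Rt.T  # camera frame
    z = np.maximum(cam[:, 2], 1e-6)
    pix = cam @ K.T
    u = (pix[:, 0] / z).astype(int)
    v = (pix[:, 1] / z).astype(int)
    valid = (cam[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    scores = np.zeros((len(points), C), dtype=seg_scores.dtype)
    scores[valid] = seg_scores[v[valid], u[valid]]
    return np.concatenate([points, scores], axis=1)   # (N, 3 + C) painted
```

The painted points can then be voxelized and fed to a LiDAR detector, which is the level of fusion original PointPainting uses; applying the same fusion at other stages is what the entry above describes.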
- Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection [7.531596091318718]
We propose Cityscapes 3D, extending the original Cityscapes dataset with 3D bounding box annotations for all types of vehicles.
In contrast to existing datasets, our 3D annotations were labeled using stereo RGB images only and capture all nine degrees of freedom.
In addition, we complement the Cityscapes benchmark suite with 3D vehicle detection based on the new annotations as well as metrics presented in this work.
arXiv Detail & Related papers (2020-06-14T10:56:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.