Unsupervised Multi-view Pedestrian Detection
- URL: http://arxiv.org/abs/2305.12457v2
- Date: Sun, 19 Nov 2023 13:05:09 GMT
- Title: Unsupervised Multi-view Pedestrian Detection
- Authors: Mengyin Liu, Chao Zhu, Shiqi Ren, Xu-Cheng Yin
- Abstract summary: We propose an Unsupervised Multi-view Pedestrian Detection approach (UMPD) that eliminates the need for annotations to learn a multi-view pedestrian detector via 2D-3D mapping.
SIS is proposed to extract unsupervised representations of multi-view images, which are converted into 2D pedestrian masks as pseudo labels.
GVD encodes multi-view 2D images into a 3D volume to predict voxel-wise density and color via 2D-to-3D geometric projection, trained by 3D-to-2D rendering losses.
- Score: 12.882317991955228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the prosperity of video surveillance, multiple cameras have been
applied to accurately locate pedestrians in a specific area. However, previous
methods rely on human-labeled annotations in every video frame and camera view,
a heavier burden than the camera calibration and synchronization that are
necessary anyway. Therefore, we propose in this paper an Unsupervised
Multi-view Pedestrian Detection approach (UMPD) that eliminates the need for
annotations to learn a multi-view pedestrian detector via 2D-3D mapping. 1)
Firstly, Semantic-aware Iterative Segmentation (SIS) is proposed to extract
unsupervised representations of multi-view images, which are converted into 2D
pedestrian masks as pseudo labels, via our proposed iterative PCA and zero-shot
semantic classes from vision-language models. 2) Secondly, we propose a
Geometry-aware Volume-based Detector (GVD) that encodes multi-view 2D images
end-to-end into a 3D volume to predict voxel-wise density and color via
2D-to-3D geometric projection, trained by 3D-to-2D rendering losses with SIS
pseudo labels. 3) Thirdly, to improve the detection results, i.e., the 3D
density projected onto the bird's-eye view (BEV) from GVD, we propose
Vertical-aware BEV Regularization (VBR) to constrain them to be vertical like
natural pedestrian poses. Extensive experiments on the popular multi-view
pedestrian detection benchmarks Wildtrack, Terrace, and MultiviewX show that
our proposed UMPD approach, as the first fully unsupervised method to the best
of our knowledge, performs competitively with previous state-of-the-art
supervised techniques. Code will be available.
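To make the pipeline concrete, here is a minimal sketch of the SIS idea: iterative PCA over dense patch features, refitting on the foreground side each round. The frozen self-supervised backbone (e.g., DINO), the side-selection heuristic, and all function names are illustrative assumptions, not details from the paper; UMPD additionally uses zero-shot semantic classes from a vision-language model to pick the pedestrian side.

```python
# Sketch of SIS-style pseudo-mask extraction via iterative PCA.
# Assumes `feats` are dense patch features from a frozen self-supervised
# backbone (e.g., DINO); the backbone choice and the side-selection
# heuristic are assumptions, not details taken from the paper.
import torch

def iterative_pca_masks(feats: torch.Tensor, iters: int = 3) -> torch.Tensor:
    """feats: (N, C, H, W) patch features -> (N, H, W) binary foreground masks."""
    n, c, h, w = feats.shape
    x = feats.permute(0, 2, 3, 1).reshape(-1, c)       # every patch as a row vector
    keep = torch.ones(x.shape[0], dtype=torch.bool)    # start from all patches
    for _ in range(iters):
        sel = x[keep]
        mu = sel.mean(dim=0, keepdim=True)
        _, _, v = torch.pca_lowrank(sel - mu, q=1)     # first principal direction
        score = (x - mu) @ v[:, 0]                     # project every patch onto it
        pos = score > 0
        # The PCA sign is arbitrary; assume the smaller side is foreground.
        # (UMPD instead selects the pedestrian side with zero-shot scores
        # from a vision-language model.)
        keep = pos if pos.sum() <= (~pos).sum() else ~pos
    return keep.reshape(n, h, w).float()
```

A similarly hedged sketch of the GVD training signal plus the VBR term: voxel densities are alpha-composited along camera rays into 2D opacities and matched against SIS pseudo masks, while a verticality penalty discourages density that varies along the height axis of each BEV cell. The ray sampling, the compositing scheme, and the variance-based form of the penalty are assumptions standing in for the paper's exact formulation.

```python
# Sketch of a GVD-style 3D-to-2D rendering loss with a VBR-style penalty.
# Ray sampling, compositing, and the verticality term are assumptions.
import torch
import torch.nn.functional as F

def composite(sigma_samples: torch.Tensor, delta: float) -> torch.Tensor:
    """Alpha-composite density samples (R, S) along each ray into opacity (R,)."""
    alpha = 1.0 - torch.exp(-F.relu(sigma_samples) * delta)
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    return (trans * alpha).sum(dim=-1)                 # expected opacity per ray

def gvd_vbr_loss(sigma: torch.Tensor, ray_coords: torch.Tensor,
                 mask_gt: torch.Tensor, delta: float = 0.05,
                 lam: float = 0.1) -> torch.Tensor:
    """sigma: (1, 1, D, H, W) voxel densities (D = height axis);
    ray_coords: (R, S, 3) sample coords per ray as (x, y, z) in [-1, 1];
    mask_gt: (R,) float pseudo-mask values at the pixels the rays hit."""
    r, s, _ = ray_coords.shape
    grid = ray_coords.view(1, 1, r, s, 3)              # 5-D grid for grid_sample
    samples = F.grid_sample(sigma, grid, align_corners=True).view(r, s)
    render = composite(samples, delta)                 # rendered 2D opacity
    loss_render = F.binary_cross_entropy(render.clamp(0.0, 1.0), mask_gt)
    # VBR-style penalty: a pedestrian should occupy a vertical column, so
    # density variance along the height axis of each BEV cell is discouraged.
    vbr = sigma.squeeze(0).squeeze(0).var(dim=0).mean()
    return loss_render + lam * vbr
```

Under these assumptions, `iterative_pca_masks` would supply `mask_gt` for `gvd_vbr_loss`, and at test time the BEV map `sigma.amax(dim=2)` (max over the height axis) would be thresholded into pedestrian locations on the ground plane.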
Related papers
- Geometry-Biased Transformer for Robust Multi-View 3D Human Pose Reconstruction [3.069335774032178]
We propose a novel encoder-decoder Transformer architecture to estimate 3D poses from multi-view 2D pose sequences.
We conduct experiments on three benchmark public datasets, Human3.6M, CMU Panoptic and Occlusion-Persons.
arXiv Detail & Related papers (2023-12-28T16:30:05Z)
- Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing [28.874014617259935]
Multi-Camera 3D Object Detection (MC3D-Det) has gained prominence with the advent of bird's-eye view (BEV) approaches.
We propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections.
arXiv Detail & Related papers (2023-10-17T15:31:28Z)
- Unsupervised 3D Keypoint Discovery with Multi-View Geometry [104.76006413355485]
We propose an algorithm that learns to discover 3D keypoints on human bodies from multiple-view images without supervision or labels.
Our approach discovers more interpretable and accurate 3D keypoints compared to other state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2022-11-23T10:25:12Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors [60.88824519770208]
Camera-based 3D object detectors are attractive due to their wider deployment and lower price than LiDAR sensors.
We revisit the prior stereo modeling DSGN about the stereo volume constructions for representing both 3D geometry and semantics.
We propose our approach, DSGN++, aiming for improving information flow throughout the 2D-to-3D pipeline.
arXiv Detail & Related papers (2022-04-06T18:43:54Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
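As an aside, the plane sweep construction named in the entry above is a classic technique and easy to sketch: source-view features are warped onto fronto-parallel depth planes of a reference view and compared, yielding a depth-indexed cost volume. This is a generic illustration under an assumed pinhole camera model, not that paper's implementation; all names here are hypothetical.

```python
# Generic plane sweep stereo sketch (not the cited paper's code).
import torch
import torch.nn.functional as F

def plane_sweep_volume(ref_feat, src_feat, k_ref, k_src, rot, t, depths):
    """ref_feat, src_feat: (C, H, W) feature maps; k_ref, k_src: (3, 3)
    intrinsics; rot, t: source-camera pose relative to the reference;
    depths: iterable of candidate plane depths. Returns (D, H, W) costs."""
    c, h, w = ref_feat.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(3, -1)  # homogeneous pixels
    rays = k_ref.inverse() @ pix                                     # back-projected rays
    costs = []
    for d in depths:
        pts = rot @ (rays * d) + t.view(3, 1)    # plane points in the source frame
        uv = k_src @ pts
        uv = uv[:2] / uv[2:].clamp(min=1e-6)     # perspective divide (clamp guards
                                                 # points behind the source camera)
        grid = torch.stack([uv[0] / (w - 1) * 2 - 1,
                            uv[1] / (h - 1) * 2 - 1], dim=-1).view(1, h, w, 2)
        warped = F.grid_sample(src_feat.unsqueeze(0), grid, align_corners=True)[0]
        costs.append((ref_feat * warped).sum(dim=0))  # dot-product matching cost
    return torch.stack(costs)                         # sweep over the depth planes
```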
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.