Enhanced Multi-View Pedestrian Detection Using Probabilistic Occupancy Volume
- URL: http://arxiv.org/abs/2503.10982v1
- Date: Fri, 14 Mar 2025 01:05:44 GMT
- Title: Enhanced Multi-View Pedestrian Detection Using Probabilistic Occupancy Volume
- Authors: Reef Alturki, Adrian Hilton, Jean-Yves Guillemaut,
- Abstract summary: Occlusion poses a significant challenge in pedestrian detection from a single view.<n>Recent advances in multi-view detection utilized an early-fusion strategy that strategically projects the features onto the ground plane.<n>We introduce a novel model that efficiently leverages traditional 3D reconstruction techniques to enhance deep multi-view pedestrian detection.
- Score: 21.393389135740712
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Occlusion poses a significant challenge in pedestrian detection from a single view. To address this, multi-view detection systems have been utilized to aggregate information from multiple perspectives. Recent advances in multi-view detection utilized an early-fusion strategy that strategically projects the features onto the ground plane, where detection analysis is performed. A promising approach in this context is the use of 3D feature-pulling technique, which constructs a 3D feature volume of the scene by sampling the corresponding 2D features for each voxel. However, it creates a 3D feature volume of the whole scene without considering the potential locations of pedestrians. In this paper, we introduce a novel model that efficiently leverages traditional 3D reconstruction techniques to enhance deep multi-view pedestrian detection. This is accomplished by complementing the 3D feature volume with probabilistic occupancy volume, which is constructed using the visual hull technique. The probabilistic occupancy volume focuses the model's attention on regions occupied by pedestrians and improves detection accuracy. Our model outperforms state-of-the-art models on the MultiviewX dataset, with an MODA of 97.3%, while achieving competitive performance on the Wildtrack dataset.
Related papers
- Attention-Aware Multi-View Pedestrian Tracking [21.393389135740712]
Recent multi-view pedestrian detection models have highlighted the potential of an early-fusion strategy.
This strategy has been shown to improve both detection and tracking performance.
We propose a novel model that incorporates attention mechanisms in a multi-view pedestrian tracking scenario.
arXiv Detail & Related papers (2025-04-03T21:53:08Z) - MonoDINO-DETR: Depth-Enhanced Monocular 3D Object Detection Using a Vision Foundation Model [2.0624236247076397]
This study employs a Vision Transformer (ViT)-based foundation model as the backbone, which excels at capturing global features for depth estimation.<n>It integrates a detection transformer (DETR) architecture to improve both depth estimation and object detection performance in a one-stage manner.<n>The proposed model outperforms recent state-of-the-art methods, as demonstrated through evaluations on the KITTI 3D benchmark and a custom dataset collected from high-elevation racing environments.
arXiv Detail & Related papers (2025-02-01T04:37:13Z) - ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers [9.271932084757646]
3D occupancy represents the entire scene without distinguishing between foreground and background by the physical space into a grid map.
We propose our learning-first view attention mechanism for effective multi-view feature aggregation.
We present FlowOcc3D, a benchmark built on top existing high-quality datasets.
arXiv Detail & Related papers (2024-05-07T13:15:07Z) - Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM- Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z) - 3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera
Pedestrian Localization [6.929027496437192]
The proposed 3DROM method has a greatly improved performance in comparison with the state-of-the-art deep-learning based methods for multi-view pedestrian detection.
arXiv Detail & Related papers (2022-07-22T06:15:20Z) - Neural Volumetric Object Selection [126.04480613166194]
We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF)
Our approach takes a set of foreground and background 2D user scribbles in one view and automatically estimates a 3D segmentation of the desired object, which can be rendered into novel views.
arXiv Detail & Related papers (2022-05-30T08:55:20Z) - Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.