Recognition and 3D Localization of Pedestrian Actions from Monocular Video
- URL: http://arxiv.org/abs/2008.01162v1
- Date: Mon, 3 Aug 2020 19:57:03 GMT
- Title: Recognition and 3D Localization of Pedestrian Actions from Monocular Video
- Authors: Jun Hayakawa, Behzad Dariush
- Abstract summary: This paper focuses on monocular pedestrian action recognition and 3D localization from an egocentric view.
A challenge in addressing this problem in urban traffic scenes is attributed to the unpredictable behavior of pedestrians.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Understanding and predicting pedestrian behavior is an important and
challenging area of research for realizing safe and effective navigation
strategies in automated and advanced driver assistance technologies in urban
scenes. This paper focuses on monocular pedestrian action recognition and 3D
localization from an egocentric view for the purpose of predicting intention
and forecasting future trajectory. A challenge in addressing this problem in
urban traffic scenes is attributed to the unpredictable behavior of
pedestrians, whereby actions and intentions are constantly in flux and depend
on the pedestrian's pose, their 3D spatial relations, and their interaction with
other agents as well as with the environment. To partially address these
challenges, we consider the importance of pose toward recognition and 3D
localization of pedestrian actions. In particular, we propose an action
recognition framework using a two-stream temporal relation network with inputs
corresponding to the raw RGB image sequence of the tracked pedestrian as well
as the pedestrian pose. The proposed method outperforms methods using a
single-stream temporal relation network based on evaluations using the JAAD
public dataset. The estimated pose and associated body key-points are also used
as input to a network that estimates the 3D location of the pedestrian using a
unique loss function. Evaluation of our 3D localization method on the KITTI
dataset shows a lower average localization error than existing
state-of-the-art methods. Finally, we conduct qualitative tests of
action recognition and 3D localization on HRI's H3D driving dataset.
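The two-stream idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes per-frame feature vectors are already available for the RGB stream and the pose stream, uses a single linear layer as a stand-in for the relation module, considers only two-frame relations, and fuses the streams by summing class scores. All function names, the weights, and the fusion rule are hypothetical.

```python
from itertools import combinations
import math


def softmax(scores):
    """Convert raw class scores to probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Apply a bias-free linear layer; weights has one row per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def two_frame_relation(frame_feats, weights):
    """Sum a linear score over all time-ordered frame pairs (i < j).

    Assumption: a real temporal relation module would use a learned MLP
    and several relation scales; a linear layer on pairs stands in here.
    """
    total = [0.0] * len(weights)
    for i, j in combinations(range(len(frame_feats)), 2):
        pair = frame_feats[i] + frame_feats[j]  # concatenate the two frames
        scores = linear(weights, pair)
        total = [t + s for t, s in zip(total, scores)]
    return total


def fuse_streams(rgb_feats, pose_feats, rgb_w, pose_w):
    """Late fusion (assumed): add per-stream class scores, then softmax."""
    rgb_scores = two_frame_relation(rgb_feats, rgb_w)
    pose_scores = two_frame_relation(pose_feats, pose_w)
    return softmax([r + p for r, p in zip(rgb_scores, pose_scores)])


if __name__ == "__main__":
    # Three frames, two action classes; all values are made up.
    rgb_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pose_feats = [[0.5], [0.2], [0.1]]
    rgb_w = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    pose_w = [[1.0, 1.0], [0.0, 0.0]]
    print(fuse_streams(rgb_feats, pose_feats, rgb_w, pose_w))
```

Summing scores before the softmax is only one possible fusion choice. The abstract does not state how the two streams are combined.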
Related papers
- Social-Transmotion: Promptable Human Trajectory Prediction (2023-12-26T18:56:49Z)
  Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
  Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction (2023-11-29T20:30:18Z)
  We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
  Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
- Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints (2023-06-01T18:27:48Z)
  We propose a novel multi-task learning framework for pedestrian crossing action recognition and trajectory prediction.
  We use 3D human keypoints extracted from raw sensor data to capture rich information on human pose and activity.
  We show that our approach achieves state-of-the-art performance on a wide range of evaluation metrics.
- LocATe: End-to-end Localization of Actions in 3D with Transformers (2022-03-21T03:35:32Z)
  LocATe is an end-to-end approach that jointly localizes and recognizes actions in a 3D sequence.
  Unlike transformer-based object-detection and classification models which consider image or patch features as input, LocATe's transformer model is capable of capturing long-term correlations between actions in a sequence.
  We introduce a new, challenging, and more realistic benchmark dataset, BABEL-TAL-20 (BT20), where the performance of state-of-the-art methods is significantly worse.
- Learnable Online Graph Representations for 3D Multi-Object Tracking (2021-04-23T17:59:28Z)
  We propose a unified and learning based approach to the 3D MOT problem.
  We employ a Neural Message Passing network for data association that is fully trainable.
  We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
- Safety-Oriented Pedestrian Motion and Scene Occupancy Forecasting (2021-01-07T06:08:21Z)
  We advocate for predicting both the individual motions as well as the scene occupancy map.
  We propose a Scene-Actor Graph Neural Network (SA-GNN) which preserves the relative spatial information of pedestrians.
  On two large-scale real-world datasets, we showcase that our scene-occupancy predictions are more accurate and better calibrated than those from state-of-the-art motion forecasting methods.
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation (2020-12-15T03:03:38Z)
  We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
  Our approach is fully automatic without any human interaction.
  We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
- PePScenes: A Novel Dataset and Baseline for Pedestrian Action Prediction in 3D (2020-12-14T18:13:44Z)
  We propose a new pedestrian action prediction dataset created by adding per-frame 2D/3D bounding box and behavioral annotations to nuScenes.
  In addition, we propose a hybrid neural network architecture that incorporates various data modalities for predicting pedestrian crossing action.
- Graph-SIM: A Graph-based Spatiotemporal Interaction Modelling for Pedestrian Action Prediction (2020-12-03T18:28:27Z)
  We propose a novel graph-based model for predicting pedestrian crossing action.
  We introduce a new dataset that provides 3D bounding box and pedestrian behavioural annotations for the existing nuScenes dataset.
  Our approach achieves state-of-the-art performance by improving on various metrics by more than 15% in comparison to existing methods.
- A Real-Time Predictive Pedestrian Collision Warning Service for Cooperative Intelligent Transportation Systems Using 3D Pose Estimation (2020-09-23T00:55:12Z)
  We propose a real-time predictive pedestrian collision warning service (P2CWS) for two tasks: pedestrian orientation recognition (100.53 FPS) and intention prediction (35.76 FPS).
  Our framework obtains satisfying generalization over multiple sites because of the proposed site-independent features.
  The proposed vision framework realizes 89.3% accuracy in the behavior recognition task on the TUD dataset without any training process.
This list is automatically generated from the titles and abstracts of the papers in this site.