Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D
Human Keypoints
- URL: http://arxiv.org/abs/2306.01075v1
- Date: Thu, 1 Jun 2023 18:27:48 GMT
- Title: Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D
Human Keypoints
- Authors: Jiachen Li, Xinwei Shi, Feiyu Chen, Jonathan Stroud, Zhishuai Zhang,
Tian Lan, Junhua Mao, Jeonhyung Kang, Khaled S. Refaat, Weilong Yang, Eugene
Ie, Congcong Li
- Abstract summary: We propose a novel multi-task learning framework for pedestrian crossing action recognition and trajectory prediction.
We use 3D human keypoints extracted from raw sensor data to capture rich information on human pose and activity.
We show that our approach achieves state-of-the-art performance on a wide range of evaluation metrics.
- Score: 25.550524178542833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate understanding and prediction of human behaviors are critical
prerequisites for autonomous vehicles, especially in highly dynamic and
interactive scenarios such as intersections in dense urban areas. In this work,
we aim at identifying crossing pedestrians and predicting their future
trajectories. To achieve these goals, we not only need the context information
of road geometry and other traffic participants but also need fine-grained
information of the human pose, motion and activity, which can be inferred from
human keypoints. In this paper, we propose a novel multi-task learning
framework for pedestrian crossing action recognition and trajectory prediction,
which utilizes 3D human keypoints extracted from raw sensor data to capture
rich information on human pose and activity. Moreover, we propose to apply two
auxiliary tasks and contrastive learning to enable auxiliary supervisions to
improve the learned keypoints representation, which further enhances the
performance of major tasks. We validate our approach on a large-scale in-house
dataset, as well as a public benchmark dataset, and show that our approach
achieves state-of-the-art performance on a wide range of evaluation metrics.
The effectiveness of each model component is validated in a detailed ablation
study.
Related papers
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone across various downstream datasets and tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
arXiv Detail & Related papers (2023-09-19T11:13:01Z)
- 2D Human Pose Estimation: A Survey [16.56050212383859]
Human pose estimation aims at localizing human anatomical keypoints or body parts in the input data.
Deep learning techniques allow learning feature representations directly from the data.
In this paper, we review the recent achievements of 2D human pose estimation methods and present a comprehensive survey.
arXiv Detail & Related papers (2022-04-15T08:09:43Z)
- Important Object Identification with Semi-Supervised Learning for Autonomous Driving [37.654878298744855]
We propose a novel approach for important object identification in egocentric driving scenarios.
We present a semi-supervised learning pipeline to enable the model to learn from unlimited unlabeled data.
Our approach also outperforms rule-based baselines by a large margin.
arXiv Detail & Related papers (2022-03-05T01:23:13Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work, including state-of-the-art methods designed specifically for either the trajectory or the pose forecasting task.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Graph-SIM: A Graph-based Spatiotemporal Interaction Modelling for Pedestrian Action Prediction [10.580548257913843]
We propose a novel graph-based model for predicting pedestrian crossing action.
We introduce a new dataset that provides 3D bounding box and pedestrian behavioural annotations for the existing nuScenes dataset.
Our approach achieves state-of-the-art performance by improving on various metrics by more than 15% in comparison to existing methods.
arXiv Detail & Related papers (2020-12-03T18:28:27Z)
- Recognition and 3D Localization of Pedestrian Actions from Monocular Video [11.29865843123467]
This paper focuses on monocular pedestrian action recognition and 3D localization from an egocentric view.
One challenge in addressing this problem in urban traffic scenes is the unpredictable behavior of pedestrians.
arXiv Detail & Related papers (2020-08-03T19:57:03Z)