GPT-4V Takes the Wheel: Promises and Challenges for Pedestrian Behavior Prediction
- URL: http://arxiv.org/abs/2311.14786v2
- Date: Thu, 25 Jan 2024 20:55:16 GMT
- Title: GPT-4V Takes the Wheel: Promises and Challenges for Pedestrian Behavior Prediction
- Authors: Jia Huang, Peng Jiang, Alvika Gautam, and Srikanth Saripalli
- Abstract summary: This research is the first to conduct both quantitative and qualitative evaluations of Vision Language Models (VLMs) in the context of pedestrian behavior prediction for autonomous driving.
We evaluate GPT-4V on publicly available pedestrian datasets: JAAD and WiDEVIEW.
The model achieves 57% accuracy in a zero-shot manner, which, while impressive, still trails state-of-the-art domain-specific models (70%) in predicting pedestrian crossing actions.
- Score: 12.613528624623514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting pedestrian behavior is key to ensuring the safety and reliability of autonomous vehicles. While deep learning methods show promise when trained on annotated video frame sequences, they often fail to fully grasp the dynamic interactions between pedestrians and traffic that are crucial for accurate prediction, and they lack nuanced common sense reasoning. Moreover, manually annotating datasets for these models is expensive, and the models are challenging to adapt to new situations. The advent of Vision Language Models (VLMs) introduces a promising alternative, thanks to their advanced visual and causal reasoning skills. To our knowledge, this research is the first to conduct both quantitative and qualitative evaluations of VLMs in the context of pedestrian behavior prediction for autonomous driving. We evaluate GPT-4V(ision) on the publicly available pedestrian datasets JAAD and WiDEVIEW. Our quantitative analysis focuses on GPT-4V's ability to predict pedestrian behavior in current and future frames. The model achieves 57% accuracy in a zero-shot manner, which, while impressive, still trails state-of-the-art domain-specific models (70%) in predicting pedestrian crossing actions. Qualitatively, GPT-4V shows an impressive ability to process and interpret complex traffic scenarios, differentiate between various pedestrian behaviors, and detect and analyze groups. However, it faces challenges, such as difficulty in detecting smaller pedestrians and assessing the relative motion between pedestrians and the ego vehicle.
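For concreteness, here is a minimal sketch of what zero-shot crossing prediction with a vision-language model can look like. This is not the paper's actual pipeline: the model name, prompt wording, frame paths, and labels below are all assumptions for illustration, using the OpenAI Python SDK's chat-completions interface with the frame attached as a base64 data URL.

```python
# Minimal zero-shot sketch (assumed setup, not the paper's exact pipeline):
# send one frame to a vision-language model, parse a crossing / not-crossing
# answer, and score accuracy against ground-truth labels.
import base64
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are assisting an autonomous vehicle. Based on this frame, will the "
    "pedestrian cross the street in the next few seconds? Answer with exactly "
    "one word: 'crossing' or 'not-crossing'."
)

def predict_crossing(image_path: str) -> str:
    """Query the VLM with a single frame; return 'crossing' or 'not-crossing'."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",  # model name circa the paper; an assumption
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=5,
    )
    answer = resp.choices[0].message.content.strip().lower()
    return "not-crossing" if "not" in answer else "crossing"

# Zero-shot accuracy over a labelled frame set (paths and labels hypothetical):
dataset = [("frame_001.jpg", "crossing"), ("frame_002.jpg", "not-crossing")]
correct = sum(predict_crossing(path) == label for path, label in dataset)
print(f"accuracy: {correct / len(dataset):.0%}")
```

The paper's evaluation goes further (current- and future-frame prediction on JAAD and WiDEVIEW), but an accuracy loop of this shape is how a zero-shot figure like 57% would be computed against ground-truth crossing labels.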
Related papers
- Analysis over vision-based models for pedestrian action anticipation [1.1470070927586016]
This paper focuses on using images of the pedestrian's context as an input feature.
We present several model architectures that utilize standard CNN and Transformer modules.
We provide insights on the explainability of vision-based Transformer models in the context of pedestrian action prediction.
arXiv Detail & Related papers (2023-05-27T11:30:32Z)
- Local and Global Contextual Features Fusion for Pedestrian Intention Prediction [2.203209457340481]
We analyse visual features of both pedestrian and traffic contexts.
To understand the global context, we utilise location, motion, and environmental information.
These multi-modality features are intelligently fused for effective intention learning.
arXiv Detail & Related papers (2023-05-01T22:37:31Z)
- Pedestrian Stop and Go Forecasting with Hybrid Feature Fusion [87.77727495366702]
We introduce the new task of pedestrian stop and go forecasting.
Considering the lack of suitable existing datasets for it, we release TRANS, a benchmark for explicitly studying the stop and go behaviors of pedestrians in urban traffic.
We build it from several existing datasets annotated with pedestrians' walking motions, in order to have various scenarios and behaviors.
arXiv Detail & Related papers (2022-03-04T18:39:31Z)
- PSI: A Pedestrian Behavior Dataset for Socially Intelligent Autonomous Car [47.01116716025731]
This paper proposes and shares another benchmark dataset called the IUPUI-CSRC Pedestrian Situated Intent (PSI) data.
The first novel label is the dynamic intent change of pedestrians to cross in front of the ego-vehicle, obtained from 24 drivers.
The second one is the text-based explanations of the driver reasoning process when estimating pedestrian intents and predicting their behaviors.
arXiv Detail & Related papers (2021-12-05T15:54:57Z)
- Safety-Oriented Pedestrian Motion and Scene Occupancy Forecasting [91.69900691029908]
We advocate for predicting both the individual motions as well as the scene occupancy map.
We propose a Scene-Actor Graph Neural Network (SA-GNN) which preserves the relative spatial information of pedestrians.
On two large-scale real-world datasets, we showcase that our scene-occupancy predictions are more accurate and better calibrated than those from state-of-the-art motion forecasting methods.
arXiv Detail & Related papers (2021-01-07T06:08:21Z)
- Pedestrian Behavior Prediction for Automated Driving: Requirements, Metrics, and Relevant Features [1.1888947789336193]
We analyze the requirements on pedestrian behavior prediction for automated driving via a system-level approach.
Based on human driving behavior, we derive appropriate reaction patterns for an automated vehicle.
We present a pedestrian prediction model based on a Variational Conditional Auto-Encoder which incorporates multiple contextual cues.
arXiv Detail & Related papers (2020-12-15T16:52:49Z)
- Detecting 32 Pedestrian Attributes for Autonomous Vehicles [103.87351701138554]
In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes.
We introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way.
We show competitive detection and attribute recognition results, as well as a more stable MTL training.
arXiv Detail & Related papers (2020-12-04T15:10:12Z)
- Pedestrian Intention Prediction: A Multi-task Perspective [83.7135926821794]
In order to be globally deployed, autonomous cars must guarantee the safety of pedestrians.
This work tries to solve this problem by jointly predicting the intention and visual states of pedestrians.
The method is a recurrent neural network trained in a multi-task learning approach.
arXiv Detail & Related papers (2020-10-20T13:42:31Z)
- Pedestrian Models for Autonomous Driving Part II: High-Level Models of Human Behavior [12.627716603026391]
Planning for autonomous vehicles in the presence of pedestrians requires modelling their probable future behaviour.
This survey clearly shows that, although there are good models for optimal walking behaviour, high-level psychological and social modelling of pedestrian behaviour still remains an open research question.
arXiv Detail & Related papers (2020-03-26T14:55:18Z)
- Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction [57.56466850377598]
Reasoning over visual data is a desirable capability for robotics and vision-based applications.
In this paper, we present a graph-based framework to uncover relationships among different objects in the scene for reasoning about pedestrian intent.
Pedestrian intent, defined as the future action of crossing or not-crossing the street, is a very crucial piece of information for autonomous vehicles.
arXiv Detail & Related papers (2020-02-20T18:50:44Z)