Analysis over vision-based models for pedestrian action anticipation
- URL: http://arxiv.org/abs/2305.17451v1
- Date: Sat, 27 May 2023 11:30:32 GMT
- Title: Analysis over vision-based models for pedestrian action anticipation
- Authors: Lina Achaji, Julien Moreau, François Aioun, François Charpillet
- Abstract summary: This paper focuses on using images of the pedestrian's context as an input feature.
We present several spatio-temporal model architectures that utilize standard CNN and Transformer modules as a backbone for pedestrian anticipation.
We provide insights on the explainability of vision-based Transformer models in the context of pedestrian action prediction.
- Score: 1.1470070927586016
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Anticipating human actions in front of autonomous vehicles is a challenging
task. Several papers have recently proposed model architectures to address this
problem by combining multiple input features to predict pedestrian crossing
actions. This paper focuses specifically on using images of the pedestrian's
context as an input feature. We present several spatio-temporal model
architectures that utilize standard CNN and Transformer modules to serve as a
backbone for pedestrian anticipation. However, the objective of this paper is
not to surpass state-of-the-art benchmarks but rather to analyze the positive
and negative predictions of these models. Therefore, we provide insights on the
explainability of vision-based Transformer models in the context of pedestrian
action prediction. We will highlight cases where the model can achieve correct
quantitative results but falls short in providing human-like explanations
qualitatively, emphasizing the importance of investing in explainability for
pedestrian action anticipation problems.
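As a rough illustration of the kind of context-image backbone described above, here is a minimal sketch assuming PyTorch and torchvision; the layer sizes, module names, and the binary cross/not-cross head are illustrative choices, not the exact architectures analyzed in the paper.

```python
# Minimal sketch (PyTorch + torchvision) of a CNN + Transformer spatio-temporal
# backbone for crossing anticipation. Layer sizes and names are illustrative
# assumptions, not the exact architectures analyzed in the paper.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CrossingAnticipationBackbone(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        cnn = resnet18(weights=None)      # per-frame spatial encoder
        cnn.fc = nn.Identity()            # keep the 512-d pooled features
        self.cnn = cnn
        self.proj = nn.Linear(512, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)    # crossing / not crossing

    def forward(self, clips):             # clips: (B, T, 3, H, W) context frames
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))        # (B*T, 512)
        feats = self.proj(feats).view(b, t, -1)      # (B, T, d_model)
        feats = self.temporal(feats)                 # temporal self-attention
        return self.head(feats[:, -1])               # predict from the last step

model = CrossingAnticipationBackbone()
logits = model(torch.randn(2, 16, 3, 224, 224))      # 2 clips, 16 frames each
print(logits.shape)                                  # torch.Size([2, 2])
```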
Related papers
- Sparse Prototype Network for Explainable Pedestrian Behavior Prediction [60.80524827122901]
We present Sparse Prototype Network (SPN), an explainable method designed to simultaneously predict a pedestrian's future action, trajectory, and pose.
Regularized by mono-semanticity and clustering constraints, the prototypes learn consistent and human-understandable features.
arXiv Detail & Related papers (2024-10-16T03:33:40Z)
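For the Sparse Prototype Network entry above, the following is a minimal sketch of a ProtoPNet-style prototype head with a simple clustering term, assuming PyTorch; the names, dimensions, and loss weighting are hypothetical and do not reproduce the SPN authors' implementation.

```python
# Rough sketch of a prototype head in the spirit of prototype-based explainable
# models: predictions come from similarities to learned prototype vectors, and a
# clustering term pulls each sample toward its nearest prototype. All names and
# the loss weighting are illustrative assumptions, not the SPN implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeHead(nn.Module):
    def __init__(self, feat_dim=256, n_prototypes=10, n_classes=2):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, feat_dim))
        self.classifier = nn.Linear(n_prototypes, n_classes, bias=False)

    def forward(self, feats):                        # feats: (B, feat_dim)
        dists = torch.cdist(feats, self.prototypes)  # (B, n_prototypes)
        sims = torch.exp(-dists)                     # similarity to each prototype
        return self.classifier(sims), dists

def cluster_loss(dists):
    # Encourage every sample to lie close to at least one prototype.
    return dists.min(dim=1).values.mean()

head = PrototypeHead()
logits, dists = head(torch.randn(8, 256))
loss = F.cross_entropy(logits, torch.randint(0, 2, (8,))) + 0.1 * cluster_loss(dists)
```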
- Feature Importance in Pedestrian Intention Prediction: A Context-Aware Review [9.475536008455133]
Recent advancements in predicting pedestrian crossing intentions for Autonomous Vehicles using Computer Vision and Deep Neural Networks are promising.
We introduce Context-aware Permutation Feature Importance (CAPFI), a novel approach tailored for pedestrian intention prediction.
CAPFI enables more interpretable and reliable assessments of feature importance by leveraging subdivided scenario contexts.
arXiv Detail & Related papers (2024-09-11T22:13:01Z)
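For the CAPFI entry above, here is a rough sketch of the generic permutation-feature-importance recipe computed separately per scenario context, assuming NumPy; the predictor interface, feature layout, and context labels are hypothetical placeholders rather than the paper's actual pipeline.

```python
# Rough sketch of permutation feature importance computed per scenario context,
# the general recipe that CAPFI builds on. The predictor, feature layout, and
# context labels are hypothetical placeholders.
import numpy as np

def permutation_importance_by_context(predict, X, y, contexts, n_repeats=5, seed=0):
    """Accuracy drop when one input feature is shuffled, per scenario context."""
    rng = np.random.default_rng(seed)
    scores = {}
    for ctx in np.unique(contexts):
        idx = contexts == ctx
        X_ctx, y_ctx = X[idx], y[idx]
        base = np.mean(predict(X_ctx) == y_ctx)     # baseline accuracy in this context
        drops = np.zeros(X.shape[1])
        for f in range(X.shape[1]):
            for _ in range(n_repeats):
                X_perm = X_ctx.copy()
                rng.shuffle(X_perm[:, f])           # break the feature/label link
                drops[f] += base - np.mean(predict(X_perm) == y_ctx)
        scores[ctx] = drops / n_repeats
    return scores                                   # {context: importance per feature}
```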
- GPT-4V Takes the Wheel: Promises and Challenges for Pedestrian Behavior Prediction [12.613528624623514]
This research is the first to conduct both quantitative and qualitative evaluations of Vision Language Models (VLMs) in the context of pedestrian behavior prediction for autonomous driving.
We evaluate GPT-4V on publicly available pedestrian datasets: JAAD and WiDEVIEW.
The model achieves a 57% accuracy in a zero-shot manner, which, while impressive, is still behind the state-of-the-art domain-specific models (70%) in predicting pedestrian crossing actions.
arXiv Detail & Related papers (2023-11-24T18:02:49Z)
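For the GPT-4V entry above, a minimal sketch of a zero-shot crossing query against a vision-language model, assuming the OpenAI Python SDK; the prompt wording and model name are illustrative and not the evaluation protocol used in the paper.

```python
# Minimal sketch of a zero-shot pedestrian-crossing query against a
# vision-language model via the OpenAI Python SDK. The prompt and model name
# are illustrative assumptions, not the paper's evaluation protocol.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def will_pedestrian_cross(image_path: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Will the highlighted pedestrian cross the road in the "
                         "next few seconds? Answer 'crossing' or 'not crossing'."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip().lower()

# Example: print(will_pedestrian_cross("frame_000123.jpg"))
```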
- JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds [79.00975648564483]
Trajectory forecasting models, employed in fields such as robotics, autonomous vehicles, and navigation, face challenges in real-world scenarios.
This dataset provides comprehensive data, including the locations of all agents, scene images, and point clouds, all from the robot's perspective.
The objective is to predict the future positions of agents relative to the robot using raw sensory input data.
arXiv Detail & Related papers (2023-11-05T18:59:31Z)
- PedFormer: Pedestrian Behavior Prediction via Cross-Modal Attention Modulation and Gated Multitask Learning [10.812772606528172]
We propose a novel framework that relies on different data modalities to predict future trajectories and crossing actions of pedestrians from an ego-centric perspective.
We show that our model improves state-of-the-art in trajectory and action prediction by up to 22% and 13% respectively on various metrics.
arXiv Detail & Related papers (2022-10-14T15:12:00Z)
- Pedestrian Stop and Go Forecasting with Hybrid Feature Fusion [87.77727495366702]
We introduce the new task of pedestrian stop and go forecasting.
Considering the lack of suitable existing datasets for it, we release TRANS, a benchmark for explicitly studying the stop and go behaviors of pedestrians in urban traffic.
We build it from several existing datasets annotated with pedestrians' walking motions in order to cover a variety of scenarios and behaviors.
arXiv Detail & Related papers (2022-03-04T18:39:31Z)
- You Mostly Walk Alone: Analyzing Feature Attribution in Trajectory Prediction [52.442129609979794]
Recent deep learning approaches for trajectory prediction show promising performance.
It remains unclear which features such black-box models actually learn to use for making predictions.
This paper proposes a procedure that quantifies the contributions of different cues to model performance.
arXiv Detail & Related papers (2021-10-11T14:24:15Z)
- Learning Sparse Interaction Graphs of Partially Observed Pedestrians for Trajectory Prediction [0.3025231207150811]
Multi-pedestrian trajectory prediction is an indispensable safety element of autonomous systems that interact with crowds in unstructured environments.
We propose Gumbel Social Transformer, in which an Edge Gumbel Selector samples a sparse graph of partially observed pedestrians at each time step.
We demonstrate that our model overcomes the potential problems caused by the assumptions, and our approach outperforms the related works in benchmark evaluation.
arXiv Detail & Related papers (2021-07-15T00:45:11Z)
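For the Gumbel Social Transformer entry above, a minimal sketch of sampling sparse interaction edges with the Gumbel-softmax trick, assuming PyTorch; the scoring network and the one-neighbour-per-pedestrian simplification are illustrative assumptions, not the authors' Edge Gumbel Selector.

```python
# Minimal sketch of sampling sparse pedestrian-interaction edges with the
# Gumbel-softmax trick, in the spirit of the Edge Gumbel Selector; shapes and
# the scoring network are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeSelector(nn.Module):
    def __init__(self, feat_dim=32):
        super().__init__()
        self.score = nn.Linear(2 * feat_dim, 1)   # score each candidate edge i -> j

    def forward(self, feats, tau=0.5):            # feats: (N, feat_dim), N pedestrians
        n = feats.size(0)
        pairs = torch.cat([feats.unsqueeze(1).expand(n, n, -1),
                           feats.unsqueeze(0).expand(n, n, -1)], dim=-1)
        logits = self.score(pairs).squeeze(-1)    # (N, N) edge logits
        # Differentiable (straight-through) one-hot sample of one neighbour per row.
        adj = F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)
        return adj                                # sparse adjacency, one edge per pedestrian

selector = EdgeSelector()
adj = selector(torch.randn(5, 32))
print(adj.sum(dim=-1))  # each pedestrian attends to exactly one neighbour
```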
- FitVid: Overfitting in Pixel-Level Video Prediction [117.59339756506142]
We introduce a new architecture, named FitVid, which is capable of severe overfitting on the common benchmarks.
FitVid outperforms the current state-of-the-art models across four different video prediction benchmarks on four different metrics.
arXiv Detail & Related papers (2021-06-24T17:20:21Z)
- Multi-Modal Hybrid Architecture for Pedestrian Action Prediction [14.032334569498968]
We propose a novel multi-modal prediction algorithm that incorporates different sources of information captured from the environment to predict future crossing actions of pedestrians.
Using the existing 2D pedestrian behavior benchmarks and a newly annotated 3D driving dataset, we show that our proposed model achieves state-of-the-art performance in pedestrian crossing prediction.
arXiv Detail & Related papers (2020-11-16T15:17:58Z)
- Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction [57.56466850377598]
Reasoning over visual data is a desirable capability for robotics and vision-based applications.
In this paper, we present a graph-based framework to uncover relationships among different objects in the scene for reasoning about pedestrian intent.
Pedestrian intent, defined as the future action of crossing or not-crossing the street, is a very crucial piece of information for autonomous vehicles.
arXiv Detail & Related papers (2020-02-20T18:50:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.