InCrowdFormer: On-Ground Pedestrian World Model From Egocentric Views
- URL: http://arxiv.org/abs/2303.09534v1
- Date: Thu, 16 Mar 2023 17:51:02 GMT
- Title: InCrowdFormer: On-Ground Pedestrian World Model From Egocentric Views
- Authors: Mai Nishimura, Shohei Nobuhara, Ko Nishino
- Abstract summary: We introduce an on-ground Pedestrian World Model that can predict how pedestrians move around an observer in the crowd on the ground plane.
InCrowdFormer fully leverages the Transformer architecture by modeling pedestrian interaction and the egocentric-to-top-down view transformation with attention.
We encode the uncertainties arising from unknown pedestrian heights with latent codes to predict the posterior distributions of pedestrian positions.
- Score: 28.54213112712818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce an on-ground Pedestrian World Model, a computational model that
can predict how pedestrians move around an observer in the crowd on the ground
plane, but from just the egocentric views of the observer. Our model,
InCrowdFormer, fully leverages the Transformer architecture by modeling
pedestrian interaction and egocentric-to-top-down view transformation with
attention, and autoregressively predicts on-ground positions of a variable
number of people with an encoder-decoder architecture. We encode the
uncertainties arising from unknown pedestrian heights with latent codes to
predict the posterior distributions of pedestrian positions. We validate the
effectiveness of InCrowdFormer on a novel prediction benchmark of real
movements. The results show that InCrowdFormer accurately predicts the future
coordinates of pedestrians. To the best of our knowledge, InCrowdFormer is the
first-of-its-kind pedestrian world model, which we believe will benefit a wide
range of egocentric-view applications including crowd navigation, tracking, and
synthesis.
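To make the described components concrete, here is a minimal, hypothetical PyTorch sketch of an encoder-decoder Transformer in this spirit: egocentric per-pedestrian observations are encoded together with a sampled latent code (standing in for the unknown-height uncertainty), and on-ground positions are decoded autoregressively for a variable number of pedestrians. All names, dimensions, and the roll-out scheme are assumptions for illustration, not the authors' implementation.

```python
# Minimal illustrative sketch (not the authors' code): an encoder-decoder
# Transformer that maps egocentric per-pedestrian observations plus a latent
# code (modeling unknown-height uncertainty) to autoregressive on-ground
# position predictions for a variable number of pedestrians.
import torch
import torch.nn as nn


class PedestrianWorldModelSketch(nn.Module):
    def __init__(self, obs_dim=4, d_model=128, nhead=8, num_layers=3, latent_dim=16):
        super().__init__()
        self.latent_dim = latent_dim
        # Embed egocentric observations (e.g., an image-plane bounding box per
        # pedestrian) together with a latent code; self-attention over the set of
        # pedestrians models their interaction and the view transformation.
        self.obs_embed = nn.Linear(obs_dim + latent_dim, d_model)
        self.pos_embed = nn.Linear(2, d_model)  # embed on-ground (x, y) tokens
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers
        )
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers
        )
        self.head = nn.Linear(d_model, 2)  # predicted on-ground (x, y)

    def forward(self, ego_obs, last_xy, steps=8):
        # ego_obs: (B, N, obs_dim) egocentric observations of N pedestrians
        # last_xy: (B, N, 2) last known on-ground positions used to seed decoding
        B, N, _ = ego_obs.shape
        # One latent sample per pedestrian stands in for the unknown height;
        # drawing several samples yields a distribution over predicted positions.
        z = torch.randn(B, N, self.latent_dim, device=ego_obs.device)
        memory = self.encoder(self.obs_embed(torch.cat([ego_obs, z], dim=-1)))
        preds, xy = [], last_xy
        for _ in range(steps):  # autoregressive roll-out of future positions
            xy = self.head(self.decoder(self.pos_embed(xy), memory))
            preds.append(xy)
        return torch.stack(preds, dim=1)  # (B, steps, N, 2)


if __name__ == "__main__":
    model = PedestrianWorldModelSketch()
    future = model(torch.randn(2, 5, 4), torch.randn(2, 5, 2))
    print(future.shape)  # torch.Size([2, 8, 5, 2])
```

Re-running the roll-out with several latent samples per pedestrian would give a set of predicted trajectories approximating the posterior over positions mentioned in the abstract.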
Related papers
- Humanoid Locomotion as Next Token Prediction [84.21335675130021]
Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories.
We show that our model enables a full-sized humanoid to walk in San Francisco zero-shot.
Our model can transfer to the real world even when trained on only 27 hours of walking data, and can generalize to commands not seen during training, such as walking backward.
arXiv Detail & Related papers (2024-02-29T18:57:37Z) - Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z) - PedFormer: Pedestrian Behavior Prediction via Cross-Modal Attention Modulation and Gated Multitask Learning [10.812772606528172]
We propose a novel framework that relies on different data modalities to predict future trajectories and crossing actions of pedestrians from an ego-centric perspective.
We show that our model improves on the state of the art in trajectory and action prediction by up to 22% and 13%, respectively, on various metrics.
arXiv Detail & Related papers (2022-10-14T15:12:00Z) - T2FPV: Constructing High-Fidelity First-Person View Datasets From Real-World Pedestrian Trajectories [9.44806128120871]
We present T2FPV, a method for constructing high-fidelity first-person view datasets given a real-world, top-down trajectory dataset.
We showcase our approach on the ETH/UCY pedestrian dataset to generate the egocentric visual data of all interacting pedestrians.
arXiv Detail & Related papers (2022-09-22T20:14:43Z) - Conditioned Human Trajectory Prediction using Iterative Attention Blocks [70.36888514074022]
We present a simple yet effective pedestrian trajectory prediction model aimed at predicting pedestrian positions in urban-like environments.
Our model is a neural-based architecture that can run several layers of attention blocks and transformers in an iterative sequential fashion.
We show that, without explicitly introducing social masks, dynamical models, social pooling layers, or complicated graph-like structures, it is possible to produce results on par with SoTA models.
arXiv Detail & Related papers (2022-06-29T07:49:48Z) - Pedestrian 3D Bounding Box Prediction [83.7135926821794]
We focus on 3D bounding boxes, which give autonomous vehicles reasonable estimates of pedestrians without modeling complex motion details.
We suggest this new problem and present a simple yet effective model for pedestrians' 3D bounding box prediction.
This method follows an encoder-decoder architecture based on recurrent neural networks.
arXiv Detail & Related papers (2022-06-28T17:59:45Z) - Pedestrian Stop and Go Forecasting with Hybrid Feature Fusion [87.77727495366702]
We introduce the new task of pedestrian stop and go forecasting.
Considering the lack of suitable existing datasets for it, we release TRANS, a benchmark for explicitly studying the stop and go behaviors of pedestrians in urban traffic.
We build it from several existing datasets annotated with pedestrians' walking motions in order to cover a variety of scenarios and behaviors.
arXiv Detail & Related papers (2022-03-04T18:39:31Z) - Learning Sparse Interaction Graphs of Partially Observed Pedestrians for Trajectory Prediction [0.3025231207150811]
Multi-pedestrian trajectory prediction is an indispensable safety element of autonomous systems that interact with crowds in unstructured environments.
We propose Gumbel Social Transformer, in which an Edge Gumbel Selector samples a sparse graph of partially observed pedestrians at each time step (see the sketch after this list).
We demonstrate that our model overcomes the potential problems caused by such assumptions, and that our approach outperforms related works in benchmark evaluation.
arXiv Detail & Related papers (2021-07-15T00:45:11Z) - Pedestrian Intention Prediction: A Multi-task Perspective [83.7135926821794]
In order to be globally deployed, autonomous cars must guarantee the safety of pedestrians.
This work tries to solve this problem by jointly predicting the intention and visual states of pedestrians.
The method is a recurrent neural network trained in a multi-task learning setup.
arXiv Detail & Related papers (2020-10-20T13:42:31Z)
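The Gumbel Social Transformer entry above mentions an Edge Gumbel Selector that samples a sparse interaction graph. As a rough, hypothetical illustration of the underlying Gumbel-softmax edge-sampling idea, and not that paper's implementation, a single-neighbor variant could look like:

```python
# Hypothetical sketch of Gumbel-softmax edge sampling for a sparse pedestrian
# interaction graph, as referenced in the Gumbel Social Transformer entry above.
import torch
import torch.nn.functional as F


def sample_sparse_edges(edge_logits, tau=1.0):
    """edge_logits: (N, N) scores for pedestrian i attending to pedestrian j.
    Returns an (N, N) one-hot adjacency keeping one sampled neighbor per
    pedestrian; hard=True uses the straight-through trick so sampling stays
    differentiable."""
    return F.gumbel_softmax(edge_logits, tau=tau, hard=True, dim=-1)


if __name__ == "__main__":
    logits = torch.randn(6, 6)      # pairwise interaction scores for 6 pedestrians
    adjacency = sample_sparse_edges(logits)
    print(adjacency.sum(dim=-1))    # each row sums to 1: one sampled edge each
```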
This list is automatically generated from the titles and abstracts of the papers on this site.