Robots That Can See: Leveraging Human Pose for Trajectory Prediction
- URL: http://arxiv.org/abs/2309.17209v1
- Date: Fri, 29 Sep 2023 13:02:56 GMT
- Title: Robots That Can See: Leveraging Human Pose for Trajectory Prediction
- Authors: Tim Salzmann, Lewis Chiang, Markus Ryll, Dorsa Sadigh, Carolina Parada and Alex Bewley
- Abstract summary: We present a Transformer-based architecture to predict future human trajectories in human-centric environments.
The resulting model captures the inherent uncertainty for future human trajectory prediction.
We identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Anticipating the motion of all humans in dynamic environments such as homes
and offices is critical to enable safe and effective robot navigation. Such
spaces remain challenging as humans do not follow strict rules of motion and
there are often multiple occluded entry points such as corners and doors that
create opportunities for sudden encounters. In this work, we present a
Transformer-based architecture to predict human future trajectories in
human-centric environments from input features including human positions, head
orientations, and 3D skeletal keypoints from onboard in-the-wild sensory
information. The resulting model captures the inherent uncertainty for future
human trajectory prediction and achieves state-of-the-art performance on common
prediction benchmarks and a human tracking dataset captured from a mobile robot
adapted for the prediction task. Furthermore, we identify new agents with
limited historical data as a major contributor to error and demonstrate the
complementary nature of 3D skeletal poses in reducing prediction error in such
challenging scenarios.
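The abstract's core idea, attending over a per-timestep feature sequence built from positions, head orientation, and 3D skeletal keypoints, can be illustrated with a minimal sketch. This is a hypothetical, pure-Python single-head self-attention with identity projections and a simple feature-concatenation helper; the paper's actual encoding, projections, and multi-head/uncertainty machinery are not reproduced here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(seq):
    """Single-head scaled dot-product self-attention with identity
    query/key/value projections: each timestep's output is a weighted
    average of all timesteps, weighted by feature similarity."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, seq))
                    for i in range(d)])
    return out

def build_features(position, head_yaw, keypoints_3d):
    """Concatenate one timestep's inputs (2D position, head yaw, and
    flattened 3D keypoints) into a single feature vector. The layout is
    a hypothetical stand-in for the paper's input encoding."""
    return list(position) + [head_yaw] + [c for kp in keypoints_3d for c in kp]
```

Because the attention weights form a convex combination over the history, an agent with very little history (the error case the paper highlights) has almost nothing to attend over, which is where the per-timestep pose features carry the predictive signal.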
Related papers
- CoNav: A Benchmark for Human-Centered Collaborative Navigation [66.6268966718022]
We propose a collaborative navigation (CoNav) benchmark.
Our CoNav tackles the critical challenge of constructing a 3D navigation environment with realistic and diverse human activities.
We propose an intention-aware agent for reasoning both long-term and short-term human intention.
arXiv Detail & Related papers (2024-06-04T15:44:25Z)
- Multimodal Sense-Informed Prediction of 3D Human Motions [16.71099574742631]
This work introduces a novel multi-modal sense-informed motion prediction approach, which conditions high-fidelity generation on two modal information.
The gaze information is regarded as the human intention, and combined with both motion and scene features, we construct a ternary intention-aware attention to supervise the generation.
On two real-world benchmarks, the proposed method achieves state-of-the-art performance both in 3D human pose and trajectory prediction.
arXiv Detail & Related papers (2024-05-05T12:38:10Z)
- Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset [52.22758311559]
We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot.
The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors.
The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users.
arXiv Detail & Related papers (2024-03-21T14:53:50Z)
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- DMMGAN: Diverse Multi Motion Prediction of 3D Human Joints using Attention-Based Generative Adversarial Network [9.247294820004143]
We propose a transformer-based generative model for forecasting multiple diverse human motions.
Our model first predicts the pose of the body relative to the hip joint. Then the Hip Prediction Module predicts the trajectory of the hip movement for each predicted pose frame.
We show that our system outperforms the state-of-the-art in human motion prediction while predicting diverse multi-motion future trajectories with hip movements.
arXiv Detail & Related papers (2022-09-13T23:22:33Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- Egocentric Human Trajectory Forecasting with a Wearable Camera and Multi-Modal Fusion [24.149925005674145]
We address the problem of forecasting the trajectory of an egocentric camera wearer (ego-person) in crowded spaces.
The trajectory forecasting ability learned from the data of different camera wearers can be transferred to assist visually impaired people in navigation.
A Transformer-based encoder-decoder neural network model, integrated with a novel cascaded cross-attention mechanism, has been designed to predict the future trajectory of the camera wearer.
arXiv Detail & Related papers (2021-11-01T14:58:05Z) - Probabilistic Human Motion Prediction via A Bayesian Neural Network [71.16277790708529]
We propose a probabilistic model for human motion prediction in this paper.
Our model can generate several future motions when given an observed motion sequence.
We extensively validate our approach on the large-scale benchmark dataset Human3.6M.
arXiv Detail & Related papers (2021-07-14T09:05:33Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - From Recognition to Prediction: Analysis of Human Action and Trajectory
Prediction in Video [4.163207534602723]
Deciphering human behaviors to predict their future paths/trajectories is important, yet human trajectory prediction remains a challenging task.
The system must be able to detect and analyze human activities as well as scene semantics.
arXiv Detail & Related papers (2020-11-20T22:23:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.