From Recognition to Prediction: Analysis of Human Action and Trajectory Prediction in Video
- URL: http://arxiv.org/abs/2011.10670v3
- Date: Fri, 16 Jul 2021 13:45:43 GMT
- Title: From Recognition to Prediction: Analysis of Human Action and Trajectory Prediction in Video
- Authors: Junwei Liang
- Abstract summary: Deciphering human behaviors to predict their future paths/trajectories is important.
Human trajectory prediction remains a challenging task.
The system must be able to detect and analyze human activities as well as scene semantics.
- Score: 4.163207534602723
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the advancement of deep learning in computer vision, systems are now
able to analyze an unprecedented amount of rich visual information from videos,
enabling applications such as autonomous driving, socially aware robot
assistants, and public safety monitoring. Deciphering human behaviors from
video to predict their future paths/trajectories and actions is important in
these applications. However, human trajectory prediction remains a challenging
task, as scene semantics and human intent are difficult to model. Many systems
do not provide high-level semantic attributes for reasoning about pedestrians'
futures; this design hinders prediction performance on video data from diverse
domains and in unseen scenarios. To enable optimal forecasting of future human
behavior, it is crucial for the system to detect and analyze human activities
as well as scene semantics, passing informative features to the subsequent
prediction module for context understanding.
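The design the abstract argues for (detect activities and scene semantics, then pass those features to the prediction module) can be made concrete with a small sketch. The PyTorch example below is illustrative only and is not the paper's actual architecture: the module names, feature dimensions, and the additive fusion scheme are all assumptions.

```python
# Minimal sketch (PyTorch): fuse observed-trajectory features with
# scene-semantic / activity features, then decode future positions.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class ContextAwareTrajectoryPredictor(nn.Module):
    def __init__(self, ctx_dim=128, hidden=256, horizon=12):
        super().__init__()
        self.horizon = horizon
        # Encode the observed (x, y) trajectory.
        self.traj_encoder = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        # Project per-clip scene-semantic / activity features (e.g. pooled
        # segmentation or action logits) into the same space.
        self.ctx_proj = nn.Linear(ctx_dim, hidden)
        # Decode future displacements from the fused context.
        self.decoder = nn.LSTM(input_size=hidden, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, obs_traj, ctx_feats):
        # obs_traj: (B, T_obs, 2); ctx_feats: (B, ctx_dim)
        _, (h, c) = self.traj_encoder(obs_traj)
        ctx = self.ctx_proj(ctx_feats).unsqueeze(0)      # (1, B, hidden)
        h = h + ctx                                      # fuse semantics into the state
        dec_in = h.transpose(0, 1).repeat(1, self.horizon, 1)
        out, _ = self.decoder(dec_in, (h, c))
        return self.head(out)                            # (B, horizon, 2)

model = ContextAwareTrajectoryPredictor()
pred = model(torch.randn(4, 8, 2), torch.randn(4, 128))  # -> (4, 12, 2)
```

The key design point, per the abstract, is only that the prediction module receives informative semantic features rather than raw coordinates alone; the fusion mechanism here is one simple choice among many.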
Related papers
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage large-scale pretrained image diffusion models, which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior (a sketch of this general Transformer pattern follows the list below).
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- Robots That Can See: Leveraging Human Pose for Trajectory Prediction [30.919756497223343]
We present a Transformer-based architecture to predict human future trajectories in human-centric environments.
The resulting model captures the inherent uncertainty for future human trajectory prediction.
We identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error.
arXiv Detail & Related papers (2023-09-29T13:02:56Z)
- Interpretable Self-Aware Neural Networks for Robust Trajectory Prediction [50.79827516897913]
We introduce an interpretable paradigm for trajectory prediction that distributes the uncertainty among semantic concepts.
We validate our approach on real-world autonomous driving data, demonstrating superior performance over state-of-the-art baselines.
arXiv Detail & Related papers (2022-11-16T06:28:20Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- A-ACT: Action Anticipation through Cycle Transformations [89.83027919085289]
We take a step back to analyze how the human capability to anticipate the future can be transferred to machine learning algorithms.
A recent study in human psychology explains that, in anticipating an occurrence, the human brain relies on two systems.
In this work, we study the impact of each system for the task of action anticipation and introduce a paradigm to integrate them in a learning framework.
arXiv Detail & Related papers (2022-04-02T21:50:45Z)
- A Framework for Multisensory Foresight for Embodied Agents [11.351546861334292]
Predicting future sensory states is crucial for learning agents such as robots, drones, and autonomous vehicles.
In this paper, we couple multiple sensory modalities with exploratory actions and propose a predictive neural network architecture to address this problem.
The framework was tested and validated with a dataset containing 4 sensory modalities (vision, haptic, audio, and tactile) on a humanoid robot performing 9 behaviors multiple times on a large set of objects.
arXiv Detail & Related papers (2021-09-15T20:20:04Z)
- Predicting the Future from First Person (Egocentric) Vision: A Survey [18.07516837332113]
This survey summarises the evolution of studies in the context of future prediction from egocentric vision.
It provides an overview of applications, devices, existing problems, commonly used datasets, models, and input modalities.
Our analysis highlights that methods for future prediction from egocentric vision can have a significant impact in a range of applications.
arXiv Detail & Related papers (2021-07-28T14:58:13Z)
- Future Frame Prediction for Robot-assisted Surgery [57.18185972461453]
We propose a ternary prior guided variational autoencoder (TPG-VAE) model for future frame prediction in robotic surgical video sequences.
Besides the content distribution, our model learns a motion distribution, which is novel in handling the small movements of surgical tools.
arXiv Detail & Related papers (2021-03-18T15:12:06Z)
- VRUNet: Multi-Task Learning Model for Intent Prediction of Vulnerable Road Users [3.6265173818019947]
We propose a multi-task learning model to predict pedestrian actions, crossing intent and forecast their future path from video sequences.
We trained the model on the open-source JAAD naturalistic driving dataset, which is rich in behavioral annotations and real-world scenarios.
Experimental results show state-of-the-art performance on the JAAD dataset.
arXiv Detail & Related papers (2020-07-10T14:02:25Z)
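Several related papers above (Social-Transmotion; Robots That Can See) describe Transformer-based trajectory predictors that encode past positions, and optionally pose cues, as tokens processed by self-attention. The sketch below shows that general pattern under assumed shapes and hyperparameters; it is not any specific paper's model.

```python
# Generic sketch (PyTorch) of a Transformer-based trajectory predictor:
# past (x, y) positions are embedded as tokens, encoded with self-attention,
# and a linear head regresses the future path. Shapes, hyperparameters, and
# the single-cue input are illustrative assumptions.
import torch
import torch.nn as nn

class TransformerTrajectoryPredictor(nn.Module):
    def __init__(self, d_model=64, nhead=4, layers=2, t_obs=8, horizon=12):
        super().__init__()
        self.embed = nn.Linear(2, d_model)                 # (x, y) -> token
        self.pos = nn.Parameter(torch.zeros(1, t_obs, d_model))  # learned positions
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(t_obs * d_model, horizon * 2)
        self.horizon = horizon

    def forward(self, obs_traj):                           # (B, t_obs, 2)
        tokens = self.embed(obs_traj) + self.pos
        enc = self.encoder(tokens)                         # (B, t_obs, d_model)
        out = self.head(enc.flatten(1))
        return out.view(-1, self.horizon, 2)               # (B, horizon, 2)

pred = TransformerTrajectoryPredictor()(torch.randn(4, 8, 2))  # -> (4, 12, 2)
```

Models like Social-Transmotion extend this pattern by tokenizing additional cues (e.g. 2D/3D pose keypoints or bounding boxes) alongside positions, letting attention weigh whichever cues are available at inference time.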