Review of Video Predictive Understanding: Early Action Recognition and
Future Action Prediction
- URL: http://arxiv.org/abs/2107.05140v1
- Date: Sun, 11 Jul 2021 22:46:52 GMT
- Title: Review of Video Predictive Understanding: Early Action Recognition and
Future Action Prediction
- Authors: He Zhao, Richard P. Wildes
- Abstract summary: Action prediction is a major sub-area of video predictive understanding.
Various mathematical tools are widely adopted jointly with computer vision techniques for these two tasks.
Structures that rely on deep convolutional neural networks and recurrent neural networks have been extensively proposed for improving the performance of existing vision tasks.
- Score: 39.966828592322315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video predictive understanding encompasses a wide range of efforts that are
concerned with the anticipation of the unobserved future from the current as
well as historical video observations. Action prediction is a major sub-area of
video predictive understanding and is the focus of this review. This sub-area
has two major subdivisions: early action recognition and future action
prediction. Early action recognition is concerned with recognizing an ongoing
action as soon as possible. Future action prediction is concerned with the
anticipation of actions that follow those previously observed. In either case,
the causal relationship between the past, current, and
potential future information is the main focus. Various mathematical tools such
as Markov Chains, Gaussian Processes, Auto-Regressive modeling, and Bayesian
recursive filtering are widely adopted jointly with computer vision techniques
for these two tasks. However, these approaches face challenges such as the
curse of dimensionality, poor generalization, and constraints from
domain-specific knowledge. Recently, structures that rely on deep convolutional
neural networks and recurrent neural networks have been extensively proposed
for improving the performance of existing vision tasks, in general, and action
prediction tasks, in particular. However, they have their own shortcomings, e.g.,
reliance on massive training data and lack of strong theoretical underpinnings.
In this survey, we start by introducing the major sub-areas of the broad area
of video predictive understanding, which recently have received intensive
attention and proven to have practical value. Next, a thorough review of
various early action recognition and future action prediction algorithms is
provided with suitably organized divisions. Finally, we conclude our discussion
with future research directions.
Related papers
- About Time: Advances, Challenges, and Outlooks of Action Understanding [57.76390141287026]
This survey comprehensively reviews advances in uni- and multi-modal action understanding across a range of tasks.
We focus on prevalent challenges, overview widely adopted datasets, and survey seminal works with an emphasis on recent advances.
arXiv Detail & Related papers (2024-11-22T18:09:27Z)
- From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation [30.161471749050833]
We propose a novel end-to-end video modeling architecture that utilizes attention mechanisms, named Anticipation via Recognition and Reasoning (ARR).
ARR decomposes the action anticipation task into action recognition and reasoning tasks, and effectively learns the statistical relationship between actions by next action prediction (NAP).
In addition, to address the challenge of relationship modeling that requires extensive training data, we propose an innovative approach for the unsupervised pre-training of the decoder.
arXiv Detail & Related papers (2024-08-05T18:38:29Z)
- A Discussion on Generalization in Next-Activity Prediction [1.2289361708127877]
We show that there is an enormous amount of example leakage in all of the commonly used event logs.
We argue that designing robust evaluations requires a more profound conceptual engagement with the topic of next-activity prediction.
arXiv Detail & Related papers (2023-09-18T09:42:36Z)
- Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving [68.95178518732965]
A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants.
Existing works either perform object detection followed by trajectory forecasting of the detected objects, or predict dense occupancy and flow grids for the whole scene.
This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network.
arXiv Detail & Related papers (2023-08-02T23:39:24Z)
- Inductive Attention for Video Action Anticipation [16.240254363118016]
We propose an inductive attention model, dubbed IAM, which leverages the current prior predictions as the query to infer future action.
Our method consistently outperforms the state-of-the-art anticipation models on multiple large-scale egocentric video datasets.
arXiv Detail & Related papers (2022-12-17T09:51:17Z)
- Finding Islands of Predictability in Action Forecasting [7.215559809521136]
We show that future action sequences are more accurately modeled with variable, rather than one, levels of abstraction.
We propose a combination Bayesian neural network and hierarchical convolutional segmentation model to both accurately predict future actions and optimally select abstraction levels.
arXiv Detail & Related papers (2022-10-13T21:01:16Z)
- A-ACT: Action Anticipation through Cycle Transformations [89.83027919085289]
We take a step back to analyze how the human capability to anticipate the future can be transferred to machine learning algorithms.
A recent study on human psychology explains that, in anticipating an occurrence, the human brain counts on both systems.
In this work, we study the impact of each system for the task of action anticipation and introduce a paradigm to integrate them in a learning framework.
arXiv Detail & Related papers (2022-04-02T21:50:45Z)
- Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study on various pose representations with a focus on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms the state-of-the-art methods in short-term prediction and achieves much enhanced long-term prediction proficiency.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
- Long-Term Anticipation of Activities with Cycle Consistency [90.79357258104417]
We propose a framework for anticipating future activities directly from the features of the observed frames and train it in an end-to-end fashion.
Our framework achieves state-of-the-art results on two datasets: the Breakfast dataset and 50Salads.
arXiv Detail & Related papers (2020-09-02T15:41:32Z)
- TTPP: Temporal Transformer with Progressive Prediction for Efficient Action Anticipation [46.28067541184604]
Video action anticipation aims to predict future action categories from observed frames.
Current state-of-the-art approaches mainly resort to recurrent neural networks to encode history information into hidden states.
This paper proposes a simple yet efficient Temporal Transformer with Progressive Prediction framework.
arXiv Detail & Related papers (2020-03-07T07:59:42Z)