Inductive Attention for Video Action Anticipation
- URL: http://arxiv.org/abs/2212.08830v2
- Date: Sat, 18 Mar 2023 04:48:38 GMT
- Title: Inductive Attention for Video Action Anticipation
- Authors: Tsung-Ming Tai, Giuseppe Fiameni, Cheng-Kuang Lee, Simon See, Oswald
Lanz
- Abstract summary: We propose an inductive attention model, dubbed IAM, which leverages the current prediction priors as the query to infer the future action.
Our method consistently outperforms the state-of-the-art anticipation models on multiple large-scale egocentric video datasets.
- Score: 16.240254363118016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anticipating future actions based on spatiotemporal observations is essential
in video understanding and predictive computer vision. Moreover, a model
capable of anticipating the future has important applications: it can enable
precautionary systems to react before an event occurs. However, unlike in the
action recognition task, future information is inaccessible at observation time
-- a model cannot directly map the video frames to the target action to solve
the anticipation task. Instead, temporal inference is required to associate
the relevant evidence with possible future actions. Consequently, existing
solutions based on action recognition models are at best suboptimal. Recently,
researchers proposed extending the observation window to capture longer
pre-action profiles from past moments and leveraging attention to retrieve the
subtle evidence to improve the anticipation predictions. However, existing
attention designs typically use frame inputs as the query, which is suboptimal,
as a video frame only weakly connects to the future action. To this end, we
propose an inductive attention model, dubbed IAM, which leverages the current
prediction priors as the query to infer future action and can efficiently
process the long video content. Furthermore, our method considers the
uncertainty of the future via the many-to-many association in the attention
design. As a result, IAM consistently outperforms the state-of-the-art
anticipation models on multiple large-scale egocentric video datasets while
using significantly fewer model parameters.
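The core idea above, querying the observed frames with the model's own running prediction rather than with frame features, can be illustrated with a short sketch. The following PyTorch snippet is a minimal, hypothetical rendering of that idea, not the authors' implementation: the class name, the `prior_proj` layer, and all dimensions are illustrative assumptions, and the multiple query slots loosely stand in for the many-to-many association mentioned in the abstract.

```python
import torch
import torch.nn as nn

class InductiveAttentionSketch(nn.Module):
    """Toy prior-as-query attention (illustrative, not the IAM paper's code)."""
    def __init__(self, frame_dim=1024, embed_dim=256, num_classes=2513,
                 num_hypotheses=4, num_heads=4):
        super().__init__()
        # Project the running class-probability prior into several query
        # slots; multiple slots let the model keep several future hypotheses
        # alive (an assumed reading of the many-to-many association).
        self.prior_proj = nn.Linear(num_classes, embed_dim * num_hypotheses)
        self.kv_proj = nn.Linear(frame_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(embed_dim, num_classes)
        self.num_hypotheses = num_hypotheses
        self.embed_dim = embed_dim

    def forward(self, frames, prior_logits):
        # frames: (B, T, frame_dim) features of the observed video frames
        # prior_logits: (B, num_classes) the current anticipation prediction
        B = frames.size(0)
        q = self.prior_proj(prior_logits.softmax(dim=-1))
        q = q.view(B, self.num_hypotheses, self.embed_dim)  # (B, H, D) queries
        kv = self.kv_proj(frames)                           # (B, T, D) keys/values
        out, _ = self.attn(q, kv, kv)  # each hypothesis attends over all frames
        # Fuse the hypotheses into a refined prediction of the future action.
        return self.classifier(out.mean(dim=1))
```

In use, the refined logits could be fed back as `prior_logits` for the next observation window, so the prediction sharpens as more of a long video is processed; this feedback loop is an interpretation of the abstract, not a detail it specifies.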
Related papers
- From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation [30.161471749050833]
We propose a novel end-to-end video modeling architecture that utilizes attention mechanisms, named Anticipation via Recognition and Reasoning (ARR).
ARR decomposes the action anticipation task into action recognition and reasoning tasks, and effectively learns the statistical relationships between actions via next action prediction (NAP); a minimal sketch of the NAP idea appears after this list.
In addition, to address the challenge of relationship modeling that requires extensive training data, we propose an innovative approach for the unsupervised pre-training of the decoder.
arXiv Detail & Related papers (2024-08-05T18:38:29Z) - Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation [17.4088244981231]
Long-term action anticipation has become an important task for many applications such as autonomous driving and human-robot interaction.
We propose a novel Gated Temporal Diffusion (GTD) network that models the uncertainty of both the observation and the future predictions.
Our model achieves state-of-the-art results on the Breakfast, Assembly101 and 50Salads datasets in both stochastic and deterministic settings.
arXiv Detail & Related papers (2024-07-16T17:48:05Z) - Unified Recurrence Modeling for Video Action Anticipation [16.240254363118016]
We propose unified recurrence modeling for video action anticipation via a message passing framework.
Our proposed method outperforms previous works on the large-scale EPIC-Kitchens dataset.
arXiv Detail & Related papers (2022-06-02T12:16:44Z) - The Wisdom of Crowds: Temporal Progressive Attention for Early Action
Prediction [104.628661890361]
Early action prediction deals with inferring the ongoing action from partially-observed videos, typically at the outset of the video.
We propose a bottleneck-based attention model that captures the evolution of the action, through progressive sampling over fine-to-coarse scales.
arXiv Detail & Related papers (2022-04-28T08:21:09Z) - Video Prediction at Multiple Scales with Hierarchical Recurrent Networks [24.536256844130996]
We propose MSPred, a novel video prediction model able to forecast possible future outcomes at different levels of granularity simultaneously.
By combining spatial and temporal downsampling, MSPred is able to efficiently predict abstract representations over long time horizons.
In our experiments, we demonstrate that our proposed model accurately predicts future video frames as well as other representations on various scenarios.
arXiv Detail & Related papers (2022-03-17T13:08:28Z) - Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce a representation that emphasizes the novel information in the frame at the current time-stamp.
SRL sharply outperforms the existing state of the art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z) - Review of Video Predictive Understanding: Early Action Recognition and
Future Action Prediction [39.966828592322315]
Action prediction is a major sub-area of video predictive understanding.
Various mathematical tools are widely adopted jointly with computer vision techniques for these two tasks, early action recognition and future action prediction.
Structures that rely on deep convolutional neural networks and recurrent neural networks have been extensively proposed for improving the performance of existing vision tasks.
arXiv Detail & Related papers (2021-07-11T22:46:52Z) - Panoptic Segmentation Forecasting [71.75275164959953]
Our goal is to forecast the near future given a set of recent observations.
We think this ability to forecast, i.e., to anticipate, is integral for the success of autonomous agents.
We develop a two-component model: one component learns the dynamics of the background stuff by anticipating odometry, the other one anticipates the dynamics of detected things.
arXiv Detail & Related papers (2021-04-08T17:59:16Z) - Learning to Anticipate Egocentric Actions by Imagination [60.21323541219304]
We study the egocentric action anticipation task, which predicts a future action seconds before it is performed in egocentric videos.
Our method significantly outperforms previous methods on both the seen test set and the unseen test set of the EPIC Kitchens Action Anticipation Challenge.
arXiv Detail & Related papers (2021-01-13T08:04:10Z) - Long-Term Anticipation of Activities with Cycle Consistency [90.79357258104417]
We propose a framework for anticipating future activities directly from the features of the observed frames and train it in an end-to-end fashion.
Our framework achieves state-of-the-art results on two datasets: the Breakfast dataset and 50Salads.
arXiv Detail & Related papers (2020-09-02T15:41:32Z) - What-If Motion Prediction for Autonomous Driving [58.338520347197765]
Viable solutions must account for both the static geometric context, such as road lanes, and dynamic social interactions arising from multiple actors.
We propose a recurrent graph-based attentional approach with interpretable geometric (actor-lane) and social (actor-actor) relationships.
Our model can produce diverse predictions conditioned on hypothetical or "what-if" road lanes and multi-actor interactions.
arXiv Detail & Related papers (2020-08-24T17:49:30Z)
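Referenced in the ARR entry above, the following is a minimal sketch of the next action prediction (NAP) idea: learning the statistical relationship between actions by predicting the next action from the sequence of previously recognized ones, analogous to next-token prediction. ARR reasons over video features end to end; for brevity this hypothetical toy version operates on discrete action labels, and all names and sizes are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

class NextActionPredictorSketch(nn.Module):
    """Toy NAP head (illustrative, not the ARR paper's code)."""
    def __init__(self, num_actions=2513, embed_dim=256, num_layers=2, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(num_actions, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(embed_dim, num_actions)

    def forward(self, action_ids):
        # action_ids: (B, T) indices of the recognized past actions
        x = self.embed(action_ids)
        T = action_ids.size(1)
        # Causal mask: each position may only attend to earlier actions.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.encoder(x, mask=mask)
        # Logits over the action vocabulary for the next action at each step,
        # trainable with cross-entropy against the shifted action sequence.
        return self.head(h)
```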
This list is automatically generated from the titles and abstracts of the papers on this site.