Learning to Anticipate Egocentric Actions by Imagination
- URL: http://arxiv.org/abs/2101.04924v2
- Date: Tue, 19 Jan 2021 11:02:10 GMT
- Title: Learning to Anticipate Egocentric Actions by Imagination
- Authors: Yu Wu, Linchao Zhu, Xiaohan Wang, Yi Yang, Fei Wu
- Abstract summary: We study the egocentric action anticipation task, which predicts a future action seconds before it is performed in egocentric videos.
Our method significantly outperforms previous methods on both the seen and unseen test sets of the EPIC-Kitchens Action Anticipation Challenge.
- Score: 60.21323541219304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anticipating actions before they are executed is crucial for a wide range of
practical applications, including autonomous driving and robotics. In this
paper, we study the egocentric action anticipation task, which predicts a future action seconds before it is performed in egocentric videos. Previous approaches focus on summarizing the observed content and directly predicting the future action from past observations. We believe action anticipation would benefit from mining cues that compensate for the missing information in the unobserved frames. We therefore propose to decompose action anticipation into a series of future feature predictions: the model imagines how the visual features change in the near future and then predicts future action labels based on these imagined representations. Unlike prior work, our ImagineRNN is optimized with contrastive learning instead of feature regression; we train it on a proxy task of selecting the correct future states from distractors. We further improve ImagineRNN with residual anticipation, i.e., changing its target to the feature difference between adjacent frames instead of the frame content itself. This encourages the network to focus on our target, the future action, since the difference between adjacent frame features is more informative for forecasting the future. Extensive experiments on two large-scale egocentric action datasets validate the effectiveness of our method. Our method significantly outperforms previous methods on both the seen and unseen test sets of the EPIC-Kitchens Action Anticipation Challenge.
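To make the decomposition concrete, the following is a minimal sketch in PyTorch of the three ideas the abstract describes: imagining future frame features step by step, training with a contrastive proxy task (selecting the true future state among distractors), and residual anticipation (predicting the feature difference between adjacent frames). The module name `ImagineRNNSketch`, the feature dimensions, and the in-batch distractor sampling are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code): ImagineRNN-style future feature
# imagination with a contrastive proxy loss and residual anticipation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImagineRNNSketch(nn.Module):
    def __init__(self, feat_dim=1024, hidden_dim=1024, num_actions=2513):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Predicts the residual (difference to the next frame feature),
        # rather than the next frame feature itself.
        self.residual_head = nn.Linear(hidden_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_actions)

    def forward(self, observed_feats, num_future_steps=4):
        """observed_feats: (B, T_obs, feat_dim) frame features from a video backbone."""
        out, h = self.rnn(observed_feats)
        current = observed_feats[:, -1]            # last observed frame feature
        imagined = []
        for _ in range(num_future_steps):
            step_out, h = self.rnn(current.unsqueeze(1), h)
            # Residual anticipation: add the predicted feature difference.
            current = current + self.residual_head(step_out[:, -1])
            imagined.append(current)
        imagined = torch.stack(imagined, dim=1)     # (B, T_fut, feat_dim)
        logits = self.classifier(imagined[:, -1])   # anticipate the action from the last imagined state
        return imagined, logits


def contrastive_proxy_loss(imagined, future_feats, temperature=0.1):
    """InfoNCE-style proxy task: pick the true future state among distractors.

    imagined, future_feats: (B, T_fut, feat_dim). For each (clip, step) query the
    matching ground-truth future feature is the positive; all other features in
    the batch act as distractors (an assumption of this sketch).
    """
    q = F.normalize(imagined.reshape(-1, imagined.size(-1)), dim=-1)
    k = F.normalize(future_feats.reshape(-1, future_feats.size(-1)), dim=-1)
    logits = q @ k.t() / temperature                # pairwise similarity matrix
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    model = ImagineRNNSketch(feat_dim=1024)
    obs = torch.randn(2, 8, 1024)       # 8 observed frames per clip
    fut = torch.randn(2, 4, 1024)       # ground-truth features of 4 future frames
    labels = torch.tensor([3, 7])       # future action labels
    imagined, logits = model(obs)
    loss = contrastive_proxy_loss(imagined, fut) + F.cross_entropy(logits, labels)
    loss.backward()
    print(float(loss))
```

In this sketch the contrastive term supervises the imagined features while the cross-entropy term supervises the anticipated action label; how the two losses are weighted and how distractors are chosen would follow the paper rather than this illustration.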
Related papers
- DiffAnt: Diffusion Models for Action Anticipation [12.022815981853071]
Anticipating future actions is inherently uncertain. Given an observed video segment containing ongoing actions, multiple subsequent actions can plausibly follow.
In this work, we rethink action anticipation from a generative view, employing diffusion models to capture different possible future actions.
Our code and trained models will be published on GitHub.
arXiv Detail & Related papers (2023-11-27T16:40:09Z)
- Inductive Attention for Video Action Anticipation [16.240254363118016]
We propose an inductive attention model, dubbed IAM, which leverages the current prior predictions as the query to infer the future action.
Our method consistently outperforms the state-of-the-art anticipation models on multiple large-scale egocentric video datasets.
arXiv Detail & Related papers (2022-12-17T09:51:17Z)
- Unified Recurrence Modeling for Video Action Anticipation [16.240254363118016]
We propose unified recurrence modeling for video action anticipation via a message passing framework.
Our proposed method outperforms previous works on the large-scale EPIC-Kitchens dataset.
arXiv Detail & Related papers (2022-06-02T12:16:44Z)
- The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction [104.628661890361]
Early action prediction deals with inferring the ongoing action from partially observed videos, typically at the outset of the video.
We propose a bottleneck-based attention model that captures the evolution of the action through progressive sampling over fine-to-coarse scales.
arXiv Detail & Related papers (2022-04-28T08:21:09Z)
- Learning Future Object Prediction with a Spatiotemporal Detection Transformer [1.1543275835002982]
We train a detection transformer to directly output future objects.
We extend existing transformers in two ways to capture scene dynamics.
Our final approach learns to capture the dynamics and make predictions on par with an oracle for 100 ms prediction horizons.
arXiv Detail & Related papers (2022-04-21T17:58:36Z)
- A-ACT: Action Anticipation through Cycle Transformations [89.83027919085289]
We take a step back to analyze how the human capability to anticipate the future can be transferred to machine learning algorithms.
A recent study on human psychology explains that, in anticipating an occurrence, the human brain relies on both systems.
In this work, we study the impact of each system on the task of action anticipation and introduce a paradigm to integrate them in a learning framework.
arXiv Detail & Related papers (2022-04-02T21:50:45Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce a representation that emphasizes the novel information in the frame at the current time-stamp.
SRL sharply outperforms the existing state of the art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- Panoptic Segmentation Forecasting [71.75275164959953]
Our goal is to forecast the near future given a set of recent observations.
We think this ability to forecast, i.e., to anticipate, is integral for the success of autonomous agents.
We develop a two-component model: one component learns the dynamics of the background stuff by anticipating odometry, while the other anticipates the dynamics of detected things.
arXiv Detail & Related papers (2021-04-08T17:59:16Z)
- Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction [57.56466850377598]
Reasoning over visual data is a desirable capability for robotics and vision-based applications.
In this paper, we present a graph-based framework to uncover relationships among different objects in the scene for reasoning about pedestrian intent.
Pedestrian intent, defined as the future action of crossing or not crossing the street, is a crucial piece of information for autonomous vehicles.
arXiv Detail & Related papers (2020-02-20T18:50:44Z)