Future Transformer for Long-term Action Anticipation
- URL: http://arxiv.org/abs/2205.14022v1
- Date: Fri, 27 May 2022 14:47:43 GMT
- Title: Future Transformer for Long-term Action Anticipation
- Authors: Dayoung Gong, Joonseok Lee, Manjin Kim, Seong Jong Ha, Minsu Cho
- Abstract summary: We propose an end-to-end attention model for action anticipation, dubbed Future Transformer (FUTR).
Unlike previous autoregressive models, the proposed method learns to predict the whole sequence of future actions with parallel decoding.
We evaluate our method on two standard benchmarks for long-term action anticipation, Breakfast and 50 Salads, achieving state-of-the-art results.
- Score: 33.771374384674836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of predicting future actions from a video is crucial for a
real-world agent interacting with others. When anticipating actions in the
distant future, we humans typically consider long-term relations over the whole
sequence of actions, i.e., not only observed actions in the past but also
potential actions in the future. In a similar spirit, we propose an end-to-end
attention model for action anticipation, dubbed Future Transformer (FUTR), that
leverages global attention over all input frames and output tokens to predict a
minutes-long sequence of future actions. Unlike previous autoregressive
models, the proposed method learns to predict the whole sequence of future
actions with parallel decoding, enabling more accurate and faster inference for
long-term anticipation. We evaluate our method on two standard benchmarks for
long-term action anticipation, Breakfast and 50 Salads, achieving
state-of-the-art results.
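
The contrast the abstract draws between autoregressive and parallel decoding can be illustrated with a toy sketch. This is not the authors' implementation: the single-head attention, the random vectors standing in for frame features, and every name below (`cross_attend`, `parallel_decode`, `autoregressive_decode`, `queries`, `classes`) are assumptions made purely for illustration. It only shows that a fixed set of query slots can each attend to the other slots and to all observed frames, so no slot has to wait for an earlier prediction.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query, memory):
    """Single-head dot-product attention of one query over all memory vectors."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, m) / scale for m in memory])
    dim = len(memory[0])
    return [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(dim)]


def classify(vec, class_embeddings):
    """Pick the class whose (hypothetical) embedding scores highest."""
    scores = [dot(vec, c) for c in class_embeddings]
    return scores.index(max(scores))


def parallel_decode(frames, queries, class_embeddings):
    """Every slot attends to all slots, then to all frames.
    No slot depends on another slot's predicted label, so the
    comprehension below could be one batched pass."""
    mixed = [[q_i + s_i for q_i, s_i in zip(q, cross_attend(q, queries))]
             for q in queries]
    return [classify(cross_attend(q, frames), class_embeddings) for q in mixed]


def autoregressive_decode(frames, start_token, class_embeddings, num_steps):
    """Baseline for contrast: step t consumes the prediction from step t-1."""
    preds, token = [], start_token
    for _ in range(num_steps):
        label = classify(cross_attend(token, frames), class_embeddings)
        preds.append(label)
        token = class_embeddings[label]  # feed the prediction back in
    return preds


# Toy data: random vectors stand in for learned features and embeddings.
random.seed(0)
DIM, NUM_FRAMES, NUM_SLOTS, NUM_CLASSES = 8, 20, 5, 4


def rand_vecs(n):
    return [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(n)]


frames = rand_vecs(NUM_FRAMES)
queries = rand_vecs(NUM_SLOTS)
classes = rand_vecs(NUM_CLASSES)

future = parallel_decode(frames, queries, classes)
print(future)
```

In the autoregressive baseline the loop body cannot start step t before step t-1 has produced a label, whereas the parallel variant has no such dependency between slots.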
Related papers
- Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation [17.4088244981231]
Long-term action anticipation has become an important task for many applications such as autonomous driving and human-robot interaction.
We propose a novel Gated Temporal Diffusion (GTD) network that models the uncertainty of both the observation and the future predictions.
Our model achieves state-of-the-art results on the Breakfast, Assembly101 and 50Salads datasets in both stochastic and deterministic settings.
arXiv Detail & Related papers (2024-07-16T17:48:05Z)
- DiffAnt: Diffusion Models for Action Anticipation [12.022815981853071]
Anticipating future actions is inherently uncertain. Given an observed video segment containing ongoing actions, multiple subsequent actions can plausibly follow.
In this work, we rethink action anticipation from a generative view, employing diffusion models to capture different possible future actions.
Our code and trained models will be published on GitHub.
arXiv Detail & Related papers (2023-11-27T16:40:09Z)
- Rethinking Learning Approaches for Long-Term Action Anticipation [32.67768331823358]
Action anticipation involves predicting future actions having observed the initial portion of a video.
We introduce ANTICIPATR which performs long-term action anticipation.
We propose a two-stage learning approach to train a novel transformer-based model.
arXiv Detail & Related papers (2022-10-20T20:07:30Z)
- Weakly-supervised Action Transition Learning for Stochastic Human Motion Prediction [81.94175022575966]
We introduce the task of action-driven human motion prediction.
It aims to predict multiple plausible future motions given a sequence of action labels and a short motion history.
arXiv Detail & Related papers (2022-05-31T08:38:07Z)
- The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction [104.628661890361]
Early action prediction deals with inferring the ongoing action from partially-observed videos, typically at the outset of the video.
We propose a bottleneck-based attention model that captures the evolution of the action, through progressive sampling over fine-to-coarse scales.
arXiv Detail & Related papers (2022-04-28T08:21:09Z)
- Learning to Anticipate Egocentric Actions by Imagination [60.21323541219304]
We study the egocentric action anticipation task, which predicts a future action seconds before it is performed in egocentric videos.
Our method significantly outperforms previous methods on both the seen test set and the unseen test set of the EPIC Kitchens Action Anticipation Challenge.
arXiv Detail & Related papers (2021-01-13T08:04:10Z)
- Long Term Motion Prediction Using Keyposes [122.22758311506588]
We argue that, to achieve long term forecasting, predicting human pose at every time instant is unnecessary.
We call such poses "keyposes", and approximate complex motions by linearly interpolating between subsequent keyposes.
We show that learning the sequence of such keyposes allows us to predict very long term motion, up to 5 seconds in the future.
arXiv Detail & Related papers (2020-12-08T20:45:51Z)
- Long-Term Anticipation of Activities with Cycle Consistency [90.79357258104417]
We propose a framework for anticipating future activities directly from the features of the observed frames and train it in an end-to-end fashion.
Our framework achieves state-of-the-art results on two datasets: the Breakfast dataset and 50Salads.
arXiv Detail & Related papers (2020-09-02T15:41:32Z)
- TTPP: Temporal Transformer with Progressive Prediction for Efficient Action Anticipation [46.28067541184604]
Video action anticipation aims to predict future action categories from observed frames.
Current state-of-the-art approaches mainly resort to recurrent neural networks to encode history information into hidden states.
This paper proposes a simple yet efficient Temporal Transformer with Progressive Prediction framework.
arXiv Detail & Related papers (2020-03-07T07:59:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.