DiffAnt: Diffusion Models for Action Anticipation
- URL: http://arxiv.org/abs/2311.15991v1
- Date: Mon, 27 Nov 2023 16:40:09 GMT
- Title: DiffAnt: Diffusion Models for Action Anticipation
- Authors: Zeyun Zhong, Chengzhi Wu, Manuel Martin, Michael Voit, Juergen Gall,
J\"urgen Beyerer
- Abstract summary: Anticipating future actions is inherently uncertain. Given an observed video segment containing ongoing actions, multiple subsequent actions can plausibly follow.
In this work, we rethink action anticipation from a generative view, employing diffusion models to capture different possible future actions.
Our code and trained models will be published on GitHub.
- Score: 12.022815981853071
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anticipating future actions is inherently uncertain. Given an observed video
segment containing ongoing actions, multiple subsequent actions can plausibly
follow. This uncertainty becomes even larger when predicting far into the
future. However, the majority of existing action anticipation models adhere to
a deterministic approach, neglecting to account for future uncertainties. In
this work, we rethink action anticipation from a generative view, employing
diffusion models to capture different possible future actions. In this
framework, future actions are iteratively generated from standard Gaussian
noise in the latent space, conditioned on the observed video, and subsequently
transitioned into the action space. Extensive experiments on four benchmark
datasets, i.e., Breakfast, 50Salads, EpicKitchens, and EGTEA Gaze+, are
performed and the proposed method achieves superior or comparable results to
state-of-the-art methods, showing the effectiveness of a generative approach
for action anticipation. Our code and trained models will be published on
GitHub.
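The generative formulation described in the abstract can be made concrete with a short sketch. The following is a minimal, illustrative DDPM-style sampler and is not the authors' released code: the denoiser architecture, latent dimension, number of anticipated steps, noise schedule, and all module names are assumptions made purely for illustration. Latent future-action variables start as standard Gaussian noise and are iteratively denoised conditioned on observed video features, then mapped into the action space.

```python
# Minimal, illustrative sketch of the generative anticipation idea described
# in the abstract -- NOT the authors' released implementation. All module
# names, shapes, and hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn

class LatentDenoiser(nn.Module):
    """Hypothetical denoiser: predicts the noise in a latent future-action
    sequence, conditioned on features of the observed video segment."""
    def __init__(self, latent_dim=256, cond_dim=256, num_classes=48, num_steps=1000):
        super().__init__()
        self.time_embed = nn.Embedding(num_steps, latent_dim)
        self.cond_proj = nn.Linear(cond_dim, latent_dim)
        layer = nn.TransformerEncoderLayer(latent_dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.to_noise = nn.Linear(latent_dim, latent_dim)
        self.to_action = nn.Linear(latent_dim, num_classes)  # latent -> action space

    def forward(self, z_t, t, obs_feat):
        # z_t: (B, n_future, latent_dim), t: (B,), obs_feat: (B, cond_dim)
        h = z_t + self.time_embed(t)[:, None, :] + self.cond_proj(obs_feat)[:, None, :]
        return self.to_noise(self.backbone(h))

@torch.no_grad()
def sample_future_actions(model, obs_feat, n_future=8, latent_dim=256, steps=1000):
    """DDPM-style ancestral sampling: start from standard Gaussian noise in the
    latent space and iteratively denoise, conditioned on the observed video."""
    betas = torch.linspace(1e-4, 2e-2, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    B = obs_feat.shape[0]
    z = torch.randn(B, n_future, latent_dim)               # pure Gaussian noise
    for t in reversed(range(steps)):
        t_batch = torch.full((B,), t, dtype=torch.long)
        eps = model(z, t_batch, obs_feat)                  # predicted noise
        mean = (z - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        z = mean + torch.sqrt(betas[t]) * torch.randn_like(z) if t > 0 else mean
    logits = model.to_action(z)                            # transition into the action space
    return logits.argmax(dim=-1)                           # (B, n_future) action labels
```

Because each run starts from a fresh noise sample, drawing several samples with different seeds yields multiple plausible future action sequences, which is how the generative view captures the uncertainty discussed in the abstract.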
Related papers
- From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation [30.161471749050833]
We propose a novel end-to-end video modeling architecture that utilizes attention mechanisms, named Anticipation via Recognition and Reasoning (ARR).
ARR decomposes the action anticipation task into action recognition and reasoning tasks, and effectively learns the statistical relationship between actions via next action prediction (NAP).
In addition, to address the challenge of relationship modeling that requires extensive training data, we propose an innovative approach for the unsupervised pre-training of the decoder.
arXiv Detail & Related papers (2024-08-05T18:38:29Z) - Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation [17.4088244981231]
Long-term action anticipation has become an important task for many applications such as autonomous driving and human-robot interaction.
We propose a novel Gated Temporal Diffusion (GTD) network that models the uncertainty of both the observation and the future predictions.
Our model achieves state-of-the-art results on the Breakfast, Assembly101 and 50Salads datasets in both stochastic and deterministic settings.
arXiv Detail & Related papers (2024-07-16T17:48:05Z) - Inductive Attention for Video Action Anticipation [16.240254363118016]
We propose an inductive attention model, dubbed IAM, which leverages the current prior predictions as the query to infer the future action.
Our method consistently outperforms the state-of-the-art anticipation models on multiple large-scale egocentric video datasets.
arXiv Detail & Related papers (2022-12-17T09:51:17Z) - Finding Islands of Predictability in Action Forecasting [7.215559809521136]
We show that future action sequences are more accurately modeled with variable, rather than one, levels of abstraction.
We propose a combination Bayesian neural network and hierarchical convolutional segmentation model to both accurately predict future actions and optimally select abstraction levels.
arXiv Detail & Related papers (2022-10-13T21:01:16Z) - Weakly-supervised Action Transition Learning for Stochastic Human Motion Prediction [81.94175022575966]
We introduce the task of action-driven human motion prediction.
It aims to predict multiple plausible future motions given a sequence of action labels and a short motion history.
arXiv Detail & Related papers (2022-05-31T08:38:07Z) - Future Transformer for Long-term Action Anticipation [33.771374384674836]
We propose an end-to-end attention model for action anticipation, dubbed Future Transformer (FUTR).
Unlike the previous autoregressive models, the proposed method learns to predict the whole sequence of future actions in parallel decoding (a minimal sketch of this parallel decoding idea appears after this list).
We evaluate our method on two standard benchmarks for long-term action anticipation, Breakfast and 50 Salads, achieving state-of-the-art results.
arXiv Detail & Related papers (2022-05-27T14:47:43Z) - The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction [104.628661890361]
Early action prediction deals with inferring the ongoing action from partially-observed videos, typically at the outset of the video.
We propose a bottleneck-based attention model that captures the evolution of the action, through progressive sampling over fine-to-coarse scales.
arXiv Detail & Related papers (2022-04-28T08:21:09Z) - You Mostly Walk Alone: Analyzing Feature Attribution in Trajectory Prediction [52.442129609979794]
Recent deep learning approaches for trajectory prediction show promising performance.
It remains unclear which features such black-box models actually learn to use for making predictions.
This paper proposes a procedure that quantifies the contributions of different cues to model performance.
arXiv Detail & Related papers (2021-10-11T14:24:15Z) - LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving [139.33800431159446]
LookOut is an approach to jointly perceive the environment and predict a diverse set of futures from sensor data.
We show that our model demonstrates significantly more diverse and sample-efficient motion forecasting on a large-scale self-driving dataset.
arXiv Detail & Related papers (2021-01-16T23:19:22Z) - Learning to Anticipate Egocentric Actions by Imagination [60.21323541219304]
We study the egocentric action anticipation task, which predicts the future action seconds before it is performed in egocentric videos.
Our method significantly outperforms previous methods on both the seen test set and the unseen test set of the EPIC Kitchens Action Anticipation Challenge.
arXiv Detail & Related papers (2021-01-13T08:04:10Z) - Long-Term Anticipation of Activities with Cycle Consistency [90.79357258104417]
We propose a framework for anticipating future activities directly from the features of the observed frames and train it in an end-to-end fashion.
Our framework achieves state-of-the-art results on two datasets: the Breakfast dataset and 50Salads.
arXiv Detail & Related papers (2020-09-02T15:41:32Z)
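As referenced in the Future Transformer (FUTR) entry above, parallel (non-autoregressive) decoding predicts all future actions in one forward pass rather than one step at a time. The sketch below is a hedged illustration of that general idea, not the FUTR authors' implementation; the learned queries, layer counts, and dimensions are assumptions for illustration only.

```python
# Minimal sketch of parallel (non-autoregressive) decoding of future actions,
# as contrasted with autoregressive decoding. NOT the FUTR authors' code; all
# names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ParallelActionDecoder(nn.Module):
    """Predicts all n_future actions in a single forward pass using learned queries."""
    def __init__(self, d_model=256, n_future=8, num_classes=48):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_future, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, obs_tokens):
        # obs_tokens: (B, T_obs, d_model) features of the observed frames
        B = obs_tokens.shape[0]
        q = self.queries.unsqueeze(0).expand(B, -1, -1)  # (B, n_future, d_model)
        h = self.decoder(q, obs_tokens)                   # cross-attend to the observation
        return self.classifier(h)                         # (B, n_future, num_classes)
```

Unlike an autoregressive decoder, which emits one action per step and feeds it back as input, all future positions here are decoded in a single pass over the observed-video features.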