Deep Sequence Learning for Video Anticipation: From Discrete and
Deterministic to Continuous and Stochastic
- URL: http://arxiv.org/abs/2010.04368v1
- Date: Fri, 9 Oct 2020 04:40:58 GMT
- Title: Deep Sequence Learning for Video Anticipation: From Discrete and
Deterministic to Continuous and Stochastic
- Authors: Sadegh Aliakbarian
- Abstract summary: Video anticipation is the task of predicting one/multiple future representation(s) given limited, partial observation.
In particular, in this thesis, we make several contributions to the literature of video anticipation.
- Score: 1.52292571922932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video anticipation is the task of predicting one/multiple future
representation(s) given limited, partial observation. This is a challenging
task due to the fact that given limited observation, the future representation
can be highly ambiguous. Based on the nature of the task, video anticipation
can be considered from two viewpoints: the level of details and the level of
determinism in the predicted future. In this research, we start from
anticipating a coarse representation of a deterministic future and then move
towards predicting continuous and fine-grained future representations of a
stochastic process. The example of the former is video action anticipation in
which we are interested in predicting one action label given a partially
observed video and the example of the latter is forecasting multiple diverse
continuations of human motion given a partially observed one. In particular, in
this thesis, we make several contributions to the literature of video
anticipation...
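As a concrete illustration of the coarse, deterministic end of this spectrum, the action-anticipation setup (predict one action label from a partially observed video) can be sketched as a toy classifier over the observed frame features. All dimensions, the random weights, and the mean-pooling classifier below are illustrative assumptions, not the models proposed in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 512-d per-frame features, 10 action classes.
FEAT_DIM, NUM_ACTIONS = 512, 10
W = rng.normal(scale=0.01, size=(FEAT_DIM, NUM_ACTIONS))  # toy classifier weights

def anticipate(partial_frames):
    """Predict a distribution over future action labels from the
    observed prefix of a video (frames-by-features array)."""
    pooled = partial_frames.mean(axis=0)          # temporal average pooling
    logits = pooled @ W
    exp = np.exp(logits - logits.max())           # numerically stable softmax
    return exp / exp.sum()

observed = rng.normal(size=(8, FEAT_DIM))         # only the first 8 frames observed
probs = anticipate(observed)                      # probs.shape is (10,)
```

The stochastic, fine-grained end of the spectrum would instead return many diverse samples of a continuous future (e.g., human motion trajectories) rather than a single label distribution.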
Related papers
- Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation [17.4088244981231]
Long-term action anticipation has become an important task for many applications such as autonomous driving and human-robot interaction.
We propose a novel Gated Temporal Diffusion (GTD) network that models the uncertainty of both the observation and the future predictions.
Our model achieves state-of-the-art results on the Breakfast, Assembly101 and 50Salads datasets in both stochastic and deterministic settings.
arXiv Detail & Related papers (2024-07-16T17:48:05Z) - DiffAnt: Diffusion Models for Action Anticipation [12.022815981853071]
Anticipating future actions is inherently uncertain. Given an observed video segment containing ongoing actions, multiple subsequent actions can plausibly follow.
In this work, we rethink action anticipation from a generative view, employing diffusion models to capture different possible future actions.
Our code and trained models will be published on GitHub.
arXiv Detail & Related papers (2023-11-27T16:40:09Z) - Rethinking Learning Approaches for Long-Term Action Anticipation [32.67768331823358]
Action anticipation involves predicting future actions after observing the initial portion of a video.
We introduce ANTICIPATR, which performs long-term action anticipation.
We propose a two-stage learning approach to train a novel transformer-based model.
arXiv Detail & Related papers (2022-10-20T20:07:30Z) - The Wisdom of Crowds: Temporal Progressive Attention for Early Action
Prediction [104.628661890361]
Early action prediction deals with inferring the ongoing action from partially-observed videos, typically at the outset of the video.
We propose a bottleneck-based attention model that captures the evolution of the action, through progressive sampling over fine-to-coarse scales.
arXiv Detail & Related papers (2022-04-28T08:21:09Z) - A-ACT: Action Anticipation through Cycle Transformations [89.83027919085289]
We take a step back to analyze how the human capability to anticipate the future can be transferred to machine learning algorithms.
A recent study on human psychology explains that, in anticipating an occurrence, the human brain relies on two systems.
In this work, we study the impact of each system for the task of action anticipation and introduce a paradigm to integrate them in a learning framework.
arXiv Detail & Related papers (2022-04-02T21:50:45Z) - Video Prediction at Multiple Scales with Hierarchical Recurrent Networks [24.536256844130996]
We propose MSPred, a novel video prediction model able to forecast future possible outcomes at different levels of granularity simultaneously.
By combining spatial and temporal downsampling, MSPred is able to efficiently predict abstract representations over long time horizons.
In our experiments, we demonstrate that our proposed model accurately predicts future video frames as well as other representations on various scenarios.
arXiv Detail & Related papers (2022-03-17T13:08:28Z) - Review of Video Predictive Understanding: Early Action Recognition and
Future Action Prediction [39.966828592322315]
Action prediction is a major sub-area of video predictive understanding.
Various mathematical tools are widely adopted jointly with computer vision techniques for these two tasks.
Structures that rely on deep convolutional neural networks and recurrent neural networks have been extensively proposed for improving performance on existing vision tasks.
arXiv Detail & Related papers (2021-07-11T22:46:52Z) - Panoptic Segmentation Forecasting [71.75275164959953]
Our goal is to forecast the near future given a set of recent observations.
We think this ability to forecast, i.e., to anticipate, is integral for the success of autonomous agents.
We develop a two-component model: one component learns the dynamics of the background "stuff" by anticipating odometry, the other one anticipates the dynamics of detected "things".
arXiv Detail & Related papers (2021-04-08T17:59:16Z) - Learning to Anticipate Egocentric Actions by Imagination [60.21323541219304]
We study the egocentric action anticipation task, which predicts a future action in egocentric videos seconds before it is performed.
Our method significantly outperforms previous methods on both the seen test set and the unseen test set of the EPIC Kitchens Action Anticipation Challenge.
arXiv Detail & Related papers (2021-01-13T08:04:10Z) - Long-Term Anticipation of Activities with Cycle Consistency [90.79357258104417]
We propose a framework for anticipating future activities directly from the features of the observed frames and train it in an end-to-end fashion.
Our framework achieves state-of-the-art results on two datasets: the Breakfast dataset and 50Salads.
arXiv Detail & Related papers (2020-09-02T15:41:32Z) - Adversarial Generative Grammars for Human Activity Prediction [141.43526239537502]
We propose an adversarial generative grammar model for future prediction.
Our grammar is designed so that it can learn production rules from the data distribution.
Being able to select multiple production rules during inference leads to different predicted outcomes.
arXiv Detail & Related papers (2020-08-11T17:47:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.