The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
- URL: http://arxiv.org/abs/2204.13340v2
- Date: Sat, 1 Apr 2023 07:37:37 GMT
- Title: The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
- Authors: Alexandros Stergiou, Dima Damen
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Early action prediction deals with inferring the ongoing action from
partially-observed videos, typically at the outset of the video. We propose a
bottleneck-based attention model that captures the evolution of the action,
through progressive sampling over fine-to-coarse scales. Our proposed Temporal
Progressive (TemPr) model is composed of multiple attention towers, one for
each scale. The predicted action label is based on the collective agreement
considering confidences of these towers. Extensive experiments over four video
datasets showcase state-of-the-art performance on the task of Early Action
Prediction across a range of encoder architectures. We demonstrate the
effectiveness and consistency of TemPr through detailed ablations.
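
The two ideas in the abstract, progressive sampling over scales and a prediction based on the confidence-aware agreement of per-scale towers, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `progressive_scales` and `aggregate_towers` are hypothetical, the uniform sampling rule is an assumption, and the confidence-weighted average (using each tower's maximum class probability as its confidence) is a simple stand-in for whatever aggregation TemPr actually uses. The attention towers themselves are omitted; their outputs are represented as plain logit lists.

```python
import math
from typing import List, Sequence


def progressive_scales(num_observed: int, num_scales: int,
                       frames_per_scale: int) -> List[List[int]]:
    """Frame indices per scale, from fine (short span) to coarse (full span).

    Assumption: scale i covers the first i/num_scales of the observed frames
    and draws frames_per_scale frames uniformly from that span.
    """
    scales = []
    for i in range(1, num_scales + 1):
        span = max(1, round(num_observed * i / num_scales))
        step = span / frames_per_scale
        indices = [min(span - 1, int(k * step)) for k in range(frames_per_scale)]
        scales.append(indices)
    return scales


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one tower's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_towers(tower_logits: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-tower predictions with a confidence-weighted average.

    Assumption: a tower's confidence is its maximum class probability.
    """
    probs = [softmax(logits) for logits in tower_logits]
    confidences = [max(p) for p in probs]
    total_conf = sum(confidences)
    num_classes = len(probs[0])
    return [
        sum(c * p[k] for c, p in zip(confidences, probs)) / total_conf
        for k in range(num_classes)
    ]


# Hypothetical usage: 16 observed frames, 4 scales, 4 frames per scale.
scales = progressive_scales(num_observed=16, num_scales=4, frames_per_scale=4)
# Made-up logits standing in for the outputs of three towers over 3 classes.
fused = aggregate_towers([[2.0, 0.0, 0.0], [1.5, 0.5, 0.0], [0.0, 0.2, 0.1]])
predicted_class = fused.index(max(fused))
```

In this toy setup, confident towers dominate the fused distribution, while an uncertain tower contributes less without being discarded.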
Related papers
- Early Action Recognition with Action Prototypes [62.826125870298306]
We propose a novel model that learns a prototypical representation of the full action for each class.
We decompose the video into short clips, where a visual encoder extracts features from each clip independently.
A decoder then aggregates the features from all clips in an online fashion to produce the final class prediction.
arXiv Detail & Related papers (2023-12-11T18:31:13Z)
- DiffAnt: Diffusion Models for Action Anticipation [12.022815981853071]
Anticipating future actions is inherently uncertain. Given an observed video segment containing ongoing actions, multiple subsequent actions can plausibly follow.
In this work, we rethink action anticipation from a generative view, employing diffusion models to capture different possible future actions.
Our code and trained models will be published on GitHub.
arXiv Detail & Related papers (2023-11-27T16:40:09Z)
- Rethinking Learning Approaches for Long-Term Action Anticipation [32.67768331823358]
Action anticipation involves predicting future actions having observed the initial portion of a video.
We introduce ANTICIPATR which performs long-term action anticipation.
We propose a two-stage learning approach to train a novel transformer-based model.
arXiv Detail & Related papers (2022-10-20T20:07:30Z)
- Anticipative Video Transformer [105.20878510342551]
Anticipative Video Transformer (AVT) is an end-to-end attention-based video modeling architecture.
We train the model jointly to predict the next action in a video sequence, while also learning frame feature encoders that are predictive of successive future frames' features.
arXiv Detail & Related papers (2021-06-03T17:57:55Z)
- Panoptic Segmentation Forecasting [71.75275164959953]
Our goal is to forecast the near future given a set of recent observations.
We think this ability to forecast, i.e., to anticipate, is integral for the success of autonomous agents.
We develop a two-component model: one component learns the dynamics of the background stuff by anticipating odometry, the other one anticipates the dynamics of detected things.
arXiv Detail & Related papers (2021-04-08T17:59:16Z)
- Learning to Anticipate Egocentric Actions by Imagination [60.21323541219304]
We study the egocentric action anticipation task, which predicts a future action in egocentric videos seconds before it is performed.
Our method significantly outperforms previous methods on both the seen test set and the unseen test set of the EPIC Kitchens Action Anticipation Challenge.
arXiv Detail & Related papers (2021-01-13T08:04:10Z)
- MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation [87.16030562892537]
We propose a multi-stage architecture for the temporal action segmentation task.
The first stage generates an initial prediction that is refined by the next ones.
Our models achieve state-of-the-art results on three datasets.
arXiv Detail & Related papers (2020-06-16T14:50:47Z)
- Temporal Aggregate Representations for Long-Range Video Understanding [26.091400303122867]
Future prediction, especially in long-range videos, requires reasoning from current and past observations.
We address questions of temporal extent, scaling, and level of semantic abstraction with a flexible multi-granular temporal aggregation framework.
arXiv Detail & Related papers (2020-06-01T10:17:55Z)
- TTPP: Temporal Transformer with Progressive Prediction for Efficient Action Anticipation [46.28067541184604]
Video action anticipation aims to predict future action categories from observed frames.
Current state-of-the-art approaches mainly resort to recurrent neural networks to encode history information into hidden states.
This paper proposes a simple yet efficient Temporal Transformer with Progressive Prediction framework.
arXiv Detail & Related papers (2020-03-07T07:59:42Z)