TTPP: Temporal Transformer with Progressive Prediction for Efficient
Action Anticipation
- URL: http://arxiv.org/abs/2003.03530v1
- Date: Sat, 7 Mar 2020 07:59:42 GMT
- Title: TTPP: Temporal Transformer with Progressive Prediction for Efficient
Action Anticipation
- Authors: Wen Wang, Xiaojiang Peng, Yanzhou Su, Yu Qiao, Jian Cheng
- Abstract summary: Video action anticipation aims to predict future action categories from observed frames.
Current state-of-the-art approaches mainly resort to recurrent neural networks to encode history information into hidden states.
This paper proposes a simple yet efficient Temporal Transformer with Progressive Prediction framework.
- Score: 46.28067541184604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video action anticipation aims to predict future action categories from
observed frames. Current state-of-the-art approaches mainly resort to recurrent
neural networks to encode history information into hidden states, and predict
future actions from the hidden representations. It is well known that the
recurrent pipeline is inefficient at capturing long-term information, which may
limit its performance on prediction tasks. To address this problem, this paper
proposes a simple yet efficient Temporal Transformer with Progressive
Prediction (TTPP) framework, which repurposes a Transformer-style architecture
to aggregate observed features, and then leverages a light-weight network to
progressively predict future features and actions. Specifically, predicted
features along with predicted probabilities are accumulated into the inputs of
subsequent prediction. We evaluate our approach on three action datasets,
namely TVSeries, THUMOS-14, and TV-Human-Interaction. We also conduct a
comprehensive study of several popular aggregation and prediction strategies.
Extensive results show that TTPP not only outperforms state-of-the-art methods
but is also more efficient.
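The progressive prediction loop described in the abstract — predicted features and predicted probabilities accumulated into the inputs of the subsequent prediction step — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random linear layers, mean aggregation, and all dimensions here are hypothetical stand-ins for TTPP's Transformer-style aggregator and light-weight predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, NUM_CLASSES, HORIZON = 8, 4, 3

# Hypothetical light-weight predictor: one linear map from the aggregated
# feature concatenated with the previous probabilities to the next future
# feature, plus a linear classifier on top of that feature.
W_feat = rng.standard_normal((FEAT_DIM + NUM_CLASSES, FEAT_DIM)) * 0.1
W_cls = rng.standard_normal((FEAT_DIM, NUM_CLASSES)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def progressive_predict(observed_feats, horizon=HORIZON):
    """Progressively predict future features and action probabilities.

    Each step aggregates all features seen so far (observed plus previously
    predicted), combines them with the previous step's probabilities, and
    appends the new prediction to the buffer before the next step.
    """
    feats = list(observed_feats)                 # running feature buffer
    probs = softmax(np.mean(feats, axis=0) @ W_cls)
    future = []
    for _ in range(horizon):
        agg = np.mean(feats, axis=0)             # mean stands in for the
                                                 # Transformer aggregator
        next_feat = np.concatenate([agg, probs]) @ W_feat
        probs = softmax(next_feat @ W_cls)
        feats.append(next_feat)                  # accumulate the prediction
        future.append((next_feat, probs))
    return future

observed = [rng.standard_normal(FEAT_DIM) for _ in range(5)]
preds = progressive_predict(observed)
print(len(preds), preds[-1][1].sum())
```

The key point the sketch captures is the feedback of predictions into their own inputs, which lets a feed-forward predictor roll forward over a multi-step anticipation horizon without recurrence over the full history.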
Related papers
- DeTPP: Leveraging Object Detection for Robust Long-Horizon Event Prediction [1.534667887016089]
We introduce DeTPP, a novel approach inspired by object detection techniques from computer vision.
DeTPP employs a unique matching-based loss function that selectively prioritizes reliably predictable events.
The proposed hybrid approach enhances the accuracy of next event prediction by up to 2.7% on a large transactional dataset.
arXiv Detail & Related papers (2024-08-23T14:57:46Z) - From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation [30.161471749050833]
We propose a novel end-to-end video modeling architecture that utilizes attention mechanisms, named Anticipation via Recognition and Reasoning (ARR)
ARR decomposes the action anticipation task into action recognition and reasoning tasks, and effectively learns the statistical relationship between actions by next action prediction (NAP)
In addition, to address the challenge of relationship modeling that requires extensive training data, we propose an innovative approach for the unsupervised pre-training of the decoder.
arXiv Detail & Related papers (2024-08-05T18:38:29Z) - HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention [76.37139809114274]
HPNet is a novel dynamic trajectory forecasting method.
We propose a Historical Prediction Attention module to automatically encode the dynamic relationship between successive predictions.
Our code is available at https://github.com/XiaolongTang23/HPNet.
arXiv Detail & Related papers (2024-04-09T14:42:31Z) - Performative Time-Series Forecasting [71.18553214204978]
We formalize performative time-series forecasting (PeTS) from a machine-learning perspective.
We propose a novel approach, Feature Performative-Shifting (FPS), which leverages the concept of delayed response to anticipate distribution shifts.
We conduct comprehensive experiments using multiple time-series models on COVID-19 and traffic forecasting tasks.
arXiv Detail & Related papers (2023-10-09T18:34:29Z) - Temporal DINO: A Self-supervised Video Strategy to Enhance Action
Prediction [15.696593695918844]
This paper introduces a novel self-supervised video strategy for enhancing action prediction inspired by DINO (self-distillation with no labels)
The experimental results showcase significant improvements in prediction performance across 3D-ResNet, Transformer, and LSTM architectures.
These findings highlight the potential of our approach in diverse video-based tasks such as activity recognition, motion planning, and scene understanding.
arXiv Detail & Related papers (2023-08-08T21:18:23Z) - Implicit Occupancy Flow Fields for Perception and Prediction in
Self-Driving [68.95178518732965]
A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants.
Existing works either perform object detection followed by trajectory prediction for the detected objects, or predict dense occupancy and flow grids for the whole scene.
This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network.
arXiv Detail & Related papers (2023-08-02T23:39:24Z) - Event-based Vision for Early Prediction of Manipulation Actions [0.7699714865575189]
Neuromorphic visual sensors are artificial retinas that output sequences of events when brightness changes occur in the scene.
In this study, we introduce an event-based dataset on fine-grained manipulation actions.
We also perform an experimental study on the use of transformers for action prediction with events.
arXiv Detail & Related papers (2023-07-26T17:50:17Z) - Streaming egocentric action anticipation: An evaluation scheme and
approach [27.391434284586985]
Egocentric action anticipation aims to predict the future actions the camera wearer will perform from the observation of the past.
Current evaluation schemes assume that predictions are available right after the input video is observed.
We propose a streaming egocentric action evaluation scheme which assumes that predictions are performed online and made available only after the model has processed the current input segment.
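The streaming evaluation idea above — a prediction only counts once the model has finished processing its input segment — can be illustrated with a small timing sketch. This is a hypothetical illustration of the general principle, not the paper's protocol: segment end times and per-segment runtimes are made-up numbers.

```python
def streaming_alignment(segment_end_times, runtimes):
    """For each segment, return the index of the freshest earlier segment
    whose prediction is already available when this segment ends, i.e.
    account for model runtime instead of assuming instant predictions."""
    available_at = [t + r for t, r in zip(segment_end_times, runtimes)]
    aligned = []
    for t in segment_end_times:
        # predictions finished no later than the current time t
        ready = [i for i, a in enumerate(available_at) if a <= t]
        aligned.append(ready[-1] if ready else None)
    return aligned

# Segments end at 1 s intervals; a model taking 1.5 s per segment always
# lags, so the usable prediction comes from an older segment (or none).
print(streaming_alignment([1, 2, 3, 4], [1.5] * 4))
```

Under an offline evaluation the model at time t would be credited with its prediction for segment t; the streaming scheme instead credits it with the stale prediction the lag permits, penalizing slow models.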
arXiv Detail & Related papers (2023-06-29T04:53:29Z) - The Wisdom of Crowds: Temporal Progressive Attention for Early Action
Prediction [104.628661890361]
Early action prediction deals with inferring the ongoing action from partially-observed videos, typically at the outset of the video.
We propose a bottleneck-based attention model that captures the evolution of the action, through progressive sampling over fine-to-coarse scales.
arXiv Detail & Related papers (2022-04-28T08:21:09Z) - Adversarial Refinement Network for Human Motion Prediction [61.50462663314644]
Two popular methods, recurrent neural networks and feed-forward deep networks, are able to predict rough motion trends.
We propose an Adversarial Refinement Network (ARNet) following a simple yet effective coarse-to-fine mechanism with novel adversarial error augmentation.
arXiv Detail & Related papers (2020-11-23T05:42:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.