TKN: Transformer-based Keypoint Prediction Network For Real-time Video
Prediction
- URL: http://arxiv.org/abs/2303.09807v2
- Date: Mon, 20 Mar 2023 10:57:45 GMT
- Title: TKN: Transformer-based Keypoint Prediction Network For Real-time Video
Prediction
- Authors: Haoran Li, Pengyuan Zhou, Yihang Lin, Yanbin Hao, Haiyong Xie, Yong
Liao
- Abstract summary: We propose a transformer-based keypoint prediction neural network (TKN) for video prediction.
TKN is an unsupervised learning method that boosts the prediction process via constrained information extraction and a parallel prediction scheme.
Extensive experiments on KTH and Human3.6 datasets demonstrate that TKN predicts 11 times faster than existing methods.
- Score: 16.294105130947
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video prediction is a complex time-series forecasting task with great
potential in many use cases. However, conventional methods overemphasize
accuracy while ignoring the slow prediction speed caused by complicated model
structures that learn too much redundant information with excessive GPU memory
consumption. Furthermore, conventional methods mostly predict frames
sequentially (frame-by-frame) and thus are hard to accelerate. Consequently,
valuable use cases such as real-time danger prediction and warning cannot
achieve fast enough inference speed to be applicable in reality. Therefore, we
propose a transformer-based keypoint prediction neural network (TKN), an
unsupervised learning method that boosts the prediction process via constrained
information extraction and a parallel prediction scheme. To the best of our
knowledge, TKN is the first real-time video prediction solution, and it
significantly reduces computation costs while maintaining other performance measures. Extensive
experiments on KTH and Human3.6 datasets demonstrate that TKN predicts 11 times
faster than existing methods while reducing memory consumption by 17.4% and
achieving state-of-the-art prediction performance on average.
Related papers
- Real-time Video Prediction With Fast Video Interpolation Model and Prediction Training [9.225628670664596]
We propose real-time video prediction towards the zero-latency interaction over networks, called IFRVP.
We introduce ELAN-based residual blocks into the prediction models to improve both inference speed and accuracy.
Our evaluations show that our proposed models perform efficiently and achieve the best trade-off between prediction accuracy and computational speed.
arXiv Detail & Related papers (2025-03-29T18:48:46Z)
- Coarse-to-fine Deep Video Coding with Hyperprior-guided Mode Prediction [50.361427832256524]
We propose a coarse-to-fine (C2F) deep video compression framework for better motion compensation.
Our C2F framework can achieve better motion compensation results without significantly increasing bit costs.
arXiv Detail & Related papers (2022-06-15T11:38:53Z)
- A Novel Prediction Setup for Online Speed-Scaling [3.3440413258080577]
It is fundamental to incorporate energy considerations when designing (scheduling) algorithms.
This paper attempts to obtain the best of both worlds for the classical, deadline based, online speed-scaling problem.
arXiv Detail & Related papers (2021-12-06T14:46:20Z)
- Learning to Predict Trustworthiness with Steep Slope Loss [69.40817968905495]
We study the problem of predicting trustworthiness on real-world large-scale datasets.
We observe that the trustworthiness predictors trained with prior-art loss functions are prone to view both correct predictions and incorrect predictions to be trustworthy.
We propose a novel steep slope loss to separate the features w.r.t. correct predictions from the ones w.r.t. incorrect predictions by two slide-like curves that oppose each other.
arXiv Detail & Related papers (2021-09-30T19:19:09Z)
- Uncertainty-Aware Time-to-Event Prediction using Deep Kernel Accelerated Failure Time Models [11.171712535005357]
We propose Deep Kernel Accelerated Failure Time models for the time-to-event prediction task.
Our model shows better point estimate performance than recurrent neural network based baselines in experiments on two real-world datasets.
arXiv Detail & Related papers (2021-07-26T14:55:02Z)
- Adversarial Refinement Network for Human Motion Prediction [61.50462663314644]
Two popular methods, recurrent neural networks and feed-forward deep networks, are able to predict rough motion trends.
We propose an Adversarial Refinement Network (ARNet) following a simple yet effective coarse-to-fine mechanism with novel adversarial error augmentation.
arXiv Detail & Related papers (2020-11-23T05:42:20Z)
- Long-Short Term Spatiotemporal Tensor Prediction for Passenger Flow Profile [15.875569404476495]
We focus on a tensor-based prediction and propose several practical techniques to improve prediction.
For long-term prediction specifically, we propose the "Tensor Decomposition + 2-Dimensional Auto-Regressive Moving Average (2D-ARMA)" model.
For short-term prediction, we propose to conduct tensor completion based on tensor clustering to avoid oversimplifying and ensure accuracy.
arXiv Detail & Related papers (2020-04-23T08:30:00Z)
- Predictive Business Process Monitoring via Generative Adversarial Nets: The Case of Next Event Prediction [0.026249027950824504]
This paper proposes a novel adversarial training framework to address the problem of next event prediction.
It works by putting one neural network against the other in a two-player game which leads to predictions that are indistinguishable from the ground truth.
It systematically outperforms all baselines both in terms of accuracy and earliness of the prediction, despite using a simple network architecture and a naive feature encoding.
arXiv Detail & Related papers (2020-03-25T08:31:28Z)
- Accelerating Deep Reinforcement Learning With the Aid of Partial Model: Energy-Efficient Predictive Video Streaming [97.75330397207742]
Predictive power allocation is conceived for energy-efficient video streaming over mobile networks using deep reinforcement learning.
To handle the continuous state and action spaces, we resort to the deep deterministic policy gradient (DDPG) algorithm.
Our simulation results show that the proposed policies converge to the optimal policy that is derived based on perfect large-scale channel prediction.
arXiv Detail & Related papers (2020-03-21T17:36:53Z)
- Post-Estimation Smoothing: A Simple Baseline for Learning with Side Information [102.18616819054368]
We propose a post-estimation smoothing operator as a fast and effective method for incorporating structural index data into prediction.
Because the smoothing step is separate from the original predictor, it applies to a broad class of machine learning tasks.
Our experiments on large scale spatial and temporal datasets highlight the speed and accuracy of post-estimation smoothing in practice.
arXiv Detail & Related papers (2020-03-12T18:04:20Z)
- TTPP: Temporal Transformer with Progressive Prediction for Efficient Action Anticipation [46.28067541184604]
Video action anticipation aims to predict future action categories from observed frames.
Current state-of-the-art approaches mainly resort to recurrent neural networks to encode history information into hidden states.
This paper proposes a simple yet efficient Temporal Transformer with Progressive Prediction framework.
arXiv Detail & Related papers (2020-03-07T07:59:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.