Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
- URL: http://arxiv.org/abs/2106.08318v1
- Date: Tue, 15 Jun 2021 17:50:22 GMT
- Title: Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
- Authors: Mateusz Malinowski, Dimitrios Vytiniotis, Grzegorz Swirszcz, Viorica Patraucean, and Joao Carreira
- Abstract summary: Backpropagation blocks computations until the forward and backward passes are completed.
For temporal signals, this introduces high latency and hinders real-time learning.
In this paper, we build upon Sideways, which avoids blocking by propagating approximate gradients forward in time.
We show how to decouple computation and delegate individual neural modules to different devices, allowing distributed and parallel training.
- Score: 13.665160620951777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How can neural networks be trained on large-volume temporal data efficiently?
To compute the gradients required to update parameters, backpropagation blocks
computations until the forward and backward passes are completed. For temporal
signals, this introduces high latency and hinders real-time learning. It also
creates a coupling between consecutive layers, which limits model parallelism
and increases memory consumption. In this paper, we build upon Sideways, which
avoids blocking by propagating approximate gradients forward in time, and we
propose mechanisms for temporal integration of information based on different
variants of skip connections. We also show how to decouple computation and
delegate individual neural modules to different devices, allowing distributed
and parallel training. The proposed Skip-Sideways achieves low latency
training, model parallelism, and, importantly, is capable of extracting
temporal features, leading to more stable training and improved performance on
real-world action recognition video datasets such as HMDB51, UCF101, and the
large-scale Kinetics-600. Finally, we also show that models trained with
Skip-Sideways generate better future frames than Sideways models, and hence
they can better utilize motion cues.
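The phrase "propagating approximate gradients forward in time" can be made concrete with a toy timetable. The sketch below is not the authors' implementation. It assumes a simplified pipeline in which each activation moves up one layer per time step, the loss is computed at the top layer as soon as a frame reaches it, and each gradient moves down one layer per time step. The names `Slot`, `sideways_schedule`, and `staleness` are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple, Optional


class Slot(NamedTuple):
    layer: int
    activation_frame: Optional[int]  # frame whose activation the layer currently holds
    gradient_frame: Optional[int]    # frame whose loss produced the gradient it receives


def sideways_schedule(depth: int, steps: int) -> List[List[Slot]]:
    """Toy timetable (an assumption, not the paper's code): one layer hop per step."""
    table = []
    for t in range(steps):
        row = []
        for layer in range(depth):
            act = t - layer                     # activations climb one layer per step
            grad = t - 2 * (depth - 1) + layer  # gradients descend one layer per step
            row.append(Slot(layer,
                            act if act >= 0 else None,
                            grad if grad >= 0 else None))
        table.append(row)
    return table


def staleness(slot: Slot) -> Optional[int]:
    """Frames between the activation a layer holds and the gradient it receives."""
    if slot.activation_frame is None or slot.gradient_frame is None:
        return None
    return slot.activation_frame - slot.gradient_frame


if __name__ == "__main__":
    # Print the last time step of a 3-layer pipeline run for 5 frames.
    for slot in sideways_schedule(depth=3, steps=5)[-1]:
        print(slot, "staleness:", staleness(slot))
```

Under these assumptions no layer ever waits for a full forward and backward pass, but lower layers combine activations and gradients that belong to different frames. That mismatch is why the gradients are only approximate.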
Related papers
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting growing attention as a way to collaboratively train a machine learning model without transferring raw data.
FL generally relies on a parameter server and a large number of edge devices throughout model training.
We propose TEASQ-Fed, in which edge devices participate asynchronously in training by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
- AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training [22.107070114339038]
We propose AccEPT, a scheme for accelerating edge collaborative pipeline-parallel training.
In particular, we propose a lightweight adaptive latency predictor to accurately estimate the latency of each layer at different devices.
Our numerical results demonstrate that the proposed approach speeds up edge pipeline-parallel training by up to 3 times.
arXiv Detail & Related papers (2023-11-10T02:18:33Z)
- Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning [59.26623999209235]
We present DiST, which disentangles the learning of spatial and temporal aspects of videos.
The disentangled learning in DiST is highly efficient because it avoids the back-propagation of massive pre-trained parameters.
Extensive experiments on five benchmarks show that DiST outperforms existing state-of-the-art methods by convincing margins.
arXiv Detail & Related papers (2023-09-14T17:58:33Z)
- Large Scale Time-Series Representation Learning via Simultaneous Low and High Frequency Feature Bootstrapping [7.0064929761691745]
We propose a non-contrastive self-supervised learning approach that efficiently captures low- and high-frequency time-varying features.
Our method takes raw time series data as input and creates two different augmented views for two branches of the model.
To demonstrate the robustness of our model, we performed extensive experiments and ablation studies on five real-world time-series datasets.
arXiv Detail & Related papers (2022-04-24T14:39:47Z)
- Adaptive Machine Learning for Time-Varying Systems: Low Dimensional Latent Space Tuning [91.3755431537592]
We present a recently developed method of adaptive machine learning for time-varying systems.
Our approach is to map very high (N>100k) dimensional inputs into the low dimensional (N2) latent space at the output of the encoder section of an encoder-decoder CNN.
This method allows us to learn correlations within and to track their evolution in real time based on feedback without interrupts.
arXiv Detail & Related papers (2021-07-13T16:05:28Z)
- PGT: A Progressive Method for Training Models on Long Videos [45.935259079953255]
The mainstream method is to split a raw video into clips, which leads to incomplete temporal information flow.
Inspired by natural language processing techniques for dealing with long sentences, we propose to treat videos as serial fragments satisfying the Markov property.
We empirically demonstrate that it yields significant performance improvements on different models and datasets.
arXiv Detail & Related papers (2021-03-21T06:15:20Z)
- Approximated Bilinear Modules for Temporal Modeling [116.6506871576514]
Two layers in CNNs can be converted to temporal bilinear modules by adding an auxiliary-branch sampling.
Our models can outperform most state-of-the-art methods on the Something-Something v1 and v2 datasets without pretraining.
arXiv Detail & Related papers (2020-07-25T09:07:35Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
- Sideways: Depth-Parallel Training of Video Models [19.370765021278004]
Sideways is an approximate backpropagation scheme for training video models.
We show that Sideways can potentially exhibit better generalization compared to standard synchronized backpropagation.
arXiv Detail & Related papers (2020-01-17T10:49:55Z)