Related papers: Gradient Forward-Propagation for Large-Scale Temporal Video Modelling

URL: http://arxiv.org/abs/2106.08318v1
Date: Tue, 15 Jun 2021 17:50:22 GMT
Title: Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
Authors: Mateusz Malinowski and Dimitrios Vytiniotis and Grzegorz Swirszcz and Viorica Patraucean and Joao Carreira
Abstract summary: Backpropagation blocks computations until the forward and backward passes are completed. For temporal signals, this introduces high latency and hinders real-time learning. In this paper, we build upon Sideways, which avoids blocking by propagating approximate gradients forward in time. We show how to decouple computation and delegate individual neural modules to different devices, allowing distributed and parallel training.
Score: 13.665160620951777
License: http://creativecommons.org/licenses/by/4.0/
Abstract: How can neural networks be trained on large-volume temporal data efficiently? To compute the gradients required to update parameters, backpropagation blocks computations until the forward and backward passes are completed. For temporal signals, this introduces high latency and hinders real-time learning. It also creates a coupling between consecutive layers, which limits model parallelism and increases memory consumption. In this paper, we build upon Sideways, which avoids blocking by propagating approximate gradients forward in time, and we propose mechanisms for temporal integration of information based on different variants of skip connections. We also show how to decouple computation and delegate individual neural modules to different devices, allowing distributed and parallel training. The proposed Skip-Sideways achieves low latency training, model parallelism, and, importantly, is capable of extracting temporal features, leading to more stable training and improved performance on real-world action recognition video datasets such as HMDB51, UCF101, and the large-scale Kinetics-600. Finally, we also show that models trained with Skip-Sideways generate better future frames than Sideways models, and hence they can better utilize motion cues.

Related papers

STLight: a Fully Convolutional Approach for Efficient Predictive Learning by Spatio-Temporal joint Processing [6.872340834265972]
We propose STLight, a novel method for spatio-temporal learning that relies solely on channel-wise and depth-wise convolutions as learnable layers. STLight overcomes the limitations of traditional convolutional approaches by rearranging spatial and temporal dimensions together. Our architecture achieves state-of-the-art performance on STL benchmarks across datasets and settings, while significantly improving computational efficiency in terms of parameters and FLOPs.
arXiv Detail & Related papers (2024-11-15T13:53:19Z)
Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone. We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting growing attention as a way to collaboratively train a machine learning model without transferring raw data. FL generally relies on a parameter server and a large number of edge devices throughout model training. We propose TEASQ-Fed, which lets edge devices participate asynchronously in the training process by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning [59.26623999209235]
We present DiST, which disentangles the learning of spatial and temporal aspects of videos. The disentangled learning in DiST is highly efficient because it avoids back-propagation through massive pre-trained parameters. Extensive experiments on five benchmarks show that DiST outperforms existing state-of-the-art methods by convincing margins.
arXiv Detail & Related papers (2023-09-14T17:58:33Z)
Large Scale Time-Series Representation Learning via Simultaneous Low and High Frequency Feature Bootstrapping [7.0064929761691745]
We propose a non-contrastive self-supervised learning approach that efficiently captures low- and high-frequency time-varying features. Our method takes raw time series data as input and creates two different augmented views for two branches of the model. To demonstrate the robustness of our model, we performed extensive experiments and ablation studies on five real-world time-series datasets.
arXiv Detail & Related papers (2022-04-24T14:39:47Z)
Adaptive Machine Learning for Time-Varying Systems: Low Dimensional Latent Space Tuning [91.3755431537592]
We present a recently developed method of adaptive machine learning for time-varying systems. Our approach is to map very high (N>100k) dimensional inputs into the low dimensional (N~2) latent space at the output of the encoder section of an encoder-decoder CNN. This method allows us to learn correlations within and to track their evolution in real time based on feedback without interrupts.
arXiv Detail & Related papers (2021-07-13T16:05:28Z)
PGT: A Progressive Method for Training Models on Long Videos [45.935259079953255]
The mainstream method is to split a raw video into clips, leading to incomplete temporal information flow. Inspired by natural language processing techniques dealing with long sentences, we propose to treat videos as serial fragments satisfying the Markov property. We empirically demonstrate that it yields significant performance improvements on different models and datasets.
arXiv Detail & Related papers (2021-03-21T06:15:20Z)
Approximated Bilinear Modules for Temporal Modeling [116.6506871576514]
Two layers in CNNs can be converted to temporal bilinear modules by adding auxiliary-branch sampling. Our models can outperform most state-of-the-art methods on the Something-Something v1 and v2 datasets without pretraining.
arXiv Detail & Related papers (2020-07-25T09:07:35Z)
Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in video sequences. This is accomplished through a novel tensor-train module that performs prediction by combining convolutional features across time. Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
Sideways: Depth-Parallel Training of Video Models [19.370765021278004]
Sideways is an approximate backpropagation scheme for training video models. We show that Sideways can potentially exhibit better generalization compared to standard synchronized backpropagation.
arXiv Detail & Related papers (2020-01-17T10:49:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.