Sideways: Depth-Parallel Training of Video Models
- URL: http://arxiv.org/abs/2001.06232v3
- Date: Mon, 30 Mar 2020 22:48:10 GMT
- Title: Sideways: Depth-Parallel Training of Video Models
- Authors: Mateusz Malinowski and Grzegorz Swirszcz and Joao Carreira and Viorica Patraucean
- Abstract summary: Sideways is an approximate backpropagation scheme for training video models.
We show that Sideways can potentially exhibit better generalization compared to standard synchronized backpropagation.
- Score: 19.370765021278004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Sideways, an approximate backpropagation scheme for training video
models. In standard backpropagation, the gradients and activations at every
computation step through the model are temporally synchronized. The forward
activations need to be stored until the backward pass is executed, preventing
inter-layer (depth) parallelization. However, can we leverage smooth, redundant
input streams such as videos to develop a more efficient training scheme? Here,
we explore an alternative to backpropagation; we overwrite network activations
whenever new ones, i.e., from new frames, become available. This more gradual
accumulation of information from both passes breaks the precise correspondence
between gradients and activations, leading to theoretically more noisy weight
updates. Counter-intuitively, we show that Sideways training of deep
convolutional video networks not only still converges, but can also potentially
exhibit better generalization compared to standard synchronized
backpropagation.
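The update rule described in the abstract can be sketched on a toy network. The following is a minimal illustration, not the authors' implementation: each depth stores only its latest activation, the forward pass overwrites those activations as new frames arrive, and the backward pass pairs an older frame's error with whatever activations are currently stored. All names, shapes, and the synthetic stream are assumptions.

```python
import numpy as np

# Minimal sketch of a Sideways-style update on a toy two-layer network fed a
# smooth synthetic "video" stream. All names, shapes, and the stream itself
# are illustrative assumptions, not the authors' implementation.

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(4, 8))   # input -> hidden
W2 = rng.normal(scale=0.1, size=(8, 1))   # hidden -> output
lr = 0.01

stored = {"x": None, "h": None}           # each depth keeps only its LATEST activation

def forward(frame):
    stored["x"] = frame                   # overwrite with the newest frame
    stored["h"] = np.tanh(frame @ W1)
    return stored["h"] @ W2

def backward(err):
    # Pairs `err` (computed from an older frame) with whatever activations are
    # stored NOW: the gradient/activation mismatch that Sideways accepts in
    # exchange for breaking inter-layer synchronization.
    global W1, W2
    g_h = (err @ W2.T) * (1.0 - stored["h"] ** 2)   # tanh' = 1 - tanh^2
    W2 -= lr * stored["h"].T @ err
    W1 -= lr * stored["x"].T @ g_h

losses, pending = [], []
for t in range(2000):
    # Smooth stream: consecutive frames are highly correlated, so a slightly
    # stale gradient is still a useful descent direction.
    frame = np.sin(0.1 * t + np.arange(4.0))[None, :]
    target = frame.sum(keepdims=True)     # toy frame-dependent regression target
    err = forward(frame) - target
    losses.append(abs(float(err[0, 0])))
    pending.append(err)
    if len(pending) > 1:                  # run the backward pass one frame late
        backward(pending.pop(0))
```

In this toy setup the delayed, mismatched updates still reduce the tracking error over time, mirroring the paper's observation that training converges despite the noisier weight updates.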
Related papers
- From Slow Bidirectional to Fast Autoregressive Video Diffusion Models [52.32078428442281]
Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies.
We address this limitation by adapting a pretrained bidirectional diffusion transformer to an autoregressive transformer that generates frames on-the-fly.
Our model achieves a total score of 84.27 on the VBench-Long benchmark, surpassing all previous video generation models.
arXiv Detail & Related papers (2024-12-10T18:59:50Z) - Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
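One way to read this design, sketched below under assumed names and shapes: the pretrained backbone runs forward-only as a fixed feature extractor, and only a small head operating on its features receives gradient updates, so no gradients are ever backpropagated through the backbone.

```python
import numpy as np

# Sketch of backprop-free backbone adaptation: train a lightweight head on
# features from a frozen "backbone". The backbone here is a stand-in random
# feature map; all names, shapes, and the task are illustrative assumptions.

rng = np.random.default_rng(0)
Wb = rng.normal(scale=0.5, size=(4, 32))  # frozen backbone weights (never updated)
Wa = np.zeros((32, 1))                    # lightweight head, the only trained part

def backbone(x):
    # Forward-only feature extractor: no gradient bookkeeping is ever needed
    # for Wb, which saves both memory (no stored activations) and compute.
    return np.maximum(x @ Wb, 0.0)

# Toy regression task solved by least-squares SGD on the head alone.
X = rng.normal(size=(64, 4))
y = X @ rng.normal(size=(4, 1))           # synthetic targets
lr = 0.02
for _ in range(1000):
    feats = backbone(X)                   # backbone stays fixed throughout
    err = feats @ Wa - y
    Wa -= lr * feats.T @ err / len(X)     # gradient touches only the head
```

Because the backbone is treated as a constant function, its activations never need to be retained for a backward pass, which is where the time and memory savings come from.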
arXiv Detail & Related papers (2024-02-05T10:55:47Z) - Training-Free Semantic Video Composition via Pre-trained Diffusion Model [96.0168609879295]
Current approaches, predominantly trained on videos with adjusted foreground color and lighting, struggle to address deep semantic disparities beyond superficial adjustments.
We propose a training-free pipeline employing a pre-trained diffusion model imbued with semantic prior knowledge.
Experimental results reveal that our pipeline successfully ensures the visual harmony and inter-frame coherence of the outputs.
arXiv Detail & Related papers (2024-01-17T13:07:22Z) - Refining Pre-Trained Motion Models [56.18044168821188]
We take on the challenge of improving state-of-the-art supervised models with self-supervised training.
We focus on obtaining a "clean" training signal from real-world unlabelled video.
We show that our method yields reliable gains over fully-supervised methods in real videos.
arXiv Detail & Related papers (2024-01-01T18:59:33Z) - TrailBlazer: Trajectory Control for Diffusion-Based Video Generation [11.655256653219604]
Controllability in text-to-video (T2V) generation is often a challenge.
We introduce the concept of keyframing, allowing the subject trajectory and overall appearance to be guided by both a moving bounding box and corresponding prompts.
Despite the simplicity of the bounding box guidance, the resulting motion is surprisingly natural, with emergent effects including perspective and movement toward the virtual camera as the box size increases.
arXiv Detail & Related papers (2023-12-31T10:51:52Z) - Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization [65.33914980022303]
Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content.
Most methods can only train on pre-extracted features without optimizing them for the localization problem.
We propose a novel end-to-end method Re2TAL, which rewires pretrained video backbones for reversible TAL.
arXiv Detail & Related papers (2022-11-25T12:17:30Z) - Neural Maximum A Posteriori Estimation on Unpaired Data for Motion Deblurring [87.97330195531029]
We propose a Neural Maximum A Posteriori (NeurMAP) estimation framework for training neural networks to recover blind motion information and sharp content from unpaired data.
NeurMAP can be applied to existing deblurring neural networks, and is the first framework that enables training image deblurring networks on unpaired datasets.
arXiv Detail & Related papers (2022-04-26T08:09:47Z) - Gradient Forward-Propagation for Large-Scale Temporal Video Modelling [13.665160620951777]
Backpropagation blocks computations until the forward and backward passes are completed.
For temporal signals, this introduces high latency and hinders real-time learning.
In this paper, we build upon Sideways, which avoids blocking by propagating approximate gradients forward in time.
We show how to decouple computation and delegate individual neural modules to different devices, allowing distributed and parallel training.
arXiv Detail & Related papers (2021-06-15T17:50:22Z) - Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct
Feedback Alignment [26.65651157173834]
We present a photonic accelerator for Direct Feedback Alignment, able to compute random projections with trillions of parameters.
This is a significant step towards building scalable hardware, able to go beyond backpropagation.
arXiv Detail & Related papers (2020-12-11T14:20:45Z) - Curriculum Learning for Recurrent Video Object Segmentation [2.3376061255029064]
This work explores different schedule sampling and frame skipping variations to significantly improve the performance of a recurrent architecture.
Our results on the car class of the KITTI-MOTS challenge indicate that, surprisingly, an inverse schedule sampling is a better option than a classic forward one.
arXiv Detail & Related papers (2020-08-15T10:51:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.