Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal
Action Localization
- URL: http://arxiv.org/abs/2211.14053v2
- Date: Tue, 28 Mar 2023 08:48:52 GMT
- Title: Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal
Action Localization
- Authors: Chen Zhao, Shuming Liu, Karttikeya Mangalam, Bernard Ghanem
- Abstract summary: Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content.
Most methods can only train on pre-extracted features without optimizing them for the localization problem.
We propose a novel end-to-end method Re2TAL, which rewires pretrained video backbones for reversible TAL.
- Score: 65.33914980022303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action localization (TAL) requires long-form reasoning to predict
actions of various durations and complex content. Given limited GPU memory,
training TAL end to end (i.e., from videos to predictions) on long videos is a
significant challenge. Most methods can only train on pre-extracted features
without optimizing them for the localization problem, consequently limiting
localization performance. In this work, to extend the potential in TAL
networks, we propose a novel end-to-end method Re2TAL, which rewires pretrained
video backbones for reversible TAL. Re2TAL builds a backbone with reversible
modules, where the input can be recovered from the output such that the bulky
intermediate activations can be cleared from memory during training. Instead of
designing one single type of reversible module, we propose a network rewiring
mechanism, to transform any module with a residual connection to a reversible
module without changing any parameters. This provides two benefits: (1) a large
variety of reversible networks are easily obtained from existing and even
future model designs, and (2) the reversible models require much less training
effort as they reuse the pre-trained parameters of their original
non-reversible versions. Re2TAL, only using the RGB modality, reaches 37.01%
average mAP on ActivityNet-v1.3, a new state-of-the-art record, and mAP 64.9%
at tIoU=0.5 on THUMOS-14, outperforming all other RGB-only methods.
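The reversible-module construction is the heart of the method. As a minimal illustration of the general additive-coupling idea the abstract describes (the exact Re2TAL rewiring of a pretrained single-stream backbone into two streams may differ), here is a PyTorch-style sketch with illustrative names:
```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Two-stream reversible coupling built from residual sub-modules.

    Illustrative sketch only: `f` and `g` stand for pretrained modules that
    originally sat inside residual connections (y = x + f(x)); no parameters
    are changed, only the wiring around them.
    """

    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f = f
        self.g = g

    def forward(self, x1, x2):
        # Additive coupling: each output mixes one stream with a function
        # of the other, so the mapping is exactly invertible.
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    @torch.no_grad()
    def inverse(self, y1, y2):
        # Recompute the inputs from the outputs, so this block's
        # activations need not be kept in memory during training.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2
```
In practice, a custom autograd function would call `inverse` during backpropagation to reconstruct each block's inputs on the fly instead of caching them, which is what frees the bulky intermediate activations.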
Related papers
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA [38.30350849992281]
"Recursive" language models share parameters across layers with minimal loss of performance.
Recursive Transformers are efficiently initialized from standard pretrained Transformers, but use only a single block of unique layers that is then repeated multiple times in a loop.
We show that our models outperform both similar-sized vanilla pretrained models and knowledge distillation baselines.
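As a rough, hypothetical sketch of the layer-tying idea behind Recursive Transformers (the paper additionally relaxes the sharing with layer-wise LoRA modules, which is not shown here; all names below are illustrative):
```python
import torch.nn as nn

class RecursiveEncoder(nn.Module):
    """Hypothetical sketch of layer tying: one unique Transformer block
    is reused in a loop instead of stacking `depth` distinct blocks."""

    def __init__(self, d_model: int = 512, nhead: int = 8, depth: int = 6):
        super().__init__()
        # A single block of unique parameters...
        self.shared_block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.depth = depth

    def forward(self, x):
        # ...applied repeatedly, so the parameter count is roughly 1/depth
        # of a standard encoder with the same effective depth.
        for _ in range(self.depth):
            x = self.shared_block(x)
        return x
```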
arXiv Detail & Related papers (2024-10-28T02:15:45Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative memory-efficient transfer learning (METL) strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, a minimal number of late pre-trained layers is used to reduce the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration [46.96362010335177]
In this paper, we propose HIT, a simple yet effective High-frequency Injected Transformer for image restoration.
Specifically, we design a window-wise injection module (WIM), which incorporates abundant high-frequency details into the feature map, to provide reliable references for restoring high-quality images.
In addition, we introduce a spatial enhancement unit (SEU) to preserve essential spatial relationships that may be lost due to the computations carried out across channel dimensions in the BIM.
arXiv Detail & Related papers (2024-03-30T08:05:00Z)
- Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning [81.0108753452546]
We propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, to finetune a pretrained model with substantially reduced memory consumption.
Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible.
We show that Dr$^2$Net can reach comparable performance to conventional finetuning but with significantly less memory usage.
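One way to picture "two types of residual connections" is a reversible coupling whose identity paths and sub-module outputs carry separate blending coefficients, so that one setting behaves like the pretrained residual structure and another is exactly invertible. The sketch below is only an illustration under that assumption; the coefficients `alpha` and `beta` and their dynamic schedule in the actual Dr$^2$Net are defined in the paper and may be placed differently.
```python
import torch
import torch.nn as nn

class DualResidualBlock(nn.Module):
    """Illustrative (not Dr^2Net's exact) reversible coupling with
    blending coefficients on the identity and sub-module paths."""

    def __init__(self, f: nn.Module, g: nn.Module,
                 alpha: float = 1.0, beta: float = 1.0):
        super().__init__()
        self.f, self.g = f, g
        # Hypothetical blending coefficients (illustrative names).
        self.alpha, self.beta = alpha, beta

    def forward(self, x1, x2):
        # beta scales the identity (residual) paths, alpha the sub-module
        # outputs; with alpha = beta = 1 this is a plain reversible block.
        y1 = self.beta * x1 + self.alpha * self.f(x2)
        y2 = self.beta * x2 + self.alpha * self.g(y1)
        return y1, y2

    @torch.no_grad()
    def inverse(self, y1, y2):
        # Exact inversion is possible whenever beta != 0.
        x2 = (y2 - self.alpha * self.g(y1)) / self.beta
        x1 = (y1 - self.alpha * self.f(x2)) / self.beta
        return x1, x2
```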
arXiv Detail & Related papers (2024-01-08T18:59:31Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
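For context, the classical column-row sampling (CRS) estimator that WTA-CRS builds on is easy to state. The sketch below shows that unbiased baseline only; the winner-take-all refinement that gives WTA-CRS its lower variance is not reproduced here, and the function name is illustrative.
```python
import numpy as np

def crs_matmul(A: np.ndarray, B: np.ndarray, k: int, rng=None) -> np.ndarray:
    """Unbiased column-row sampling estimate of A @ B using k sampled pairs.

    Columns of A and matching rows of B are sampled with probability
    proportional to the product of their norms (the variance-minimizing
    choice for plain CRS), and each sampled outer product is reweighted
    by 1 / (k * p_i) to keep the estimator unbiased.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sampling distribution over the shared (inner) dimension.
    scores = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = scores / scores.sum()
    idx = rng.choice(A.shape[1], size=k, replace=True, p=p)
    # Reweighted sum of sampled column-row outer products; unbiased since
    # E[A[:, i] B[i, :] / p_i] = sum_i A[:, i] B[i, :] = A @ B.
    est = np.zeros((A.shape[0], B.shape[1]))
    for i in idx:
        est += np.outer(A[:, i], B[i, :]) / (k * p[i])
    return est
```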
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- ResiDual: Transformer with Dual Residual Connections [106.38073506751003]
Two widely used residual-connection variants in Transformers are Post-Layer-Normalization (Post-LN) and Pre-Layer-Normalization (Pre-LN).
Post-LN causes a gradient-vanishing issue that hinders training deep Transformers, and Pre-LN causes a representation-collapse issue that limits model capacity.
We propose ResiDual, a novel Transformer architecture with Pre-Post-LN (PPLN), which fuses the connections of Post-LN and Pre-LN together.
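For reference, the two variants differ only in where layer normalization sits relative to the residual addition; a minimal sketch of Post-LN and Pre-LN blocks follows (ResiDual's PPLN fusion itself is described in the paper and is not reproduced here):
```python
import torch.nn as nn

class PostLNBlock(nn.Module):
    # Post-LN: normalize after the residual addition, y = LN(x + F(x)).
    def __init__(self, f: nn.Module, d_model: int):
        super().__init__()
        self.f, self.norm = f, nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.f(x))


class PreLNBlock(nn.Module):
    # Pre-LN: normalize before the sub-layer, y = x + F(LN(x)).
    def __init__(self, f: nn.Module, d_model: int):
        super().__init__()
        self.f, self.norm = f, nn.LayerNorm(d_model)

    def forward(self, x):
        return x + self.f(self.norm(x))
```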
arXiv Detail & Related papers (2023-04-28T12:19:47Z)
- Deep Neural Networks are Surprisingly Reversible: A Baseline for Zero-Shot Inversion [90.65667807498086]
This paper presents a zero-shot direct model inversion framework that recovers the input to the trained model given only the internal representation.
We empirically show that modern classification models on ImageNet can, surprisingly, be inverted, allowing an approximate recovery of the original 224x224px images from a representation after more than 20 layers.
arXiv Detail & Related papers (2021-07-13T18:01:43Z)
- Invertible Residual Network with Regularization for Effective Medical Image Segmentation [2.76240219662896]
Invertible neural networks have been applied to significantly reduce activation memory footprint when training neural networks with backpropagation.
We propose two versions of the invertible Residual Network, namely the Partially Invertible Residual Network (Partially-InvRes) and the Fully Invertible Residual Network (Fully-InvRes).
Our results indicate that by using partially/fully invertible networks as the central workhorse in volumetric segmentation, we not only reduce the memory overhead but also achieve segmentation performance comparable to the non-invertible 3D U-Net.
arXiv Detail & Related papers (2021-03-16T13:19:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.