End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
- URL: http://arxiv.org/abs/2311.17241v2
- Date: Sat, 20 Apr 2024 19:30:38 GMT
- Title: End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
- Authors: Shuming Liu, Chen-Lin Zhang, Chen Zhao, Bernard Ghanem
- Abstract summary: Temporal action detection (TAD) has seen significant performance improvement with end-to-end training.
Due to the memory bottleneck, only models with limited scales and limited data volumes can afford end-to-end training.
We reduce the memory consumption for end-to-end training, and manage to scale up the TAD backbone to 1 billion parameters and the input video to 1,536 frames.
- Score: 55.72994484532856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, temporal action detection (TAD) has seen significant performance improvement with end-to-end training. However, due to the memory bottleneck, only models with limited scales and limited data volumes can afford end-to-end training, which inevitably restricts TAD performance. In this paper, we reduce the memory consumption for end-to-end training, and manage to scale up the TAD backbone to 1 billion parameters and the input video to 1,536 frames, leading to significant detection performance. The key to our approach lies in our proposed temporal-informative adapter (TIA), which is a novel lightweight module that reduces training memory. Using TIA, we free the humongous backbone from learning to adapt to the TAD task by only updating the parameters in TIA. TIA also leads to better TAD representation by temporally aggregating context from adjacent frames throughout the backbone. We evaluate our model across four representative datasets. Owing to our efficient design, we are able to train end-to-end on VideoMAEv2-giant and achieve 75.4% mAP on THUMOS14, being the first end-to-end model to outperform the best feature-based methods. Code is available at https://github.com/sming256/AdaTAD.
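For intuition about the adapter-based design described in the abstract (a small trainable module inserted into an otherwise frozen backbone that aggregates temporal context from adjacent frames), here is a minimal PyTorch-style sketch. The class name, bottleneck ratio, kernel size, and channel width are illustrative assumptions, not the authors' exact TIA implementation; see the linked repository for the actual code.

```python
# Minimal sketch of a temporal adapter for a frozen video backbone.
# The real TIA architecture differs; this only illustrates the idea of
# training a small bottleneck module that mixes information across frames.
import torch
import torch.nn as nn

class TemporalAdapter(nn.Module):
    def __init__(self, dim: int, bottleneck_ratio: float = 0.25, kernel_size: int = 3):
        super().__init__()
        hidden = int(dim * bottleneck_ratio)
        self.down = nn.Linear(dim, hidden)            # project to a small bottleneck
        self.temporal = nn.Conv1d(                    # depth-wise conv over the frame axis
            hidden, hidden, kernel_size, padding=kernel_size // 2, groups=hidden)
        self.up = nn.Linear(hidden, dim)              # project back to the backbone width
        nn.init.zeros_(self.up.weight)                # start as an identity residual branch
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) features from one frozen backbone block
        h = self.down(x)
        h = self.temporal(h.transpose(1, 2)).transpose(1, 2)  # aggregate adjacent frames
        return x + self.up(torch.relu(h))                     # residual update

# In this kind of setup, only adapter parameters like these would be updated,
# while the backbone itself stays frozen.
backbone_block_dim = 1408                             # assumed ViT-giant channel width
adapter = TemporalAdapter(backbone_block_dim)
features = torch.randn(2, 16, backbone_block_dim)     # (batch, frames, dim) dummy features
print(adapter(features).shape)                        # torch.Size([2, 16, 1408])
```

Zero-initializing the up-projection makes the sketch's adapter act as an identity mapping at the start of training, so adaptation begins from the unmodified pre-trained features.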
Related papers
- Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models.
Existing methods for model adaptation usually update all model parameters, which is inefficient because it incurs high computational costs.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
- Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning [19.17362588650503]
Low-rank Attention Side-Tuning (LAST) trains a side-network composed of only low-rank self-attention modules.
We show that LAST can be highly parallelized across multiple optimization objectives, making it very efficient for downstream task adaptation.
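As an illustration of the side-tuning idea in this summary (a trainable side branch built only from low-rank self-attention), here is a minimal sketch; the rank, dimensions, and residual wiring are assumptions rather than the paper's exact design.

```python
# Minimal sketch of a low-rank self-attention block in the spirit of the
# side-tuning idea summarized above; dimensions and rank are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankSelfAttention(nn.Module):
    def __init__(self, dim: int, rank: int = 16):
        super().__init__()
        # Query/key/value projections go through a rank-r bottleneck,
        # so the block has far fewer trainable parameters than full attention.
        self.q = nn.Linear(dim, rank, bias=False)
        self.k = nn.Linear(dim, rank, bias=False)
        self.v = nn.Linear(dim, rank, bias=False)
        self.out = nn.Linear(rank, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) features tapped from a frozen backbone
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return x + self.out(attn @ v)    # residual side branch

side_block = LowRankSelfAttention(dim=768, rank=16)
tokens = torch.randn(2, 197, 768)        # dummy ViT-style token features
print(side_block(tokens).shape)          # torch.Size([2, 197, 768])
```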
arXiv Detail & Related papers (2024-02-06T14:03:15Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
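A minimal sketch of the adaptation pattern this summary describes, i.e., never backpropagating through the backbone: features are computed under `torch.no_grad()` and only a small parallel network receives gradients. The backbone and head below are toy stand-ins, not the paper's architecture.

```python
# Minimal sketch: gradients never flow through the frozen backbone;
# only the lightweight parallel network is trained.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(224, 512), nn.ReLU(), nn.Linear(512, 512))  # stand-in for a pretrained model
for p in backbone.parameters():
    p.requires_grad = False               # frozen: never updated

adapter_head = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 10))  # small trainable part
optimizer = torch.optim.AdamW(adapter_head.parameters(), lr=1e-3)

x = torch.randn(8, 224)                   # dummy batch
y = torch.randint(0, 10, (8,))            # dummy labels

with torch.no_grad():                     # no autograd graph is built for the backbone
    feats = backbone(x)

logits = adapter_head(feats)
loss = nn.functional.cross_entropy(logits, y)
loss.backward()                           # gradients reach only the adapter head
optimizer.step()
```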
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- Learning Semantic Proxies from Visual Prompts for Parameter-Efficient Fine-Tuning in Deep Metric Learning [13.964106147449051]
Existing solutions concentrate on fine-tuning the pre-trained models on conventional image datasets.
We propose a novel and effective framework based on learning Visual Prompts (VPT) in pre-trained Vision Transformers (ViT).
We demonstrate that our new approximations with semantic information offer superior representational capability.
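For reference, here is a minimal sketch of visual prompt tuning as mentioned in this summary: a few learnable prompt tokens are prepended to the patch tokens of a frozen transformer block and are the only parameters optimized. The block, token counts, and dimensions are placeholder assumptions, not the paper's setup.

```python
# Minimal sketch of visual prompt tuning: learnable prompt tokens are the
# only trainable parameters; the transformer block stands in for a frozen ViT layer.
import torch
import torch.nn as nn

class PromptedBlock(nn.Module):
    def __init__(self, dim: int = 768, num_prompts: int = 8, num_heads: int = 12):
        super().__init__()
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)
        self.block = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        for p in self.block.parameters():      # frozen pre-trained block (stand-in)
            p.requires_grad = False

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches, dim)
        prompts = self.prompts.expand(tokens.shape[0], -1, -1)
        out = self.block(torch.cat([prompts, tokens], dim=1))
        return out[:, prompts.shape[1]:]       # drop prompt positions before the next stage

layer = PromptedBlock()
patches = torch.randn(2, 196, 768)             # dummy patch tokens
print(layer(patches).shape)                    # torch.Size([2, 196, 768])
```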
arXiv Detail & Related papers (2024-02-04T04:42:05Z)
- PVP: Pre-trained Visual Parameter-Efficient Tuning [29.05396521860764]
Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks.
It is still highly challenging to fully fine-tune these models for downstream tasks due to their high computational and storage costs.
We propose a Pre-trained Visual Parameter-efficient (PVP) Tuning framework, which pre-trains the parameter-efficient tuning modules first and then leverages the pre-trained modules.
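A minimal sketch of the two-stage recipe this summary describes, i.e., pre-training the tuning module itself and then reusing its weights to initialize downstream tuning; the adapter shape and file name are placeholders.

```python
# Minimal sketch of a two-stage recipe: pre-train a small tuning module,
# then reuse its weights as the starting point for downstream tuning.
import torch
import torch.nn as nn

def make_adapter(dim: int = 768, hidden: int = 64) -> nn.Module:
    return nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

# Stage 1: pre-train the tuning module (training loop omitted) and store it.
adapter = make_adapter()
torch.save(adapter.state_dict(), "pretrained_adapter.pt")

# Stage 2: start downstream tuning from the pre-trained module instead of
# from a random initialization.
downstream_adapter = make_adapter()
downstream_adapter.load_state_dict(torch.load("pretrained_adapter.pt"))
```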
arXiv Detail & Related papers (2023-04-26T15:55:29Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
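One way to realize the "small set of trained parameters" paradigm described here is to learn per-channel rescaling vectors over a frozen layer's outputs; a minimal sketch follows. The layer sizes are assumptions, and this is a simplified illustration rather than the paper's exact method.

```python
# Minimal sketch of training only a small set of parameters: learned
# per-channel rescaling vectors applied to a frozen layer's output.
import torch
import torch.nn as nn

class RescaledFrozenLinear(nn.Module):
    def __init__(self, frozen_layer: nn.Linear):
        super().__init__()
        self.frozen = frozen_layer
        for p in self.frozen.parameters():
            p.requires_grad = False
        # The only new parameters: one scale per output channel, initialized to 1.
        self.scale = nn.Parameter(torch.ones(frozen_layer.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.frozen(x) * self.scale

layer = RescaledFrozenLinear(nn.Linear(768, 3072))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 3072 trainable parameters vs. ~2.4M frozen ones
```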
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
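A minimal sketch of exploiting a sparsity prior on the fine-tuning update, in the spirit of the DSEE summary above: only the largest-magnitude entries of the weight delta are kept. The thresholding rule and sizes are assumptions, not the paper's actual decomposition.

```python
# Minimal sketch of a sparsity-constrained fine-tuning update: keep only the
# top-k entries of the weight delta, so the stored update is sparse.
import torch

def sparsify_update(delta: torch.Tensor, keep_ratio: float = 0.05) -> torch.Tensor:
    """Zero out all but the top `keep_ratio` fraction of update entries."""
    k = max(1, int(delta.numel() * keep_ratio))
    threshold = delta.abs().flatten().topk(k).values.min()
    return torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))

pretrained_weight = torch.randn(768, 768)
fine_tuned_weight = pretrained_weight + 0.01 * torch.randn(768, 768)

sparse_delta = sparsify_update(fine_tuned_weight - pretrained_weight)
adapted_weight = pretrained_weight + sparse_delta   # weights actually used at inference
print((sparse_delta != 0).float().mean())           # ~0.05 of entries are non-zero
```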
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
- Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.