Temporal Action Detection Model Compression by Progressive Block Drop
- URL: http://arxiv.org/abs/2503.16916v1
- Date: Fri, 21 Mar 2025 07:26:55 GMT
- Title: Temporal Action Detection Model Compression by Progressive Block Drop
- Authors: Xiaoyong Chen, Yong Guo, Jiaming Liang, Sitong Zhuang, Runhao Zeng, Xiping Hu,
- Abstract summary: Temporal action detection (TAD) aims to identify and localize action instances in untrimmed videos. Recent improvements in model performance, driven by larger feature extractors and datasets, have led to increased computational demands. We propose a Progressive Block Drop method that reduces model depth while retaining layer width. Our method achieves a 25% reduction in computational overhead on two TAD benchmarks.
- Score: 19.122679098124816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action detection (TAD) aims to identify and localize action instances in untrimmed videos, which is essential for various video understanding tasks. However, recent improvements in model performance, driven by larger feature extractors and datasets, have led to increased computational demands. This presents a challenge for applications like autonomous driving and robotics, which rely on limited computational resources. While existing channel pruning methods can compress these models, reducing the number of channels often hinders the parallelization efficiency of GPUs due to the inefficient multiplication between small matrices. Instead of pruning channels, we propose a Progressive Block Drop method that reduces model depth while retaining layer width. In this way, we still use large matrices for computation but reduce the number of multiplications. Our approach iteratively removes redundant blocks in two steps: first, we drop blocks with minimal impact on model performance; second, we employ a parameter-efficient cross-depth alignment technique that fine-tunes the pruned model to restore accuracy. Our method achieves a 25% reduction in computational overhead on two TAD benchmarks (THUMOS14 and ActivityNet-1.3) while achieving lossless compression. More critically, we empirically show that our method is orthogonal to channel pruning and can be combined with it to yield further efficiency gains.
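For intuition, the two-step procedure above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: it assumes a backbone exposed as an nn.ModuleList of residual blocks, and the evaluate() and fine_tune() functions are placeholders (the latter stands in for the parameter-efficient cross-depth alignment step).

```python
# Minimal sketch of a progressive block-drop loop (illustrative only).
import copy
import torch
import torch.nn as nn

class Backbone(nn.Module):
    def __init__(self, depth=8, width=64):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(depth)
        )

    def forward(self, x):
        for blk in self.blocks:
            x = x + blk(x)  # width-preserving residual blocks, so any block can be removed
        return x

def evaluate(model):
    # Placeholder metric: higher is better (e.g. validation mAP in a real TAD setup).
    x = torch.randn(16, 64)
    return -model(x).pow(2).mean().item()

def fine_tune(model):
    # Placeholder for the paper's parameter-efficient cross-depth alignment fine-tuning.
    return model

def progressive_block_drop(model, target_depth):
    while len(model.blocks) > target_depth:
        # Step 1: drop the block whose removal hurts the metric least.
        scores = []
        for i in range(len(model.blocks)):
            candidate = copy.deepcopy(model)
            del candidate.blocks[i]
            scores.append((evaluate(candidate), i))
        _, best_i = max(scores)
        del model.blocks[best_i]
        # Step 2: recover accuracy before attempting the next drop.
        model = fine_tune(model)
    return model

model = progressive_block_drop(Backbone(), target_depth=6)
print(len(model.blocks))  # 6
```

Because the blocks are residual and width-preserving, removing one leaves all tensor shapes intact, which is what lets the loop drop blocks greedily and then recover accuracy with fine-tuning.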
Related papers
- V"Mean"ba: Visual State Space Models only need 1 hidden dimension [0.7864304771129751]
State Space Models (SSMs) have emerged as a solution by introducing a linear recurrence mechanism. We introduce VMeanba, a training-free compression method that eliminates the channel dimension in SSMs using mean operations. We show that VMeanba achieves up to a 1.12x speedup with less than a 3% accuracy loss.
arXiv Detail & Related papers (2024-12-21T12:27:07Z)
- LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers [79.07412045476872]
Diffusion Transformers have emerged as the preeminent models for a wide array of generative tasks. We show that performing the full computation of the model at each diffusion step is unnecessary, as some computations can be skipped by lazily reusing the results of previous steps. We propose a lazy learning framework that efficiently leverages cached results from earlier steps to skip redundant computations.
arXiv Detail & Related papers (2024-12-17T01:12:35Z)
- Numerical Pruning for Efficient Autoregressive Models [87.56342118369123]
This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning.
Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively.
To verify the effectiveness of our method, we provide both theoretical support and extensive experiments.
arXiv Detail & Related papers (2024-12-17T01:09:23Z)
- SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity [15.872209884833977]
We propose a memory-efficient scheduling method to eliminate memory overhead and an online adjustment mechanism to minimize accuracy degradation.
SparseTem achieves speedups of 1.79x for EfficientDet and 4.72x for CRNN, with minimal accuracy drop and no additional memory overhead.
arXiv Detail & Related papers (2024-10-28T07:13:25Z)
- Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers [40.40923861822689]
Mixture of Depths (MoD) dynamically adjusts the computational depth by skipping less important layers. MoD approaches face two main challenges: (1) high training costs due to the need to train the entire model along with the routers that determine which layers to skip, and (2) the risk of performance degradation when important layers are bypassed. For the first challenge, we propose Router-Tuning, a method that fine-tunes only the router on a small dataset, drastically reducing the computational overhead associated with full model training. For the second challenge, we propose MindSkip, which deploys Attention with Dynamic Depths.
arXiv Detail & Related papers (2024-10-17T03:23:50Z)
- Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization [0.0]
We present Double Sparse Factorization (DSF), where we factorize each weight matrix into two sparse matrices.
Our method achieves state-of-the-art results, enabling unprecedented sparsification of neural networks.
arXiv Detail & Related papers (2024-09-27T15:48:39Z) - A-SDM: Accelerating Stable Diffusion through Redundancy Removal and
Performance Optimization [54.113083217869516]
In this work, we first explore the computationally redundant parts of the network.
We then prune the redundant blocks of the model while maintaining network performance.
Third, we propose a global-regional interactive (GRI) attention mechanism to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z) - Pushing the Limits of Asynchronous Graph-based Object Detection with
Event Cameras [62.70541164894224]
We introduce several architectural choices that allow us to scale the depth and complexity of such models while keeping computation low.
Our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass.
arXiv Detail & Related papers (2022-11-22T15:14:20Z) - ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding tasks such as temporal action detection (TAD) often suffer from a huge demand for computing resources.
We build a unified framework for efficient end-to-end temporal action detection (ETAD).
ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z) - ABCP: Automatic Block-wise and Channel-wise Network Pruning via Joint
Search [10.544159086698112]
We propose Automatic Block-wise and Channel-wise Network Pruning (ABCP) to jointly search block-wise and channel-wise pruning actions with deep reinforcement learning.
Compared with the traditional rule-based pruning method, this pipeline saves human labor and achieves a higher compression ratio with lower accuracy loss.
Tested on the mobile robot detection dataset, the pruned YOLOv3 model saves 99.5% of FLOPs, reduces parameters by 99.5%, and achieves a 37.3x speedup with only a 2.8% mAP loss.
arXiv Detail & Related papers (2021-10-08T02:15:49Z) - Approximated Bilinear Modules for Temporal Modeling [116.6506871576514]
Two-layer subnets in CNNs can be converted to temporal bilinear modules by adding an auxiliary-branch sampling.
Our models can outperform most state-of-the-art methods on SomethingSomething v1 and v2 datasets without pretraining.
arXiv Detail & Related papers (2020-07-25T09:07:35Z)