Sample Less, Learn More: Efficient Action Recognition via Frame Feature
Restoration
- URL: http://arxiv.org/abs/2307.14866v1
- Date: Thu, 27 Jul 2023 13:52:42 GMT
- Title: Sample Less, Learn More: Efficient Action Recognition via Frame Feature
Restoration
- Authors: Harry Cheng and Yangyang Guo and Liqiang Nie and Zhiyong Cheng and
Mohan Kankanhalli
- Abstract summary: We propose a novel method to restore the intermediate features for two sparsely sampled and adjacent video frames.
With the integration of our method, the efficiency of three commonly used baselines has been improved by over 50%, with a mere 0.5% reduction in recognition accuracy.
- Score: 59.6021678234829
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Training an effective video action recognition model poses significant
computational challenges, particularly under limited resource budgets. Current
methods primarily aim to either reduce model size or utilize pre-trained
models, limiting their adaptability to various backbone architectures. This
paper investigates the issue of over-sampled frames, a problem prevalent in many
approaches that has nonetheless received relatively little attention. Although
using fewer frames is a potential solution, this approach often results in a
substantial decline in performance. To address this issue, we propose a novel
method to restore the intermediate features for two sparsely sampled and
adjacent video frames. This feature restoration technique brings a negligible
increase in computational requirements compared to resource-intensive image
encoders, such as ViT. To evaluate the effectiveness of our method, we conduct
extensive experiments on four public datasets, including Kinetics-400,
ActivityNet, UCF-101, and HMDB-51. With the integration of our method, the
efficiency of three commonly used baselines has been improved by over 50%, with
a mere 0.5% reduction in recognition accuracy. In addition, our method also
surprisingly helps improve the generalization ability of the models under
zero-shot settings.
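The abstract does not detail the restoration module itself, so the following is only a minimal sketch of the idea under stated assumptions: the heavy image encoder (e.g., a ViT) runs on sparsely sampled frames only, and a hypothetical lightweight MLP (class names and shapes are placeholders, not the paper's implementation) predicts the feature of the skipped in-between frame from its two sampled neighbours.

```python
import torch
import torch.nn as nn

class FrameFeatureRestorer(nn.Module):
    """Hypothetical lightweight restorer: given backbone features of two
    sparsely sampled, adjacent frames, predict the feature of the skipped
    frame between them instead of running the expensive encoder on it."""

    def __init__(self, dim: int = 768, hidden: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, feat_prev: torch.Tensor, feat_next: torch.Tensor) -> torch.Tensor:
        # feat_prev / feat_next: (batch, tokens, dim) features of frames t and t+2
        return self.mlp(torch.cat([feat_prev, feat_next], dim=-1))


# Usage sketch: encode only the sampled frames with the costly backbone,
# then fill in the intermediate frame's features with the cheap restorer.
restorer = FrameFeatureRestorer(dim=768)
f_t, f_t2 = torch.randn(2, 8, 197, 768)   # placeholder ViT token features
f_t1 = restorer(f_t, f_t2)                # stands in for frame t+1's features
```

Because such a restorer is far smaller than the image encoder, recovering the skipped frames' features adds little overhead, which is how fewer encoder passes can translate into the reported efficiency gains.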
Related papers
- Enhancing Few-Shot Image Classification through Learnable Multi-Scale Embedding and Attention Mechanisms [1.1557852082644071]
In the context of few-shot classification, the goal is to train a classifier using a limited number of samples.
Traditional metric-based methods exhibit certain limitations in achieving this objective.
Our approach utilizes a multi-output embedding network that maps samples into distinct feature spaces.
arXiv Detail & Related papers (2024-09-12T12:34:29Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language
Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - Accelerating Multiframe Blind Deconvolution via Deep Learning [0.0]
Ground-based solar image restoration is a computationally expensive procedure.
We propose a new method to accelerate the restoration based on algorithm unrolling.
We show that both methods significantly reduce the restoration time compared to the standard optimization procedure.
arXiv Detail & Related papers (2023-06-21T07:53:00Z) - Learning from Multi-Perception Features for Real-World Image
Super-resolution [87.71135803794519]
We propose a novel SR method called MPF-Net that leverages multiple perceptual features of input images.
Our method incorporates a Multi-Perception Feature Extraction (MPFE) module to extract diverse perceptual information.
We also introduce a contrastive regularization term (CR) that improves the model's learning capability.
arXiv Detail & Related papers (2023-05-26T07:35:49Z) - Convolutional Ensembling based Few-Shot Defect Detection Technique [0.0]
We present a new approach to few-shot classification, where we employ the knowledge-base of multiple pre-trained convolutional models.
Our framework uses a novel ensembling technique for boosting the accuracy while drastically decreasing the total parameter count.
arXiv Detail & Related papers (2022-08-05T17:29:14Z) - FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z) - Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a weight distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as with recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark (a rough sketch of the reparameterisation follows this list).
arXiv Detail & Related papers (2021-10-01T10:03:57Z) - Monocular Real-Time Volumetric Performance Capture [28.481131687883256]
We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video.
Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Function (PIFu).
We also introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples.
arXiv Detail & Related papers (2020-07-28T04:45:13Z) - Towards Practical Lipreading with Distilled and Efficient Models [57.41253104365274]
Lipreading has witnessed a lot of progress due to the resurgence of neural networks.
Recent works have placed emphasis on aspects such as improving performance by finding the optimal architecture or improving generalization.
There is still a significant gap between the current methodologies and the requirements for an effective deployment of lipreading in practical scenarios.
We propose a series of innovations that significantly bridge that gap: first, we raise the state-of-the-art performance by a wide margin on LRW and LRW-1000, to 88.5% and 46.6% respectively, using self-distillation.
arXiv Detail & Related papers (2020-07-13T16:56:27Z) - Hierarchical and Efficient Learning for Person Re-Identification [19.172946887940874]
We propose a novel Hierarchical and Efficient Network (HENet) that learns hierarchical global, partial, and recovery features ensemble under the supervision of multiple loss combinations.
We also propose a new dataset augmentation approach, dubbed Random Polygon Erasing (RPE), which randomly erases irregular areas of the input image to imitate missing body parts.
arXiv Detail & Related papers (2020-05-18T15:45:25Z)
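The Powerpropagation entry above names a concrete mechanism, so here is a rough, non-authoritative sketch of the commonly described form of that reparameterisation (not the authors' code): each weight is stored as a parameter phi and used in the forward pass as phi * |phi|**(alpha - 1) with alpha > 1, so gradients to small weights are damped and many weights drift towards zero, where they can be pruned.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerpropLinear(nn.Module):
    """Linear layer with a Powerpropagation-style reparameterisation.
    The effective weight is phi * |phi|**(alpha - 1); by the chain rule,
    phi's gradient is scaled by alpha * |phi|**(alpha - 1), so small
    weights receive small updates and are driven towards zero."""

    def __init__(self, in_features: int, out_features: int, alpha: float = 2.0):
        super().__init__()
        self.alpha = alpha
        self.phi = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.phi, a=5 ** 0.5)

    def effective_weight(self) -> torch.Tensor:
        return self.phi * self.phi.abs().pow(self.alpha - 1.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.effective_weight(), self.bias)


# After training, plain magnitude pruning on the effective weights removes the
# large fraction of parameters that these dynamics have pushed towards zero.
layer = PowerpropLinear(256, 128, alpha=2.0)
out = layer(torch.randn(4, 256))
```

With alpha = 1 the layer reduces to an ordinary linear layer, which is one way to sanity-check such an implementation.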