Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
- URL: http://arxiv.org/abs/2307.14866v1
- Date: Thu, 27 Jul 2023 13:52:42 GMT
- Title: Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
- Authors: Harry Cheng, Yangyang Guo, Liqiang Nie, Zhiyong Cheng and Mohan Kankanhalli
- Abstract summary: We propose a novel method to restore the intermediate features for two sparsely sampled and adjacent video frames.
With the integration of our method, the efficiency of three commonly used baselines has been improved by over 50%, with a mere 0.5% reduction in recognition accuracy.
- Score: 59.6021678234829
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Training an effective video action recognition model poses significant
computational challenges, particularly under limited resource budgets. Current
methods primarily aim either to reduce model size or to utilize pre-trained
models, which limits their adaptability to various backbone architectures. This
paper investigates the issue of over-sampled frames, a problem that is prevalent
in many approaches yet has received relatively little attention. Although using
fewer frames is a potential solution, this approach often results in a
substantial decline in performance. To address this issue, we propose a novel
method to restore the intermediate features for two sparsely sampled and
adjacent video frames. This feature restoration technique adds a negligible
computational cost compared to resource-intensive image encoders such as ViT.
To evaluate the effectiveness of our method, we conduct extensive experiments on
four public datasets: Kinetics-400, ActivityNet, UCF-101, and HMDB-51. With the
integration of our method, the efficiency of three commonly used baselines has
been improved by over 50%, with a mere 0.5% reduction in recognition accuracy.
In addition, our method surprisingly helps improve the generalization ability
of the models in zero-shot settings.
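The abstract does not describe the internals of the restoration module, so the following Python sketch only illustrates the sampling pattern and its cost structure under stated assumptions. It runs a stand-in encoder on every second frame and fills each skipped slot between two adjacent sampled frames with a restored feature. The restoration here is plain linear interpolation, a hypothetical placeholder rather than the authors' learned method; the helper names (`sparse_indices`, `restore_between`, `encode_with_restoration`, `relative_cost`) and the per-frame cost figures are likewise illustrative assumptions.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def sparse_indices(num_frames: int, stride: int = 2) -> List[int]:
    # Frames that actually go through the expensive image encoder (e.g. a ViT).
    return list(range(0, num_frames, stride))


def restore_between(a: Vector, b: Vector, t: float) -> Vector:
    # HYPOTHETICAL stand-in for a learned restoration module:
    # linear interpolation between the features of two adjacent sampled frames.
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def encode_with_restoration(
    frames: Sequence[Vector],
    encoder: Callable[[Vector], Vector],
    stride: int = 2,
) -> List[Vector]:
    # Encode only the sparsely sampled frames.
    idx = sparse_indices(len(frames), stride)
    encoded = [encoder(frames[i]) for i in idx]
    out: List[Vector] = []
    for k, feat in enumerate(encoded):
        out.append(feat)
        if k + 1 < len(encoded):
            # Fill the skipped slots between sampled frames k and k + 1.
            gap = idx[k + 1] - idx[k]
            for j in range(1, gap):
                out.append(restore_between(feat, encoded[k + 1], j / gap))
    # Frames after the last sampled index are not restored in this sketch.
    return out


def relative_cost(num_frames: int, stride: int,
                  encoder_cost: float, restore_cost: float) -> float:
    # Cost of sparse sampling plus restoration, relative to encoding every frame.
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    idx = sparse_indices(num_frames, stride)
    restored = idx[-1] + 1 - len(idx)  # slots filled between sampled frames
    sparse = len(idx) * encoder_cost + restored * restore_cost
    return sparse / (num_frames * encoder_cost)


if __name__ == "__main__":
    frames = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    feats = encode_with_restoration(frames, lambda v: [2.0 * v[0]])
    print(feats)  # [[0.0], [2.0], [4.0], [6.0], [8.0]]
    # Illustrative (made-up) per-frame costs: restoration at 2% of the encoder.
    print(relative_cost(16, 2, encoder_cost=1.0, restore_cost=0.02))
```

With the toy linear encoder, interpolation reproduces the dense features exactly while calling the encoder on only 3 of 5 frames; real frame features are not linear in time, so this exactness should not be expected in practice. The `relative_cost` ratio approaches 1/stride when restoration is cheap relative to the encoder. The over-50% efficiency gain and 0.5% accuracy drop quoted above are the authors' experimental results, not something this arithmetic reproduces.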
Related papers
- LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization [17.190984773586745]
Current AR-based visual generation models require substantial computational resources, limiting their applicability on resource-constrained devices.
We propose efficient attention mechanism and low-bit quantization method to enhance the efficiency of VAR models while maintaining performance.
(2024-11-26T07:32:36Z)
- Enhancing Few-Shot Image Classification through Learnable Multi-Scale Embedding and Attention Mechanisms [1.1557852082644071]
In the context of few-shot classification, the goal is to train a classifier using a limited number of samples.
Traditional metric-based methods exhibit certain limitations in achieving this objective.
Our approach involves utilizing multi-output embedding network that maps samples into distinct feature spaces.
(2024-09-12T12:34:29Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
(2023-12-16T17:13:08Z)
- Accelerating Multiframe Blind Deconvolution via Deep Learning [0.0]
Ground-based solar image restoration is a computationally expensive procedure.
We propose a new method to accelerate the restoration based on algorithm unrolling.
We show that both methods significantly reduce the restoration time compared to the standard optimization procedure.
(2023-06-21T07:53:00Z)
- Learning from Multi-Perception Features for Real-Word Image Super-resolution [87.71135803794519]
We propose a novel SR method called MPF-Net that leverages multiple perceptual features of input images.
Our method incorporates a Multi-Perception Feature Extraction (MPFE) module to extract diverse perceptual information.
We also introduce a contrastive regularization term (CR) that improves the model's learning capability.
(2023-05-26T07:35:49Z)
- Convolutional Ensembling based Few-Shot Defect Detection Technique [0.0]
We present a new approach to few-shot classification, where we employ the knowledge-base of multiple pre-trained convolutional models.
Our framework uses a novel ensembling technique for boosting the accuracy while drastically decreasing the total parameter count.
(2022-08-05T17:29:14Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
(2022-04-10T11:38:33Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
(2021-10-01T10:03:57Z)
- Monocular Real-Time Volumetric Performance Capture [28.481131687883256]
We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video.
Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Function (PIFu).
We also introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples.
(2020-07-28T04:45:13Z)
- Towards Practical Lipreading with Distilled and Efficient Models [57.41253104365274]
Lipreading has witnessed a lot of progress due to the resurgence of neural networks.
Recent works have placed emphasis on aspects such as improving performance by finding the optimal architecture or improving generalization.
There is still a significant gap between the current methodologies and the requirements for an effective deployment of lipreading in practical scenarios.
We propose a series of innovations that significantly bridge that gap: first, we raise the state-of-the-art performance by a wide margin on LRW and LRW-1000 to 88.5% and 46.6%, respectively using self-distillation.
(2020-07-13T16:56:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.