Collaborative Distillation in the Parameter and Spectrum Domains for
Video Action Recognition
- URL: http://arxiv.org/abs/2009.06902v1
- Date: Tue, 15 Sep 2020 07:29:57 GMT
- Title: Collaborative Distillation in the Parameter and Spectrum Domains for
Video Action Recognition
- Authors: Haisheng Su, Jing Su, Dongliang Wang, Weihao Gan, Wei Wu, Mengmeng
Wang, Junjie Yan, Yu Qiao
- Abstract summary: This paper explores how to train small and efficient networks for action recognition.
We propose two distillation strategies in the frequency domain, namely feature spectrum distillation and parameter distribution distillation.
Our method can achieve higher performance than state-of-the-art methods with the same backbone.
- Score: 79.60708268515293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed significant progress on the action
recognition task with deep networks. However, most current video networks
require large memory and computational resources, which hinders their
application in practice. Existing knowledge distillation methods are limited
to the image-level spatial domain, ignoring the temporal and frequency
information that provides structural knowledge and is important for video
analysis. This paper explores how to train small and efficient networks for
action recognition. Specifically, we propose two distillation strategies in
the frequency domain, namely feature spectrum distillation and parameter
distribution distillation. Our insight is that strong action recognition
performance requires explicitly modeling the temporal frequency spectrum of
video features. We therefore introduce a spectrum loss that requires the
student network to mimic the temporal frequency spectrum of the teacher
network, instead of implicitly distilling features as in many previous works.
Second, the parameter frequency distribution is further adopted to guide the
student network in learning the teacher's appearance modeling process. In
addition, a collaborative learning strategy is presented to optimize the
training process from a probabilistic view. Extensive experiments on several
action recognition benchmarks, such as Kinetics, Something-Something, and
Jester, consistently verify the effectiveness of our approach and demonstrate
that our method achieves higher performance than state-of-the-art methods
with the same backbone.
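
A minimal PyTorch sketch of the two frequency-domain losses described in the abstract is given below. The tensor layout, the FFT magnitude taken along the temporal axis, the KL-based comparison of kernel spectra, and the function names (spectrum_distillation_loss, parameter_spectrum_loss) are assumptions made for illustration; this is one plausible reading of the abstract, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def spectrum_distillation_loss(student_feat: torch.Tensor,
                               teacher_feat: torch.Tensor) -> torch.Tensor:
    """Match the magnitude of the temporal frequency spectrum of features.

    Both tensors are assumed to have shape (B, C, T, H, W); the FFT is taken
    along the temporal axis so that the student mimics how the teacher's
    features vary over time, rather than copying the raw features.
    """
    # rfft along the temporal dimension (dim=2) yields T//2 + 1 frequency bins.
    student_spec = torch.fft.rfft(student_feat, dim=2).abs()
    teacher_spec = torch.fft.rfft(teacher_feat, dim=2).abs()
    return F.mse_loss(student_spec, teacher_spec)


def parameter_spectrum_loss(student_weight: torch.Tensor,
                            teacher_weight: torch.Tensor) -> torch.Tensor:
    """One plausible reading of "parameter distribution distillation".

    Convolution kernels (assumed to have matching shapes, e.g. after a
    projection) are flattened per output channel, transformed with an FFT,
    and their normalized magnitude spectra are compared with a KL divergence.
    """
    def spectrum(w: torch.Tensor) -> torch.Tensor:
        mag = torch.fft.rfft(w.flatten(1), dim=1).abs() + 1e-8
        return mag / mag.sum(dim=1, keepdim=True)  # per-filter distribution

    return F.kl_div(spectrum(student_weight).log(),
                    spectrum(teacher_weight),
                    reduction="batchmean")
```

In a training loop, either loss would simply be added, with a weighting factor, to the usual cross-entropy and feature-distillation terms.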
Related papers
- From Actions to Events: A Transfer Learning Approach Using Improved Deep
Belief Networks [1.0554048699217669]
This paper proposes a novel approach to map the knowledge from action recognition to event recognition using an energy-based model.
Such a model can process all frames simultaneously, carrying spatial and temporal information through the learning process.
arXiv Detail & Related papers (2022-11-30T14:47:10Z) - Class-Incremental Learning for Action Recognition in Videos [44.923719189467164]
We tackle the catastrophic forgetting problem in the context of class-incremental learning for video recognition.
Our framework addresses this challenging task by introducing time-channel importance maps and exploiting the importance maps for learning the representations of incoming examples.
We evaluate the proposed approach on brand-new splits of class-incremental action recognition benchmarks constructed upon the UCF101, HMDB51, and Something-Something V2 datasets.
arXiv Detail & Related papers (2022-03-25T12:15:49Z) - Delta Distillation for Efficient Video Processing [68.81730245303591]
We propose a novel knowledge distillation scheme, coined Delta Distillation, that targets the temporal variations between video frames (a minimal sketch of this idea appears after the related-papers list).
We demonstrate that these temporal variations can be effectively distilled thanks to the temporal redundancies across video frames.
As a by-product, delta distillation improves the temporal consistency of the teacher model.
arXiv Detail & Related papers (2022-03-17T20:13:30Z) - Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that the current fixed-sized temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how to better distinguish between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z) - Improved Speech Emotion Recognition using Transfer Learning and
Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - On the Post-hoc Explainability of Deep Echo State Networks for Time
Series Forecasting, Image and Video Classification [63.716247731036745]
Echo state networks have attracted much attention over time, mainly due to the simplicity and computational efficiency of their learning algorithm.
However, their post-hoc explainability remains largely unexplored; this work addresses the issue by conducting an explainability study of Echo State Networks when applied to learning tasks with time series, image and video data.
Specifically, the study proposes three different techniques capable of eliciting understandable information about the knowledge grasped by these recurrent models.
arXiv Detail & Related papers (2021-02-17T08:56:33Z) - Fast Video Salient Object Detection via Spatiotemporal Knowledge
Distillation [20.196945571479002]
We present a lightweight network tailored for video salient object detection.
Specifically, we combine a saliency guidance embedding structure and spatial knowledge distillation to refine the spatial features.
In the temporal aspect, we propose a temporal knowledge distillation strategy, which allows the network to learn robust temporal features.
arXiv Detail & Related papers (2020-10-20T04:48:36Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
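
Similarly, for the Delta Distillation entry earlier in the list, the hypothetical sketch below supervises the student on frame-to-frame feature changes rather than on raw features; the tensor layout and function name are assumptions rather than the paper's released code.

```python
import torch
import torch.nn.functional as F


def delta_distillation_loss(student_feat: torch.Tensor,
                            teacher_feat: torch.Tensor) -> torch.Tensor:
    """Match frame-to-frame feature changes (deltas) instead of raw features.

    Both tensors are assumed to have shape (B, T, C, H, W). Because
    consecutive frames are highly redundant, the deltas are sparse and cheap
    for a small student to approximate, which is the stated intuition behind
    distilling temporal variations.
    """
    student_delta = student_feat[:, 1:] - student_feat[:, :-1]
    teacher_delta = teacher_feat[:, 1:] - teacher_feat[:, :-1]
    return F.mse_loss(student_delta, teacher_delta)
```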