Delta Distillation for Efficient Video Processing
- URL: http://arxiv.org/abs/2203.09594v1
- Date: Thu, 17 Mar 2022 20:13:30 GMT
- Title: Delta Distillation for Efficient Video Processing
- Authors: Amirhossein Habibian, Haitam Ben Yahia, Davide Abati, Efstratios
Gavves, Fatih Porikli
- Abstract summary: We propose a novel knowledge distillation schema coined as Delta Distillation.
We demonstrate that these temporal variations can be effectively distilled due to the temporal redundancies within video frames.
As a by-product, delta distillation improves the temporal consistency of the teacher model.
- Score: 68.81730245303591
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper aims to accelerate video stream processing, such as object
detection and semantic segmentation, by leveraging the temporal redundancies
that exist between video frames. Instead of propagating and warping features
using motion alignment, such as optical flow, we propose a novel knowledge
distillation schema coined as Delta Distillation. In our proposal, the student
learns the variations in the teacher's intermediate features over time. We
demonstrate that these temporal variations can be effectively distilled due to
the temporal redundancies within video frames. During inference, the teacher
and student cooperate to provide predictions: the former supplies initial
representations extracted only on key-frames, and the latter iteratively
estimates and applies deltas for the successive frames. Moreover,
we consider various design choices to learn optimal student architectures
including an end-to-end learnable architecture search. By extensive experiments
on a wide range of architectures, including the most efficient ones, we
demonstrate that delta distillation sets a new state of the art in terms of
accuracy vs. efficiency trade-off for semantic segmentation and object
detection in videos. Finally, we show that, as a by-product, delta distillation
improves the temporal consistency of the teacher model.
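To make the key-frame/delta split described above concrete, here is a minimal sketch of the inference loop the abstract outlines; it is not the authors' implementation. The module names (teacher_block, student_delta, head) and the fixed key-frame interval are assumptions for illustration.

```python
# Hypothetical sketch of delta-distillation inference, not the paper's code.
# `teacher_block` is the heavy teacher feature extractor, `student_delta` is the
# lightweight student that predicts feature changes, and `head` is the task head
# (e.g. a segmentation or detection head). All names are illustrative.
import torch

@torch.no_grad()
def process_video(teacher_block, student_delta, head, frames, key_frame_interval=8):
    feat, prev_frame, outputs = None, None, []
    for t, frame in enumerate(frames):
        if t % key_frame_interval == 0:
            # Key frame: the teacher computes the full intermediate representation.
            feat = teacher_block(frame)
        else:
            # Successive frame: the student estimates how the representation changed
            # relative to the previous frame, and the delta is applied iteratively.
            feat = feat + student_delta(frame, prev_frame)
        prev_frame = frame
        outputs.append(head(feat))
    return outputs
```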
Related papers
- The Staged Knowledge Distillation in Video Classification: Harmonizing
Student Progress by a Complementary Weakly Supervised Framework [21.494759678807686]
We propose a new weakly supervised learning framework for knowledge distillation in video classification.
Our approach leverages the concept of substage-based learning to distill knowledge based on the combination of student substages and the correlation of corresponding substages.
Our proposed substage-based distillation approach has the potential to inform future research on label-efficient learning for video data.
arXiv Detail & Related papers (2023-07-11T12:10:42Z)
- Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z)
- TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition [68.53072549422775]
We propose a student-teacher semi-supervised learning framework, TimeBalance.
We distill the knowledge from a temporally-invariant and a temporally-distinctive teacher.
Our method achieves state-of-the-art performance on three action recognition benchmarks.
arXiv Detail & Related papers (2023-03-28T19:28:54Z)
- DisWOT: Student Architecture Search for Distillation WithOut Training [0.0]
We explore a novel training-free framework to search for the best student architectures for a given teacher.
Our work first empirically shows that the optimal model under vanilla training cannot be the winner in distillation.
Our experiments on CIFAR, ImageNet and NAS-Bench-201 demonstrate that our technique achieves state-of-the-art results on different search spaces.
arXiv Detail & Related papers (2023-03-28T01:58:45Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
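As a rough illustration of what distilling the relative geometry among queries and documents could look like, the sketch below matches the student's query-document similarity distribution to the teacher's; the exact loss used by EmbedDistill may differ, and all names here are assumptions.

```python
# Hedged sketch of a relative-geometry distillation loss: the student is trained so
# that its query-document similarities mirror the teacher's. Not the paper's loss.
import torch
import torch.nn.functional as F

def geometry_distill_loss(student_q, student_d, teacher_q, teacher_d, tau=1.0):
    # Pairwise similarity scores between a batch of queries and a batch of documents.
    s_scores = student_q @ student_d.t() / tau
    t_scores = teacher_q @ teacher_d.t() / tau
    # KL divergence between the student's and teacher's score distributions.
    return F.kl_div(F.log_softmax(s_scores, dim=-1),
                    F.softmax(t_scores, dim=-1),
                    reduction="batchmean")
```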
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Normalized Feature Distillation for Semantic Segmentation [6.882655287146012]
We propose a simple yet effective feature distillation method called normalized feature distillation (NFD).
Our method achieves state-of-the-art distillation results for semantic segmentation on Cityscapes, VOC 2012, and ADE20K datasets.
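The summary does not spell out the normalization, so the following is only a guess at the general shape of such a loss: both feature maps are standardized per channel before an L2 match; the paper's actual normalization and loss may differ.

```python
# Hypothetical normalized-feature distillation loss (assumed form, not NFD's exact one).
import torch
import torch.nn.functional as F

def normalized_feature_loss(student_feat, teacher_feat, eps=1e-6):
    def standardize(x):
        # Zero-mean, unit-variance per channel over the spatial dims of (N, C, H, W).
        mean = x.mean(dim=(2, 3), keepdim=True)
        std = x.std(dim=(2, 3), keepdim=True)
        return (x - mean) / (std + eps)
    # Assumes student and teacher features already share the same shape
    # (e.g. after a 1x1 projection on the student side).
    return F.mse_loss(standardize(student_feat), standardize(teacher_feat))
```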
arXiv Detail & Related papers (2022-07-12T01:54:25Z)
- Knowledge Distillation with the Reused Teacher Classifier [31.22117343316628]
We show that a simple knowledge distillation technique is enough to significantly narrow down the teacher-student performance gap.
Our technique achieves state-of-the-art results at the modest cost in compression ratio introduced by the added projector.
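A minimal sketch of the reused-classifier idea, under the assumption that the student backbone output is projected to the teacher's feature dimension and then passed through the teacher's frozen classifier; module and dimension names are illustrative, not from the paper.

```python
# Sketch: student backbone + projector + frozen, reused teacher classifier.
import torch.nn as nn

class StudentWithReusedClassifier(nn.Module):
    def __init__(self, student_backbone, teacher_classifier, student_dim, teacher_dim):
        super().__init__()
        self.backbone = student_backbone
        # The "added projector" from the summary: aligns student features with the
        # teacher's feature space so the teacher's classifier can be reused as-is.
        self.projector = nn.Linear(student_dim, teacher_dim)
        self.classifier = teacher_classifier
        for p in self.classifier.parameters():
            p.requires_grad = False  # keep the reused teacher classifier frozen

    def forward(self, x):
        feat = self.projector(self.backbone(x))
        # Return logits plus projected features, e.g. for a feature-matching loss.
        return self.classifier(feat), feat
```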
arXiv Detail & Related papers (2022-03-26T06:28:46Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better handle the variations between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition [79.60708268515293]
This paper explores how to train small and efficient networks for action recognition.
We propose two distillation strategies in the frequency domain, namely the feature spectrum and parameter distribution distillations respectively.
Our method can achieve higher performance than state-of-the-art methods with the same backbone.
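For the feature-spectrum part, a hedged sketch of a frequency-domain distillation loss is given below: feature maps are moved to the frequency domain and the student's spectrum is matched to the teacher's. The FFT choice and the L1 loss are assumptions, not necessarily the paper's formulation.

```python
# Hypothetical feature-spectrum distillation loss (assumed form).
import torch
import torch.nn.functional as F

def spectrum_distill_loss(student_feat, teacher_feat):
    # 2D FFT over the spatial dimensions of (N, C, H, W) feature maps; compare magnitudes.
    # Assumes the two feature maps have matching shapes.
    s_spec = torch.fft.rfft2(student_feat, norm="ortho").abs()
    t_spec = torch.fft.rfft2(teacher_feat, norm="ortho").abs()
    return F.l1_loss(s_spec, t_spec)
```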
arXiv Detail & Related papers (2020-09-15T07:29:57Z)