Efficient Crowd Counting via Structured Knowledge Transfer
- URL: http://arxiv.org/abs/2003.10120v3
- Date: Tue, 11 Aug 2020 15:31:57 GMT
- Title: Efficient Crowd Counting via Structured Knowledge Transfer
- Authors: Lingbo Liu, Jiaqi Chen, Hefeng Wu, Tianshui Chen, Guanbin Li, Liang Lin
- Abstract summary: Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.
We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network.
Our models obtain at least 6.5$\times$ speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
- Score: 122.30417437707759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowd counting is an application-oriented task and its inference efficiency
is crucial for real-world applications. However, most previous works relied on
heavy backbone networks and incurred prohibitive run-time costs, which
severely restrict their deployment scope and scalability. To
liberate these crowd counting models, we propose a novel Structured Knowledge
Transfer (SKT) framework, which fully exploits the structured knowledge of a
well-trained teacher network to generate a lightweight but still highly
effective student network. Specifically, it is integrated with two
complementary transfer modules, including an Intra-Layer Pattern Transfer which
sequentially distills the knowledge embedded in layer-wise features of the
teacher network to guide feature learning of the student network, and an
Inter-Layer Relation Transfer which densely distills the cross-layer
correlation knowledge of the teacher to regularize the student's feature
evolution. Consequently, our student network can derive the layer-wise and
cross-layer knowledge from the teacher network to learn compact yet effective
features. Extensive evaluations on three benchmarks demonstrate the
effectiveness of our SKT for various crowd counting models. In particular,
only using around $6\%$ of the parameters and computation cost of original
models, our distilled VGG-based models obtain at least 6.5$\times$ speed-up on
an Nvidia 1080 GPU and even achieve state-of-the-art performance. Our code and
models are available at https://github.com/HCPLab-SYSU/SKT.
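To make the two transfer terms above concrete, the following is a minimal PyTorch-style sketch of what layer-wise pattern matching and cross-layer relation matching can look like. It is an illustrative approximation, not the authors' implementation: the exact losses, layer pairings, pooling, and weighting used by SKT are defined in the paper and the linked repository, and the distances below (MSE and cosine similarity over pooled features) are assumptions.

```python
# Illustrative sketch only: the distances, pooling, and layer pairings here
# are assumptions, not the official SKT losses (see the linked repository).
import torch
import torch.nn.functional as F


def intra_layer_pattern_loss(teacher_feats, student_feats):
    """Intra-Layer Pattern Transfer (sketch): match each student feature map
    to the corresponding teacher feature map. Assumes shapes already agree;
    in practice a 1x1 adapter may be needed when channel counts differ."""
    loss = 0.0
    for t, s in zip(teacher_feats, student_feats):
        loss = loss + F.mse_loss(s, t.detach())
    return loss


def cross_layer_relation(feats, grid=8):
    """Per-sample cross-layer relation (sketch): average each feature map over
    channels, pool to a fixed grid, L2-normalize, then take pairwise cosine
    similarities between layers. A simplified stand-in for the dense
    cross-layer correlation knowledge described in the abstract."""
    vecs = []
    for f in feats:
        v = F.adaptive_avg_pool2d(f.mean(dim=1, keepdim=True), grid)  # (B, 1, g, g)
        vecs.append(F.normalize(v.flatten(start_dim=1), dim=1))       # (B, g*g)
    n = len(vecs)
    pairs = [(vecs[i] * vecs[j]).sum(dim=1)                           # cosine per sample
             for i in range(n) for j in range(i + 1, n)]
    return torch.stack(pairs, dim=1)                                  # (B, n*(n-1)/2)


def inter_layer_relation_loss(teacher_feats, student_feats):
    """Inter-Layer Relation Transfer (sketch): make the student's cross-layer
    relation vector match the teacher's."""
    r_t = cross_layer_relation([t.detach() for t in teacher_feats])
    r_s = cross_layer_relation(student_feats)
    return F.mse_loss(r_s, r_t)


# Total objective (weights a, b are placeholders, not the paper's values):
# loss = density_map_loss + a * intra_layer_pattern_loss(T, S) \
#        + b * inter_layer_relation_loss(T, S)
```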
Related papers
- Adaptive Teaching with Shared Classifier for Knowledge Distillation [6.03477652126575]
Knowledge distillation (KD) is a technique used to transfer knowledge from a teacher network to a student network.
We propose adaptive teaching with a shared classifier (ATSC).
Our approach achieves state-of-the-art results on the CIFAR-100 and ImageNet datasets in both single-teacher and multi-teacher scenarios.
arXiv Detail & Related papers (2024-06-12T08:51:08Z)
- Continual Learning: Forget-free Winning Subnetworks for Video Representations [75.40220771931132]
A Winning Subnetwork (WSN), selected in terms of task performance, is considered for various continual learning tasks.
It leverages pre-existing weights from dense networks to achieve efficient learning in Task Incremental Learning (TIL) and Task-agnostic Incremental Learning (TaIL) scenarios.
The use of a Fourier Subneural Operator (FSO) within WSN is considered for Video Incremental Learning (VIL).
arXiv Detail & Related papers (2023-12-19T09:11:49Z)
- Crowd Counting with Online Knowledge Learning [23.602652841154164]
We propose an online knowledge learning method for crowd counting.
Our method builds an end-to-end training framework that integrates two independent networks into a single architecture.
Our method achieves comparable performance to state-of-the-art methods despite using far fewer parameters.
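As a rough illustration of how such online transfer can work, a hedged sketch under assumed losses and architectures (not the cited paper's actual design): the two branches are optimized jointly, while the lightweight branch additionally mimics the larger branch's density map, so no pre-trained teacher is needed.

```python
# Hedged sketch: joint training of a large branch and a lightweight branch
# with online density-map mimicry; the losses and weighting are assumptions.
import torch.nn.functional as F


def online_kd_step(big_net, small_net, images, gt_density, alpha=0.5):
    d_big = big_net(images)                       # stronger branch
    d_small = small_net(images)                   # lightweight branch
    supervised = F.mse_loss(d_big, gt_density) + F.mse_loss(d_small, gt_density)
    mimic = F.mse_loss(d_small, d_big.detach())   # online transfer, no pre-trained teacher
    return supervised + alpha * mimic
```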
arXiv Detail & Related papers (2023-03-18T03:27:57Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our method outperforms existing knowledge-representation-based distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Alignahead: Online Cross-Layer Knowledge Extraction on Graph Neural Networks [6.8080936803807734]
Existing knowledge distillation methods for graph neural networks (GNNs) are almost exclusively offline.
We propose a novel online knowledge distillation framework to resolve this problem.
We develop a cross-layer distillation strategy that aligns one student layer ahead with a layer at a different depth of another student model, as sketched below.
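A minimal sketch of one possible "align-ahead" pairing, assuming same-shaped features from two students trained online; the actual pairing and distance used in the cited paper may differ.

```python
# Hedged sketch: layer l of one student is aligned with layer l+1 of the
# other (and vice versa); the real Alignahead scheme may differ.
import torch.nn.functional as F


def alignahead_loss(feats_a, feats_b):
    """feats_a, feats_b: lists of same-shaped feature tensors, one per layer."""
    loss = 0.0
    for l in range(len(feats_a) - 1):
        loss = loss + F.mse_loss(feats_a[l], feats_b[l + 1].detach())
        loss = loss + F.mse_loss(feats_b[l], feats_a[l + 1].detach())
    return loss
```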
arXiv Detail & Related papers (2022-05-05T06:48:13Z)
- Distilling EEG Representations via Capsules for Affective Computing [14.67085109524245]
We propose a novel knowledge distillation pipeline to distill EEG representations via capsule-based architectures.
Our framework consistently enables student networks with different compression ratios to effectively learn from the teacher.
Our method achieves state-of-the-art results on one of the two datasets.
arXiv Detail & Related papers (2021-04-30T22:04:35Z)
- Distilling Knowledge via Knowledge Review [69.15050871776552]
We study the effect of connection paths across levels between teacher and student networks, and reveal their great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our final nested and compact framework requires negligible overhead, and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one convolutional neural network (CNN) to another by utilizing sparse representation.
SRM is formulated as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- Network-Agnostic Knowledge Transfer for Medical Image Segmentation [2.25146058725705]
We propose a knowledge transfer approach from a teacher to a student network wherein we train the student on an independent transferal dataset.
We studied knowledge transfer from a single teacher, the combination of knowledge transfer and fine-tuning, and knowledge transfer from multiple teachers.
The proposed algorithm is effective for knowledge transfer and easily tunable.
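A minimal sketch of the transferal-dataset idea summarized above: the student segmentation network is trained on independent, unlabeled images using the teacher's soft per-pixel predictions as targets. The loss and data handling here are illustrative assumptions rather than the cited method's exact recipe.

```python
# Hedged sketch: the teacher's soft per-pixel predictions on an independent
# transferal dataset serve as the student's targets; details are assumptions.
import torch
import torch.nn.functional as F


@torch.no_grad()
def teacher_targets(teacher, images):
    return teacher(images).softmax(dim=1)          # (B, classes, H, W) probabilities


def transfer_step(student, teacher, images):
    targets = teacher_targets(teacher, images)
    logits = student(images)
    # cross-entropy against the teacher's soft per-pixel distribution
    return -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```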
arXiv Detail & Related papers (2021-01-23T19:06:14Z)
- Graph-Based Neural Network Models with Multiple Self-Supervised Auxiliary Tasks [79.28094304325116]
Graph Convolutional Networks are among the most promising approaches for capturing relationships among structured data points.
We propose three novel self-supervised auxiliary tasks to train graph-based neural network models in a multi-task fashion.
arXiv Detail & Related papers (2020-11-14T11:09:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.