Efficient Crowd Counting via Structured Knowledge Transfer
- URL: http://arxiv.org/abs/2003.10120v3
- Date: Tue, 11 Aug 2020 15:31:57 GMT
- Title: Efficient Crowd Counting via Structured Knowledge Transfer
- Authors: Lingbo Liu, Jiaqi Chen, Hefeng Wu, Tianshui Chen, Guanbin Li, Liang Lin
- Abstract summary: Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.
We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network.
Our models obtain at least 6.5$\times$ speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
- Score: 122.30417437707759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowd counting is an application-oriented task and its inference efficiency
is crucial for real-world applications. However, most previous works relied on
heavy backbone networks and incurred prohibitive run-time costs, which
severely restrict their deployment scope and scalability. To
liberate these crowd counting models, we propose a novel Structured Knowledge
Transfer (SKT) framework, which fully exploits the structured knowledge of a
well-trained teacher network to generate a lightweight but still highly
effective student network. Specifically, it is integrated with two
complementary transfer modules, including an Intra-Layer Pattern Transfer which
sequentially distills the knowledge embedded in layer-wise features of the
teacher network to guide feature learning of the student network, and an
Inter-Layer Relation Transfer which densely distills the cross-layer
correlation knowledge of the teacher to regularize the student's feature
evolution. Consequently, our student network can derive the layer-wise and
cross-layer knowledge from the teacher network to learn compact yet effective
features. Extensive evaluations on three benchmarks demonstrate the
effectiveness of our SKT for various crowd counting models. In particular,
only using around $6\%$ of the parameters and computation cost of original
models, our distilled VGG-based models obtain at least 6.5$\times$ speed-up on
an Nvidia 1080 GPU and even achieve state-of-the-art performance. Our code and
models are available at https://github.com/HCPLab-SYSU/SKT.
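To make the two transfer terms above concrete, the following is a minimal PyTorch-style sketch of what layer-wise pattern matching and cross-layer relation matching can look like. It is an illustrative approximation, not the authors' implementation: the exact losses, layer pairings, pooling, and weighting used by SKT are defined in the paper and the linked repository, and the distances below (MSE and cosine similarity over pooled features) are assumptions.

```python
# Illustrative sketch only: the distances, pooling, and layer pairings here
# are assumptions, not the official SKT losses (see the linked repository).
import torch
import torch.nn.functional as F


def intra_layer_pattern_loss(teacher_feats, student_feats):
    """Intra-Layer Pattern Transfer (sketch): match each student feature map
    to the corresponding teacher feature map. Assumes shapes already agree;
    in practice a 1x1 adapter may be needed when channel counts differ."""
    loss = 0.0
    for t, s in zip(teacher_feats, student_feats):
        loss = loss + F.mse_loss(s, t.detach())
    return loss


def cross_layer_relation(feats, grid=8):
    """Per-sample cross-layer relation (sketch): average each feature map over
    channels, pool to a fixed grid, L2-normalize, then take pairwise cosine
    similarities between layers. A simplified stand-in for the dense
    cross-layer correlation knowledge described in the abstract."""
    vecs = []
    for f in feats:
        v = F.adaptive_avg_pool2d(f.mean(dim=1, keepdim=True), grid)  # (B, 1, g, g)
        vecs.append(F.normalize(v.flatten(start_dim=1), dim=1))       # (B, g*g)
    n = len(vecs)
    pairs = [(vecs[i] * vecs[j]).sum(dim=1)                           # cosine per sample
             for i in range(n) for j in range(i + 1, n)]
    return torch.stack(pairs, dim=1)                                  # (B, n*(n-1)/2)


def inter_layer_relation_loss(teacher_feats, student_feats):
    """Inter-Layer Relation Transfer (sketch): make the student's cross-layer
    relation vector match the teacher's."""
    r_t = cross_layer_relation([t.detach() for t in teacher_feats])
    r_s = cross_layer_relation(student_feats)
    return F.mse_loss(r_s, r_t)


# Total objective (weights a, b are placeholders, not the paper's values):
# loss = density_map_loss + a * intra_layer_pattern_loss(T, S) \
#        + b * inter_layer_relation_loss(T, S)
```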
Related papers
- Adaptive Teaching with Shared Classifier for Knowledge Distillation [6.03477652126575]
Knowledge distillation (KD) is a technique used to transfer knowledge from a teacher network to a student network.
We propose adaptive teaching with a shared classifier (ATSC).
Our approach achieves state-of-the-art results on the CIFAR-100 and ImageNet datasets in both single-teacher and multi-teacher scenarios.
arXiv Detail & Related papers (2024-06-12T08:51:08Z)
- Continual Learning: Forget-free Winning Subnetworks for Video Representations [75.40220771931132]
A Winning Subnetwork (WSN), selected in terms of task performance, is considered for various continual learning tasks.
It leverages pre-existing weights from dense networks to achieve efficient learning in Task Incremental Learning (TIL) and Task-agnostic Incremental Learning (TaIL) scenarios.
The use of a Fourier Subneural Operator (FSO) within WSN is considered for Video Incremental Learning (VIL).
arXiv Detail & Related papers (2023-12-19T09:11:49Z)
- Crowd Counting with Online Knowledge Learning [23.602652841154164]
We propose an online knowledge learning method for crowd counting.
Our method builds an end-to-end training framework that integrates two independent networks into a single architecture.
Our method achieves comparable performance to state-of-the-art methods despite using far fewer parameters.
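As a rough illustration of how such online transfer can work, a hedged sketch under assumed losses and architectures (not the cited paper's actual design): the two branches are optimized jointly, while the lightweight branch additionally mimics the larger branch's density map, so no pre-trained teacher is needed.

```python
# Hedged sketch: joint training of a large branch and a lightweight branch
# with online density-map mimicry; the losses and weighting are assumptions.
import torch.nn.functional as F


def online_kd_step(big_net, small_net, images, gt_density, alpha=0.5):
    d_big = big_net(images)                       # stronger branch
    d_small = small_net(images)                   # lightweight branch
    supervised = F.mse_loss(d_big, gt_density) + F.mse_loss(d_small, gt_density)
    mimic = F.mse_loss(d_small, d_big.detach())   # online transfer, no pre-trained teacher
    return supervised + alpha * mimic
```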
arXiv Detail & Related papers (2023-03-18T03:27:57Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our method outperforms existing knowledge-representation-based distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Alignahead: Online Cross-Layer Knowledge Extraction on Graph Neural Networks [6.8080936803807734]
Existing knowledge distillation methods for graph neural networks (GNNs) are almost exclusively offline.
We propose a novel online knowledge distillation framework to resolve this problem.
We develop a cross-layer distillation strategy that aligns one student layer ahead with a layer at a different depth of another student model, as sketched below.
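A minimal sketch of one possible "align-ahead" pairing, assuming same-shaped features from two students trained online; the actual pairing and distance used in the cited paper may differ.

```python
# Hedged sketch: layer l of one student is aligned with layer l+1 of the
# other (and vice versa); the real Alignahead scheme may differ.
import torch.nn.functional as F


def alignahead_loss(feats_a, feats_b):
    """feats_a, feats_b: lists of same-shaped feature tensors, one per layer."""
    loss = 0.0
    for l in range(len(feats_a) - 1):
        loss = loss + F.mse_loss(feats_a[l], feats_b[l + 1].detach())
        loss = loss + F.mse_loss(feats_b[l], feats_a[l + 1].detach())
    return loss
```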
arXiv Detail & Related papers (2022-05-05T06:48:13Z)
- Distilling EEG Representations via Capsules for Affective Computing [14.67085109524245]
We propose a novel knowledge distillation pipeline to distill EEG representations via capsule-based architectures.
Our framework consistently enables student networks with different compression ratios to effectively learn from the teacher.
Our method achieves state-of-the-art results on one of the two datasets.
arXiv Detail & Related papers (2021-04-30T22:04:35Z)
- Distilling Knowledge via Knowledge Review [69.15050871776552]
We study the effect of connection paths across levels between teacher and student networks, and reveal their great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our final nested and compact framework requires negligible overhead, and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one convolutional neural network (CNN) to another by utilizing sparse representation.
SRM is formulated as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- Network-Agnostic Knowledge Transfer for Medical Image Segmentation [2.25146058725705]
We propose a knowledge transfer approach from a teacher to a student network wherein we train the student on an independent transferal dataset.
We studied knowledge transfer from a single teacher, the combination of knowledge transfer and fine-tuning, and knowledge transfer from multiple teachers.
The proposed algorithm is effective for knowledge transfer and easily tunable.
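A minimal sketch of the transferal-dataset idea summarized above: the student segmentation network is trained on independent, unlabeled images using the teacher's soft per-pixel predictions as targets. The loss and data handling here are illustrative assumptions rather than the cited method's exact recipe.

```python
# Hedged sketch: the teacher's soft per-pixel predictions on an independent
# transferal dataset serve as the student's targets; details are assumptions.
import torch
import torch.nn.functional as F


@torch.no_grad()
def teacher_targets(teacher, images):
    return teacher(images).softmax(dim=1)          # (B, classes, H, W) probabilities


def transfer_step(student, teacher, images):
    targets = teacher_targets(teacher, images)
    logits = student(images)
    # cross-entropy against the teacher's soft per-pixel distribution
    return -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```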
arXiv Detail & Related papers (2021-01-23T19:06:14Z)
- Graph-Based Neural Network Models with Multiple Self-Supervised Auxiliary Tasks [79.28094304325116]
Graph Convolutional Networks are among the most promising approaches for capturing relationships among structured data points.
We propose three novel self-supervised auxiliary tasks to train graph-based neural network models in a multi-task fashion.
arXiv Detail & Related papers (2020-11-14T11:09:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.