Related papers: Crowd Counting with Online Knowledge Learning

URL: http://arxiv.org/abs/2303.10318v1
Date: Sat, 18 Mar 2023 03:27:57 GMT
Title: Crowd Counting with Online Knowledge Learning
Authors: Shengqin Jiang, Bowen Li, Fengna Cheng, Qingshan Liu
Abstract summary: We propose an online knowledge learning method for crowd counting. Our method builds an end-to-end training framework that integrates two independent networks into a single architecture. Our method achieves comparable performance to state-of-the-art methods despite using far fewer parameters.
Score: 23.602652841154164
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Efficient crowd counting models are urgently required for the applications in scenarios with limited computing resources, such as edge computing and mobile devices. A straightforward method to achieve this is knowledge distillation (KD), which involves using a trained teacher network to guide the training of a student network. However, this traditional two-phase training method can be time-consuming, particularly for large datasets, and it is also challenging for the student network to mimic the learning process of the teacher network. To overcome these challenges, we propose an online knowledge learning method for crowd counting. Our method builds an end-to-end training framework that integrates two independent networks into a single architecture, which consists of a shared shallow module, a teacher branch, and a student branch. This approach is more efficient than the two-stage training technique of traditional KD. Moreover, we propose a feature relation distillation method which allows the student branch to more effectively comprehend the evolution of inter-layer features by constructing a new inter-layer relationship matrix. It is combined with response distillation and feature internal distillation to enhance the transfer of mutually complementary information from the teacher branch to the student branch. Extensive experiments on four challenging crowd counting datasets demonstrate the effectiveness of our method which achieves comparable performance to state-of-the-art methods despite using far fewer parameters.
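The feature relation distillation idea in the abstract can be illustrated with a small sketch. The abstract does not define the inter-layer relationship matrix, so everything below is an assumption made for illustration: each layer's feature map is taken to be pooled into a flat vector, the relation between two layers is taken to be their cosine similarity, and the loss is the mean squared difference between the teacher's and the student's matrices. The names `relation_matrix` and `relation_distillation_loss` are hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def relation_matrix(layer_features):
    """Assumed inter-layer relation: pairwise cosine similarity between layers.

    layer_features holds one pooled feature vector per layer; all vectors
    within a branch are assumed to share the same length.
    """
    return [[cosine(fi, fj) for fj in layer_features] for fi in layer_features]


def relation_distillation_loss(teacher_feats, student_feats):
    """Mean squared difference between teacher and student relation matrices.

    Both branches must expose the same number of layers, but their feature
    widths may differ because only the layer-by-layer matrices are compared.
    """
    rt = relation_matrix(teacher_feats)
    rs = relation_matrix(student_feats)
    n = len(rt)
    return sum((rt[i][j] - rs[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


# Toy inputs: three layers per branch; the student uses a different feature width.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student_aligned = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 0.0]]
student_collapsed = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

loss_aligned = relation_distillation_loss(teacher, student_aligned)
loss_collapsed = relation_distillation_loss(teacher, student_collapsed)
```

Under these assumptions the loss is near zero when the student's features evolve across layers the way the teacher's do, and it grows when the student's layers collapse onto one direction. In the method described above this term would be combined with response distillation and feature internal distillation, which the sketch does not cover.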

Related papers

Ensemble Learning via Knowledge Transfer for CTR Prediction [9.891226177252653]
In this paper, we investigate larger ensemble networks and find three inherent limitations in commonly used ensemble learning methods. We propose a novel model-agnostic Ensemble Knowledge Transfer Framework (EKTF). Experimental results on five real-world datasets demonstrate the effectiveness and compatibility of EKTF.
arXiv Detail & Related papers (2024-11-25T06:14:20Z)
Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation [11.754014876977422]
This paper introduces a novel student-oriented perspective that refines the teacher's knowledge to better align with the student's needs. We present Student-Oriented Knowledge Distillation (SoKD), which incorporates a learnable feature augmentation strategy during training. We also deploy the Distinctive Area Detection Module (DAM) to identify areas of mutual interest between the teacher and student.
arXiv Detail & Related papers (2024-09-27T14:34:08Z)
A Unified Framework for Continual Learning and Machine Unlearning [9.538733681436836]
Continual learning and machine unlearning are crucial challenges in machine learning, typically addressed separately. We introduce a novel framework that jointly tackles both tasks by leveraging controlled knowledge distillation. Our approach enables efficient learning with minimal forgetting and effective targeted unlearning.
arXiv Detail & Related papers (2024-08-21T06:49:59Z)
Direct Distillation between Different Domains [97.39470334253163]
We propose a new one-stage method dubbed "Direct Distillation between Different Domains" (4Ds). We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge. We then build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network.
arXiv Detail & Related papers (2024-01-12T02:48:51Z)
Review helps learn better: Temporal Supervised Knowledge Distillation [9.220654594406508]
We find that during network training, the evolution of the feature map follows a temporal sequence property. Inspired by this observation, we propose Temporal Supervised Knowledge Distillation Review (TSKD).
arXiv Detail & Related papers (2023-07-03T07:51:08Z)
Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task. Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing defined knowledge representation related distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation [22.87106703794863]
It motivates distilling knowledge from the ensemble teacher into a smaller student network. We propose a weight averaging technique where a student with multiple subnetworks is trained to absorb the functional diversity of ensemble teachers. We also propose a perturbation strategy that seeks inputs from which the diversities of teachers can be better transferred to the student.
arXiv Detail & Related papers (2022-06-30T06:23:03Z)
Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression [2.538209532048867]
Mutual Learning (ML) provides an alternative strategy where multiple simple student networks benefit from sharing knowledge. We propose a single-teacher, multi-student framework that leverages both KD and ML to achieve better performance.
arXiv Detail & Related papers (2021-10-21T09:59:31Z)
Distilling Knowledge via Knowledge Review [69.15050871776552]
We study the factor of connection path cross levels between teacher and student networks, and reveal its great importance. For the first time in knowledge distillation, cross-stage connection paths are proposed. Our finally designed nested and compact framework requires negligible overhead, and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z)
Student Network Learning via Evolutionary Knowledge Distillation [22.030934154498205]
We propose an evolutionary knowledge distillation approach to improve the transfer effectiveness of teacher knowledge. Instead of a fixed pre-trained teacher, an evolutionary teacher is learned online and consistently transfers intermediate knowledge to supervise student network learning on-the-fly. In this way, the student can simultaneously obtain rich internal knowledge and capture its growth process, leading to effective student network learning.
arXiv Detail & Related papers (2021-03-23T02:07:15Z)
Collaborative Teacher-Student Learning via Multiple Knowledge Transfer [79.45526596053728]
We propose collaborative teacher-student learning via multiple knowledge transfer (CTSL-MKT). It allows multiple students to learn knowledge from both individual instances and instance relations in a collaborative way. The experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms the state-of-the-art KD methods.
arXiv Detail & Related papers (2021-01-21T07:17:04Z)
A Combinatorial Perspective on Transfer Learning [27.7848044115664]
We study how the learning of modular solutions can allow for effective generalization to both unseen and potentially differently distributed data. Our main postulate is that the combination of task segmentation, modular learning and memory-based ensembling can give rise to generalization on an exponentially growing number of unseen tasks.
arXiv Detail & Related papers (2020-10-23T09:53:31Z)
Point Adversarial Self Mining: A Simple Method for Facial Expression Recognition [79.75964372862279]
We propose Point Adversarial Self Mining (PASM) to improve the recognition accuracy in facial expression recognition. PASM uses a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task. The adaptive learning materials generation and teacher/student update can be conducted more than one time, improving the network capability iteratively.
arXiv Detail & Related papers (2020-08-26T06:39:24Z)
Interactive Knowledge Distillation [79.12866404907506]
We propose an InterActive Knowledge Distillation scheme to leverage the interactive teaching strategy for efficient knowledge distillation. In the distillation process, the interaction between teacher and student networks is implemented by a swapping-in operation. Experiments with typical settings of teacher-student networks demonstrate that the student networks trained by our IAKD achieve better performance than those trained by conventional knowledge distillation methods.
arXiv Detail & Related papers (2020-07-03T03:22:04Z)
Peer Collaborative Learning for Online Knowledge Distillation [69.29602103582782]
Peer Collaborative Learning method integrates online ensembling and network collaboration into a unified framework. Experiments on CIFAR-10, CIFAR-100 and ImageNet show that the proposed method significantly improves the generalisation of various backbone networks.
arXiv Detail & Related papers (2020-06-07T13:21:52Z)
Heterogeneous Knowledge Distillation using Information Flow Modeling [82.83891707250926]
We propose a novel KD method that works by modeling the information flow through the various layers of the teacher model. The proposed method is capable of overcoming the aforementioned limitations by using an appropriate supervision scheme during the different phases of the training process.
arXiv Detail & Related papers (2020-05-02T06:56:56Z)
Efficient Crowd Counting via Structured Knowledge Transfer [122.30417437707759]
Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications. We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network. Our models obtain at least 6.5× speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-03-23T08:05:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.