Dual Knowledge Distillation for Efficient Sound Event Detection
- URL: http://arxiv.org/abs/2402.02781v1
- Date: Mon, 5 Feb 2024 07:30:32 GMT
- Title: Dual Knowledge Distillation for Efficient Sound Event Detection
- Authors: Yang Xiao, Rohan Kumar Das
- Abstract summary: Sound event detection (SED) is essential for recognizing specific sounds and their temporal locations within acoustic signals.
We introduce a novel framework referred to as dual knowledge distillation for developing efficient SED systems.
- Score: 20.236008919003083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sound event detection (SED) is essential for recognizing specific sounds and
their temporal locations within acoustic signals. This becomes particularly
challenging for on-device applications, where computational resources are
limited. To address this issue, we introduce a novel framework referred to as
dual knowledge distillation for developing efficient SED systems in this work.
Our proposed dual knowledge distillation commences with temporal-averaging
knowledge distillation (TAKD), utilizing a mean student model derived from the
temporal averaging of the student model's parameters. This allows the student
model to indirectly learn from a pre-trained teacher model, ensuring a stable
knowledge distillation. Subsequently, we introduce embedding-enhanced feature
distillation (EEFD), which involves incorporating an embedding distillation
layer within the student model to bolster contextual learning. On the DCASE
2023 Task 4A public evaluation dataset, our proposed SED system with dual
knowledge distillation, which has merely one-third of the baseline model's
parameters, demonstrates superior performance in terms of PSDS1 and PSDS2. This
highlights the importance of the proposed dual knowledge distillation for
compact SED systems, which can be ideal for edge devices.
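To make the two components concrete, the sketch below shows one plausible wiring of TAKD (a temporally averaged copy of the student's parameters, updated in the spirit of an exponential moving average) and EEFD (an embedding distillation layer inside the student, matched against the teacher's embedding). The toy architecture, the loss weights, and the exact way the mean-student, teacher, and supervised terms are combined are assumptions made for illustration, not the authors' implementation.

```python
# Minimal PyTorch sketch of the two ingredients described in the abstract.
# The toy model, loss weights, and the exact combination of loss terms are
# illustrative assumptions, not the authors' released code.
import copy
import torch
import torch.nn.functional as F

class StudentSED(torch.nn.Module):
    """Toy frame-level SED student with an extra embedding-distillation layer (EEFD)."""
    def __init__(self, n_mels=64, n_classes=10, teacher_dim=256):
        super().__init__()
        self.backbone = torch.nn.GRU(n_mels, 128, batch_first=True)
        self.classifier = torch.nn.Linear(128, n_classes)
        # EEFD: projects student features into the (assumed) teacher embedding space.
        self.embed_distill = torch.nn.Linear(128, teacher_dim)

    def forward(self, x):                        # x: (batch, time, n_mels)
        h, _ = self.backbone(x)                  # (batch, time, 128)
        return torch.sigmoid(self.classifier(h)), self.embed_distill(h)

@torch.no_grad()
def update_mean_student(student, mean_student, momentum=0.999):
    """TAKD: temporal (exponential moving) average of the student's parameters."""
    for p_s, p_m in zip(student.parameters(), mean_student.parameters()):
        p_m.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

def dual_kd_loss(student_probs, student_emb, mean_probs, teacher_probs, teacher_emb,
                 targets, w_takd=1.0, w_eefd=1.0):
    """Supervised BCE + TAKD terms (consistency with the temporally averaged mean
    student and distillation from the teacher; this combination is assumed) +
    EEFD embedding matching."""
    sup = F.binary_cross_entropy(student_probs, targets)
    takd = F.mse_loss(student_probs, mean_probs.detach()) \
         + F.mse_loss(student_probs, teacher_probs.detach())
    eefd = F.mse_loss(student_emb, teacher_emb.detach())
    return sup + w_takd * takd + w_eefd * eefd

# One training step (teacher is a pre-trained, frozen SED model that is assumed
# to return frame-level probabilities and an embedding):
#   mean_student = copy.deepcopy(student)        # done once, before training
#   probs, emb = student(x)
#   with torch.no_grad():
#       mean_probs, _ = mean_student(x)
#       teacher_probs, teacher_emb = teacher(x)
#   loss = dual_kd_loss(probs, emb, mean_probs, teacher_probs, teacher_emb, targets)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
#   update_mean_student(student, mean_student)
```

One way to read the "stable knowledge distillation" claim is that the mean student changes slowly, so it offers the student a smoother target than chasing the teacher's outputs alone; the sketch reflects that reading.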
Related papers
- Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection [47.0507287491627]
We propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection.
By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model.
Our method performs effectively during the KD process without requiring intricate structures or extensive computational resources.
arXiv Detail & Related papers (2024-06-11T06:51:02Z)
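For the entry above, the summary names Monte Carlo dropout as the mechanism that injects knowledge uncertainty into the student's training. The sketch below is a rough, hypothetical rendering of that idea, assuming uncertainty is taken as the variance of stochastic teacher features and used to down-weight a feature-distillation loss; the weighting rule and interfaces are not taken from that paper.

```python
# Hypothetical sketch only: Monte Carlo dropout on the teacher's features to
# estimate uncertainty, then down-weighting uncertain positions in a feature
# distillation loss. The weighting rule and interfaces are assumptions.
import torch

def mc_dropout_features(teacher, x, n_samples=8):
    """Run the teacher several times with dropout active; return the mean
    feature map and its per-element variance (used as an uncertainty proxy).
    Assumes teacher(x) returns a feature tensor."""
    teacher.train()                      # keep dropout layers stochastic
    with torch.no_grad():
        feats = torch.stack([teacher(x) for _ in range(n_samples)], dim=0)
    teacher.eval()
    return feats.mean(dim=0), feats.var(dim=0)

def uncertainty_weighted_feature_loss(student_feat, teacher_mean, teacher_var, eps=1e-6):
    """MSE feature distillation, down-weighted where the teacher is uncertain."""
    weight = 1.0 / (1.0 + teacher_var / (teacher_var.mean() + eps))
    return (weight * (student_feat - teacher_mean) ** 2).mean()
```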
- Learning to Maximize Mutual Information for Chain-of-Thought Distillation [13.660167848386806]
Distilling Step-by-Step (DSS) has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts.
However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction.
We propose a variational approach to solve this problem using a learning-based method.
arXiv Detail & Related papers (2024-03-05T22:21:45Z)
- Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z)
- Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z)
- DETRDistill: A Universal Knowledge Distillation Framework for DETR-families [11.9748352746424]
Transformer-based detectors (DETRs) have attracted great attention due to their sparse training paradigm and the removal of post-processing operations.
Knowledge distillation (KD) can be employed to compress the huge model by constructing a universal teacher-student learning framework.
arXiv Detail & Related papers (2022-11-17T13:35:11Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation [130.08432609780374]
In 3D action recognition, there exists rich complementary information between skeleton modalities.
We propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs.
Our approach outperforms existing self-supervised methods and sets a series of new records.
arXiv Detail & Related papers (2022-08-26T06:06:09Z)
- ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval [54.54667085792404]
We propose a novel distillation method that significantly advances cross-architecture distillation for dual-encoders.
Our method 1) introduces a self on-the-fly distillation method that can effectively distill late interaction (i.e., ColBERT) to vanilla dual-encoder, and 2) incorporates a cascade distillation process to further improve the performance with a cross-encoder teacher.
arXiv Detail & Related papers (2022-05-18T18:05:13Z)
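The ERNIE-Search entry above describes distilling late interaction (ColBERT-style) scoring into a vanilla dual-encoder. The sketch below illustrates only that score-level distillation under assumed encoder interfaces; the paper's self on-the-fly and cascade distillation with a cross-encoder teacher are not modeled.

```python
# Hypothetical sketch only: distilling late-interaction (ColBERT-style) scores
# into a vanilla dual-encoder with a KL loss over a set of candidate documents.
# Encoder interfaces, temperature, and candidate construction are assumptions.
import torch
import torch.nn.functional as F

def late_interaction_score(q_tok, d_tok):
    """ColBERT-style MaxSim: per query token, take the max similarity over
    document tokens, then sum. q_tok: (Tq, dim), d_tok: (Td, dim)."""
    sim = q_tok @ d_tok.T                      # (Tq, Td) token similarities
    return sim.max(dim=1).values.sum()

def distill_to_dual_encoder(q_vec, d_vecs, q_tok, d_toks, temperature=1.0):
    """KL between the dual-encoder's candidate distribution (student, dot
    products) and the late-interaction teacher's distribution."""
    student_logits = d_vecs @ q_vec                                        # (N,)
    teacher_logits = torch.stack([late_interaction_score(q_tok, d) for d in d_toks])
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=0),
        F.softmax(teacher_logits.detach() / temperature, dim=0),
        reduction="sum",                       # one query, summed over candidates
    )
```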
- Adaptive Instance Distillation for Object Detection in Autonomous Driving [3.236217153362305]
We propose Adaptive Instance Distillation (AID) to selectively impart teacher's knowledge to the student to improve the performance of knowledge distillation.
Our AID is also shown to be useful for self-distillation to improve the teacher model's performance.
arXiv Detail & Related papers (2022-01-26T18:06:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.