Knowledge Distillation for Real-Time Classification of Early Media in Voice Communications
- URL: http://arxiv.org/abs/2410.21478v1
- Date: Mon, 28 Oct 2024 19:32:17 GMT
- Title: Knowledge Distillation for Real-Time Classification of Early Media in Voice Communications
- Authors: Kemal Altwlkany, Hadžem Hadžić, Amar Kurić, Emanuel Lacic
- Abstract summary: We propose a novel approach with low resource requirements based on gradient-boosted trees.
We show that leveraging knowledge distillation and class aggregation techniques to train a simpler and smaller model accelerates the classification of early media in voice calls.
- Abstract: This paper investigates the industrial setting of real-time classification of early media exchanged during the initialization phase of voice calls. We explore the application of state-of-the-art audio tagging models and highlight some limitations when applied to the classification of early media. While most existing approaches leverage convolutional neural networks, we propose a novel approach with low resource requirements based on gradient-boosted trees. Our approach not only demonstrates a substantial improvement in runtime performance, but also exhibits comparable accuracy. We show that leveraging knowledge distillation and class aggregation techniques to train a simpler and smaller model accelerates the classification of early media in voice calls. We provide a detailed analysis of accuracy and runtime performance on both a proprietary and a publicly available dataset. We additionally report a case study of the achieved performance improvements at a regional data center in India.
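The abstract describes distilling a large audio tagging teacher model into a much smaller gradient-boosted tree student, after aggregating the teacher's fine-grained tags into a handful of early-media classes. The following is a minimal sketch of that idea, not the authors' implementation: the tag-to-class grouping, the feature representation, the use of hard pseudo-labels, and the choice of LightGBM are all assumptions made for illustration.

```python
# Minimal sketch of knowledge distillation with class aggregation into a
# gradient-boosted tree student. All names and groupings below are
# illustrative assumptions, not taken from the paper.
import numpy as np
import lightgbm as lgb

# Class aggregation: collapse the teacher's fine-grained audio tags into a
# few early-media classes (hypothetical grouping).
CLASS_GROUPS = {
    "ringback": ["Ringtone", "Telephone bell ringing"],
    "speech":   ["Speech", "Conversation"],
    "music":    ["Music", "Jingle (music)"],
    "silence":  ["Silence"],
}

def aggregate_teacher_probs(tag_probs: dict) -> np.ndarray:
    """Collapse per-tag teacher probabilities into aggregated class scores."""
    scores = np.array([sum(tag_probs.get(t, 0.0) for t in tags)
                       for tags in CLASS_GROUPS.values()])
    return scores / max(scores.sum(), 1e-9)

def distill(features: np.ndarray, teacher_tag_probs: list) -> lgb.LGBMClassifier:
    """Train a small GBT student on pseudo-labels derived from the teacher's
    aggregated predictions -- a simple, hard-label form of distillation."""
    soft = np.stack([aggregate_teacher_probs(p) for p in teacher_tag_probs])
    hard_labels = soft.argmax(axis=1)          # distilled pseudo-labels
    student = lgb.LGBMClassifier(n_estimators=200, num_leaves=31)
    student.fit(features, hard_labels)         # features: e.g. log-mel statistics
    return student
```

At inference time such a student only needs cheap hand-crafted features and a tree ensemble rather than a full CNN forward pass, which is the kind of trade-off the abstract attributes its runtime gains to.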
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies affine transformations to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies affine transformations to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Multivariate Time Series Early Classification Across Channel and Time Dimensions [3.5786621294068373]
We propose a more flexible early classification pipeline that offers a more granular consideration of input channels.
Our method enhances the early classification paradigm by achieving improved accuracy at equal input utilization.
arXiv Detail & Related papers (2023-06-26T11:30:33Z)
- DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes [16.96483269023065]
Lifelong audio feature extraction involves learning new sound classes incrementally.
Optimizing the model only on new data can lead to catastrophic forgetting of previously learned tasks.
This paper introduces a new approach to continual audio representation learning called DeCoR.
arXiv Detail & Related papers (2023-05-29T02:25:03Z)
- Learning with Noisy Labels through Learnable Weighting and Centroid Similarity [5.187216033152917]
Noisy labels are prevalent in domains such as medical diagnosis and autonomous driving.
We introduce a novel method for training machine learning models in the presence of noisy labels.
Our results show that our method consistently outperforms the existing state-of-the-art techniques.
arXiv Detail & Related papers (2023-03-16T16:43:24Z)
- Low-resource Accent Classification in Geographically-proximate Settings: A Forensic and Sociophonetics Perspective [8.002498051045228]
Accented speech recognition and accent classification are relatively under-explored research areas in speech technology.
Recent deep learning-based methods and Transformer-based pretrained models have achieved superb performance in both areas.
In this paper, we explored three main accent modelling methods combined with two different classifiers based on 105 speaker recordings retrieved from five urban varieties in Northern England.
arXiv Detail & Related papers (2022-06-26T01:25:17Z)
- Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU).
Recent works have shown that using extra data and labels can improve out-of-domain (OOD) detection performance.
This paper proposes to train a model with only in-domain (IND) data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
- Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead [88.17413955380262]
We introduce a novel early-exit architecture based on vision transformers.
We show that our method works for both classification and regression problems.
We also introduce a novel method for integrating audio and visual modalities within early exits in audiovisual data analysis.
arXiv Detail & Related papers (2021-05-19T13:30:34Z)
- Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias [59.788358876316295]
We propose a pipeline solution to improve speaker verification on a small actual forensic field dataset.
Leveraging large-scale out-of-domain datasets, we propose a knowledge distillation based objective function for teacher-student learning.
We show that the proposed objective function can efficiently improve the performance of teacher-student learning on short utterances.
arXiv Detail & Related papers (2020-09-21T00:58:40Z)
- Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis [64.82680813427054]
Plant diseases are one of the main threats to food security and crop production.
One popular approach is to frame this problem as a leaf image classification task, which can be addressed by powerful convolutional neural networks (CNNs).
We propose a novel framework that incorporates rectified meta-learning module into common CNN paradigm to train a noise-robust deep network without using extra supervision information.
arXiv Detail & Related papers (2020-03-17T09:51:30Z)