ATMS-KD: Adaptive Temperature and Mixed Sample Knowledge Distillation for a Lightweight Residual CNN in Agricultural Embedded Systems
- URL: http://arxiv.org/abs/2508.20232v1
- Date: Wed, 27 Aug 2025 19:23:54 GMT
- Title: ATMS-KD: Adaptive Temperature and Mixed Sample Knowledge Distillation for a Lightweight Residual CNN in Agricultural Embedded Systems
- Authors: Mohamed Ohamouddou, Said Ohamouddou, Abdellatif El Afia, Rafik Lasri
- Abstract summary: ATMS-KD (Adaptive Temperature and Mixed-Sample Knowledge Distillation) is a novel framework for developing lightweight CNN models. The framework combines adaptive temperature scheduling with mixed-sample augmentation to transfer knowledge from a MobileNetV3 Large teacher model to lightweight residual CNN students. The dataset used in this study consists of images of Rosa damascena (Damask rose) collected from agricultural fields in the Dades Oasis, southeastern Morocco.
- Score: 0.6299766708197883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study proposes ATMS-KD (Adaptive Temperature and Mixed-Sample Knowledge Distillation), a novel framework for developing lightweight CNN models suitable for resource-constrained agricultural environments. The framework combines adaptive temperature scheduling with mixed-sample augmentation to transfer knowledge from a MobileNetV3 Large teacher model (5.7 M parameters) to lightweight residual CNN students. Three student configurations were evaluated: Compact (1.3 M parameters), Standard (2.4 M parameters), and Enhanced (3.8 M parameters). The dataset used in this study consists of images of Rosa damascena (Damask rose) collected from agricultural fields in the Dades Oasis, southeastern Morocco, providing a realistic benchmark for agricultural computer vision applications under diverse environmental conditions. Experimental evaluation on the Damascena rose maturity classification dataset demonstrated significant improvements over direct training methods. All student models achieved validation accuracies exceeding 96.7% with ATMS-KD, compared to 95-96% with direct training. The framework outperformed eleven established knowledge distillation methods, achieving 97.11% accuracy with the compact model, a 1.60 percentage point improvement over the second-best approach, while maintaining the lowest inference latency of 72.19 ms. Knowledge retention rates exceeded 99% for all configurations, demonstrating effective knowledge transfer regardless of student model capacity.
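The listing does not include the authors' implementation, but the mechanism described in the abstract (mixed-sample augmentation combined with a temperature-scheduled distillation loss) can be sketched roughly as follows. Everything here is an assumption made for illustration: the linear temperature schedule, the mixup coefficient, the loss weighting, and all function names are hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def mixup(x, y_onehot, alpha=0.2):
    """Mixed-sample augmentation (mixup); alpha=0.2 is an illustrative choice."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], lam * y_onehot + (1 - lam) * y_onehot[idx]

def adaptive_temperature(epoch, num_epochs, t_start=5.0, t_end=2.0):
    """Hypothetical schedule: decay the distillation temperature linearly over training."""
    return t_start + (t_end - t_start) * epoch / max(num_epochs - 1, 1)

def distillation_step(student, teacher, x, y_onehot, epoch, num_epochs, kd_weight=0.7):
    """One ATMS-KD-style training step (sketch, not the authors' code)."""
    x_mix, y_mix = mixup(x, y_onehot)
    T = adaptive_temperature(epoch, num_epochs)
    with torch.no_grad():
        teacher_logits = teacher(x_mix)
    student_logits = student(x_mix)
    # Soft-label loss at temperature T (scaled by T^2, as in standard KD)
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label loss against the mixed targets
    ce_loss = torch.sum(-y_mix * F.log_softmax(student_logits, dim=-1), dim=-1).mean()
    return kd_weight * kd_loss + (1 - kd_weight) * ce_loss
```

The adaptive_temperature schedule above is only one plausible realization of "adaptive temperature scheduling"; the paper may condition the temperature on other signals (such as loss or agreement between teacher and student) rather than on the epoch index.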
Related papers
- Pedagogically-Inspired Data Synthesis for Language Model Knowledge Distillation [63.302074484672424]
We propose a pedagogically-inspired framework for knowledge distillation. Our approach identifies knowledge deficiencies in student models, organizes knowledge delivery through progressive curricula, and adapts representations to match the cognitive capacity of student models. Our framework particularly excels in complex reasoning tasks, showing a 19.2% improvement on MATH and 22.3% on HumanEval compared with state-of-the-art baselines.
arXiv Detail & Related papers (2026-02-12T17:00:36Z) - Multi-objective hybrid knowledge distillation for efficient deep learning in smart agriculture [0.05599792629509228]
This study proposes a hybrid knowledge distillation framework for developing a lightweight yet high-performance convolutional neural network. The proposed approach designs a customized student model that combines inverted residual blocks with dense connectivity and trains it under the guidance of a ResNet18 teacher network.
arXiv Detail & Related papers (2025-12-23T15:33:55Z) - A Domain-Adapted Lightweight Ensemble for Resource-Efficient Few-Shot Plant Disease Classification [0.0]
We present a few-shot learning approach that combines domain-adapted MobileNetV2 and MobileNetV3 models as feature extractors. For the classification task, the fused features are passed through a Bi-LSTM classifier enhanced with attention mechanisms. It consistently improved performance across 1- to 15-shot scenarios, reaching 98.23 ± 0.33% at 15 shots. Notably, it also outperformed the previous SOTA accuracy of 96.4% on six diseases from PlantVillage, achieving 99.72% with only 15-shot learning.
arXiv Detail & Related papers (2025-12-15T15:17:29Z) - Synthetic Adaptive Guided Embeddings (SAGE): A Novel Knowledge Distillation Method [1.5839621757142595]
We propose a novel adaptive distillation framework that dynamically augments training data in regions of high student model loss. Our method identifies underperforming regions in the embedding space and generates targeted synthetic examples to guide student learning.
arXiv Detail & Related papers (2025-08-20T15:29:00Z) - Temperature-Driven Robust Disease Detection in Brain and Gastrointestinal Disorders via Context-Aware Adaptive Knowledge Distillation [6.432534227472963]
We propose a novel framework that integrates Ant Colony Optimization for optimal teacher-student model selection and a context-aware predictor approach for temperature scaling. The proposed framework is evaluated using three publicly available benchmark datasets.
arXiv Detail & Related papers (2025-05-09T19:02:09Z) - Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition [58.41784639847413]
Human emotion is a complex phenomenon conveyed and perceived through facial expressions, vocal tones, body language, and physiological signals.
In this paper, a multi-teacher PKD (MT-PKDOT) method with self-distillation is introduced to align diverse teacher representations before distilling them to the student.
Results indicate that our proposed method can outperform SOTA PKD methods.
arXiv Detail & Related papers (2024-08-16T22:11:01Z) - Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information [3.84949625314596]
We introduce the concept of conditional mutual information (CMI) into the estimation of the Bayes conditional probability distribution (BCPD).
In MCMI estimation, both the log-likelihood and CMI of the teacher are simultaneously maximized when the teacher is trained.
We show that such improvements in the student's accuracy are more drastic in zero-shot and few-shot settings.
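As a rough sketch only (the notation, the CMI term, and the trade-off weight are assumptions made for illustration, not definitions taken from the paper), a teacher objective that jointly maximizes the log-likelihood and the conditional mutual information could be written as:

```latex
% Sketch of a joint teacher objective: maximize the log-likelihood together with
% the conditional mutual information between the input X and the teacher's
% prediction \hat{Y} given the true label Y; \lambda is a hypothetical weight.
\max_{\theta}\;
  \mathbb{E}_{(x,y)\sim\mathcal{D}}\!\left[\log p_{\theta}(y \mid x)\right]
  \;+\; \lambda\, I_{\theta}\!\left(X;\hat{Y}\mid Y\right)
```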
arXiv Detail & Related papers (2024-01-16T16:01:37Z) - One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation [69.65734716679925]
Knowledge distillation has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme.
Most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family.
We propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
arXiv Detail & Related papers (2023-10-30T11:13:02Z) - Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems [63.713297451300086]
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system.
arXiv Detail & Related papers (2022-06-15T20:44:23Z) - Parameter-Efficient and Student-Friendly Knowledge Distillation [83.56365548607863]
We present a parameter-efficient and student-friendly knowledge distillation method, namely PESF-KD, to achieve efficient and sufficient knowledge transfer.
Experiments on a variety of benchmarks show that PESF-KD can significantly reduce the training cost while obtaining competitive results compared to advanced online distillation methods.
arXiv Detail & Related papers (2022-05-28T16:11:49Z) - LTD: Low Temperature Distillation for Robust Adversarial Training [1.3300217947936062]
Adversarial training has been widely used to enhance the robustness of neural network models against adversarial attacks.
Despite the popularity of neural network models, a significant gap exists between the natural and robust accuracy of these models.
We propose a novel method called Low Temperature Distillation (LTD) that generates soft labels using the modified knowledge distillation framework.
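As a minimal sketch of the low-temperature soft-label idea only (the temperature value, the function name, and how such labels feed into adversarial training are assumptions for illustration, not details from the paper):

```python
import torch
import torch.nn.functional as F

def soft_labels(teacher_logits: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Convert teacher logits into soft labels at a given temperature.

    A temperature below 1 sharpens the distribution relative to the large
    temperatures typical in standard KD; the value 0.5 is purely illustrative.
    """
    return F.softmax(teacher_logits / temperature, dim=-1)

# Example usage: a batch of 4 samples with 10 classes
logits = torch.randn(4, 10)
targets = soft_labels(logits, temperature=0.5)
```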
arXiv Detail & Related papers (2021-11-03T16:26:00Z) - End-to-End Semi-Supervised Object Detection with Soft Teacher [63.26266730447914]
This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.
The proposed approach outperforms previous methods by a large margin under various labeling ratios.
On the state-of-the-art Swin Transformer-based object detector, it can still significantly improve the detection accuracy by +1.5 mAP.
arXiv Detail & Related papers (2021-06-16T17:59:30Z) - Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution [17.996541285382463]
We propose extracurricular learning to bridge the gap between a compressed student model and its teacher.
We conduct rigorous evaluations on regression and classification tasks and show that compared to the standard knowledge distillation, extracurricular learning reduces the gap by 46% to 68%.
This leads to major accuracy improvements compared to the empirical risk minimization-based training for various recent neural network architectures.
arXiv Detail & Related papers (2020-06-30T18:21:21Z) - Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on the RNN-Transducer together with an improved beam search, reaches a quality only 3.8% WER (absolute) worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)