Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation
- URL: http://arxiv.org/abs/2411.00052v1
- Date: Wed, 30 Oct 2024 17:57:44 GMT
- Title: Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation
- Authors: Ahmed Akib Jawad Karim, Kazi Hafiz Md. Asad, Md. Golam Rabiul Alam
- Abstract summary: We create a lightweight yet powerful BERT-based model for natural language processing applications.
We apply the resulting model, LastBERT, to a real-world task: classifying severity levels of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns from social media text data.
- Score: 0.6793286055326242
- License:
- Abstract: This work focuses on the efficiency of the knowledge distillation approach in generating a lightweight yet powerful BERT-based model for natural language processing applications. After creating the model, we applied the resulting model, LastBERT, to a real-world task: classifying severity levels of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns from social media text data. With LastBERT, a customized student BERT model, we reduced the parameter count from the 110 million of BERT-base to 29 million, yielding a model approximately 73.64% smaller. On the GLUE benchmark, comprising paraphrase identification, sentiment analysis, and text classification, the student model maintained strong performance across many tasks despite this reduction. On a real-world ADHD dataset, the model achieved an accuracy and F1 score of 85%. Compared to DistilBERT (66M parameters) and ClinicalBERT (110M parameters), LastBERT performed comparably, with DistilBERT slightly ahead at 87% and ClinicalBERT at 86% on the same metrics. These findings highlight LastBERT's capacity to classify degrees of ADHD severity reliably, offering mental health professionals a useful tool for assessing and understanding content produced by users on social networking platforms. The study underscores the potential of knowledge distillation to produce effective models suited to resource-limited conditions, advancing both NLP and mental health diagnosis. The considerable decrease in model size without appreciable performance loss also translates into lower computational requirements for training and deployment, broadening applicability, especially with readily available computational tools such as Google Colab. This study demonstrates the accessibility and usefulness of advanced NLP methods in practical, real-world applications.
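As a rough illustration of the distillation setup described in the abstract, the sketch below combines a softened-logit KL term from a frozen BERT-base teacher with a cross-entropy term on hard severity labels. The student configuration (hidden size, layer count), the number of severity labels, the temperature, and the loss weighting are illustrative assumptions, not the exact LastBERT recipe.

```python
# Minimal knowledge-distillation sketch (assumed setup, not the authors' exact method).
import torch
import torch.nn.functional as F
from transformers import BertConfig, BertForSequenceClassification

NUM_LABELS = 3      # assumed number of ADHD severity levels (illustrative)
TEMPERATURE = 2.0   # assumed softening temperature
ALPHA = 0.5         # assumed weight between soft and hard losses

# Teacher: standard BERT-base (~110M parameters), kept frozen during distillation.
teacher = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_LABELS)
teacher.eval()

# Student: a reduced BERT configuration, chosen here only to illustrate
# a much smaller model; not the actual LastBERT architecture.
student_config = BertConfig(
    hidden_size=384, num_hidden_layers=6, num_attention_heads=6,
    intermediate_size=1536, num_labels=NUM_LABELS)
student = BertForSequenceClassification(student_config)

def distillation_loss(input_ids, attention_mask, labels):
    """KL divergence on softened teacher logits plus cross-entropy on hard labels."""
    with torch.no_grad():
        t_logits = teacher(input_ids=input_ids, attention_mask=attention_mask).logits
    s_logits = student(input_ids=input_ids, attention_mask=attention_mask).logits

    soft_loss = F.kl_div(
        F.log_softmax(s_logits / TEMPERATURE, dim=-1),
        F.softmax(t_logits / TEMPERATURE, dim=-1),
        reduction="batchmean") * (TEMPERATURE ** 2)
    hard_loss = F.cross_entropy(s_logits, labels)
    return ALPHA * soft_loss + (1.0 - ALPHA) * hard_loss
```

In a training loop, only this combined loss would be back-propagated through the student; the teacher's parameters stay fixed, which is what keeps the training footprint small enough for tools like Google Colab.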
Related papers
- A Comparative Study of Hybrid Models in Health Misinformation Text Classification [0.43695508295565777]
This study evaluates the effectiveness of machine learning (ML) and deep learning (DL) models in detecting COVID-19-related misinformation on online social networks (OSNs)
Our study concludes that DL and hybrid DL models are more effective than conventional ML algorithms for detecting COVID-19 misinformation on OSNs.
arXiv Detail & Related papers (2024-10-08T19:43:37Z) - Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA)
Our method significantly outperforms existing approaches, achieving an averaged AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z) - Knowledge Distillation of LLM for Automatic Scoring of Science Education Assessments [4.541309099803903]
This study proposes a method for knowledge distillation (KD) of fine-tuned Large Language Models (LLMs)
We specifically target the challenge of deploying these models on resource-constrained devices.
arXiv Detail & Related papers (2023-12-26T01:24:25Z) - Which Augmentation Should I Use? An Empirical Investigation of Augmentations for Self-Supervised Phonocardiogram Representation Learning [5.438725298163702]
Contrastive Self-Supervised Learning (SSL) offers a potential solution to labeled data scarcity.
We propose uncovering the optimal augmentations for applying contrastive learning in 1D phonocardiogram (PCG) classification.
We demonstrate that depending on its training distribution, the effectiveness of a fully-supervised model can degrade up to 32%, while SSL models only lose up to 10% or even improve in some cases.
arXiv Detail & Related papers (2023-12-01T11:06:00Z) - A Few-Shot Approach to Dysarthric Speech Intelligibility Level
Classification Using Transformers [0.0]
Dysarthria is a speech disorder that hinders communication due to difficulties in articulating words.
Much of the literature has focused on improving ASR systems for dysarthric speech.
This work aims to develop models that can accurately classify the presence of dysarthria.
arXiv Detail & Related papers (2023-09-17T17:23:41Z) - Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z) - SSD-KD: A Self-supervised Diverse Knowledge Distillation Method for Lightweight Skin Lesion Classification Using Dermoscopic Images [62.60956024215873]
Skin cancer is one of the most common types of malignancy, affecting a large population and causing a heavy economic burden worldwide.
Most studies in skin cancer detection keep pursuing high prediction accuracies without considering the limitation of computing resources on portable devices.
This study specifically proposes a novel method, termed SSD-KD, that unifies diverse knowledge into a generic KD framework for skin diseases classification.
arXiv Detail & Related papers (2022-03-22T06:54:29Z) - Fine-Tuning Large Neural Language Models for Biomedical Natural Language
Processing [55.52858954615655]
We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains.
We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
arXiv Detail & Related papers (2021-12-15T04:20:35Z) - Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize the k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
arXiv Detail & Related papers (2021-08-26T14:01:04Z) - RefBERT: Compressing BERT by Referencing to Pre-computed Representations [19.807272592342148]
RefBERT can outperform the vanilla TinyBERT by over 8.1% and achieves more than 94% of the performance of BERT-base on the GLUE benchmark.
RefBERT is 7.4x smaller and 9.5x faster at inference than BERT-base.
arXiv Detail & Related papers (2021-06-11T01:22:08Z) - MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that, under reasonable conditions, MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)