Classification of Psychiatry Clinical Notes by Diagnosis: A Deep Learning and Machine Learning Approach
- URL: http://arxiv.org/abs/2508.00695v1
- Date: Fri, 01 Aug 2025 15:11:39 GMT
- Title: Classification of Psychiatry Clinical Notes by Diagnosis: A Deep Learning and Machine Learning Approach
- Authors: Sergio Rubio-Martín, María Teresa García-Ordás, Antonio Serrano-García, Clara Margarita Franch-Pato, Arturo Crespo-Álvaro, José Alberto Benítez-Andrades
- Abstract summary: Decision Tree and eXtreme Gradient Boost models achieved the highest accuracy among machine learning approaches. DistilBERT and SciBERT models attained 96% accuracy in the deep learning category.
- Score: 1.4019041243188557
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The classification of clinical notes into specific diagnostic categories is critical in healthcare, especially for mental health conditions like Anxiety and Adjustment Disorder. In this study, we compare the performance of various Artificial Intelligence models, including both traditional Machine Learning approaches (Random Forest, Support Vector Machine, K-nearest neighbors, Decision Tree, and eXtreme Gradient Boost) and Deep Learning models (DistilBERT and SciBERT), to classify clinical notes into these two diagnoses. Additionally, we implemented three oversampling strategies: No Oversampling, Random Oversampling, and Synthetic Minority Oversampling Technique (SMOTE), to assess their impact on model performance. Hyperparameter tuning was also applied to optimize model accuracy. Our results indicate that oversampling techniques had minimal impact on model performance overall. The only exception was SMOTE, which showed a positive effect specifically with BERT-based models. However, hyperparameter optimization significantly improved accuracy across the models, enhancing their ability to generalize and perform on the dataset. The Decision Tree and eXtreme Gradient Boost models achieved the highest accuracy among machine learning approaches, both reaching 96%, while the DistilBERT and SciBERT models also attained 96% accuracy in the deep learning category. These findings underscore the importance of hyperparameter tuning in maximizing model performance. This study contributes to the ongoing research on AI-assisted diagnostic tools in mental health by providing insights into the efficacy of different model architectures and data balancing methods.
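The two active oversampling strategies named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the toy feature vectors, the class labels "A" and "B", and the helper names `random_oversample` and `smote_like` are all hypothetical, and the abstract does not say which diagnosis was the minority class or how notes were turned into features. Random oversampling duplicates existing minority rows, while the SMOTE-style function synthesizes new rows by interpolating between a minority row and one of its nearest minority neighbours.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k=2, seed=0):
    """Add synthetic minority rows by interpolating towards a nearby minority neighbour.

    Simplified SMOTE-style sketch; assumes the minority class has at least two rows.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    pool = [x for x, lab in zip(X, y) if lab == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(target - counts[minority]):
        base = rng.choice(pool)
        # k nearest minority neighbours by squared Euclidean distance, excluding base itself
        neighbours = sorted(
            (p for p in pool if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: four rows of class "A", two rows of class "B".
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [1.0, 1.0], [1.2, 0.9]]
y = ["A", "A", "A", "A", "B", "B"]

X_ro, y_ro = random_oversample(X, y)
X_sm, y_sm = smote_like(X, y, minority="B")
print(Counter(y_ro), Counter(y_sm))
```

In both cases only the training split would normally be resampled, so that the evaluation data keeps its original class balance.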
Related papers
- Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography [0.0]
We evaluate radiomics-based and deep learning-based approaches for disease detection in chest radiography. Deep learning models learn directly from image data, while radiomics-based models extract handcrafted features. These findings provide statistically validated, data-driven recommendations for model selection in diagnostic AI.
arXiv Detail & Related papers (2025-04-16T16:54:37Z) - Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM) which can be viewed as a gradient boosting algorithm combining score matching. We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy. Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z) - Enhanced Prediction of Ventilator-Associated Pneumonia in Patients with Traumatic Brain Injury Using Advanced Machine Learning Techniques [0.0]
Ventilator-associated pneumonia (VAP) in traumatic brain injury (TBI) patients poses a significant mortality risk.
Timely detection and prognostication of VAP in TBI patients are crucial to improve patient outcomes and alleviate the strain on healthcare resources.
We implemented six machine learning models using the MIMIC-III database.
arXiv Detail & Related papers (2024-08-02T09:44:18Z) - Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z) - Unified Uncertainty Estimation for Cognitive Diagnosis Models [70.46998436898205]
We propose a unified uncertainty estimation approach for a wide range of cognitive diagnosis models.
We decompose the uncertainty of diagnostic parameters into data aspect and model aspect.
Our method is effective and can provide useful insights into the uncertainty of cognitive diagnosis.
arXiv Detail & Related papers (2024-03-09T13:48:20Z) - An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder [0.0]
Autistic Spectrum Disorder (ASD) is a neurological disease characterized by difficulties with social interaction, communication, and repetitive activities.
This study employs diverse machine learning methods to identify crucial ASD traits, aiming to enhance and automate the diagnostic process.
arXiv Detail & Related papers (2023-09-20T21:23:37Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - Comparative Analysis of Epileptic Seizure Prediction: Exploring Diverse Pre-Processing Techniques and Machine Learning Models [0.0]
We present a comparative analysis of five machine learning models for the prediction of epileptic seizures using EEG data.
The results of our analysis demonstrate the performance of each model in terms of accuracy.
The ET model exhibited the best performance with an accuracy of 99.29%.
arXiv Detail & Related papers (2023-08-06T08:50:08Z) - SSD-KD: A Self-supervised Diverse Knowledge Distillation Method for Lightweight Skin Lesion Classification Using Dermoscopic Images [62.60956024215873]
Skin cancer is one of the most common types of malignancy, affecting a large population and causing a heavy economic burden worldwide.
Most studies in skin cancer detection keep pursuing high prediction accuracies without considering the limitation of computing resources on portable devices.
This study proposes a novel method, termed SSD-KD, that unifies diverse knowledge into a generic KD framework for skin disease classification.
arXiv Detail & Related papers (2022-03-22T06:54:29Z) - Development of patients triage algorithm from nationwide COVID-19 registry data based on machine learning [1.0323063834827415]
This paper describes the development process of a severity assessment model using machine learning techniques.
The model requires only patients' basic personal data, allowing them to judge their own severity.
We aim to establish a medical system that allows patients to check their own severity and informs them to visit the appropriate clinic center based on the past treatment details of other patients with similar severity.
arXiv Detail & Related papers (2021-09-18T19:56:27Z) - A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict the Chicago Classification (CC) diagnosis of a high-resolution manometry (HRM) study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.