Related papers: Accelerating Clinical NLP at Scale with a Hybrid Framework with Reduced GPU Demands: A Case Study in Dementia Identification

URL: http://arxiv.org/abs/2504.12494v1
Date: Wed, 16 Apr 2025 21:24:38 GMT
Title: Accelerating Clinical NLP at Scale with a Hybrid Framework with Reduced GPU Demands: A Case Study in Dementia Identification
Authors: Jianlin Shi, Qiwei Gan, Elizabeth Hanchrow, Annie Bowles, John Stanley, Adam P. Bress, Jordana B. Cohen, Patrick R. Alba,
Abstract summary: We propose a hybrid NLP framework that integrates rule-based filtering, a Support Vector Machine (SVM) classifier, and a BERT-based model. We applied this framework in a dementia identification case study involving 4.9 million veterans with incident hypertension, analyzing 2.1 billion clinical notes.
Score: 0.12369842801624054
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Clinical natural language processing (NLP) is increasingly in demand in both clinical research and operational practice. However, most state-of-the-art solutions are transformer-based and require substantial computational resources, limiting their accessibility. We propose a hybrid NLP framework that integrates rule-based filtering, a Support Vector Machine (SVM) classifier, and a BERT-based model to improve efficiency while maintaining accuracy. We applied this framework in a dementia identification case study involving 4.9 million veterans with incident hypertension, analyzing 2.1 billion clinical notes. At the patient level, our method achieved a precision of 0.90, a recall of 0.84, and an F1-score of 0.87. Additionally, this NLP approach identified over three times as many dementia cases as structured data methods. All processing was completed in approximately two weeks using a single machine with dual A40 GPUs. This study demonstrates the feasibility of hybrid NLP solutions for large-scale clinical text analysis, making state-of-the-art methods more accessible to healthcare organizations with limited computational resources.
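The abstract names the three components but not how notes are routed between them. Below is a minimal Python sketch, assuming a simple filtering cascade in which the rule-based and SVM stages can only discard notes and the BERT stage makes the final call on whatever survives. `HybridCascade`, the keyword pattern, both cutoffs, the any-positive-note patient rule, and the lambda stand-ins for the SVM and BERT models are all hypothetical, not the authors' implementation. The block also checks that the reported patient-level F1 follows from the reported precision and recall.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HybridCascade:
    """Hypothetical rules -> SVM -> BERT cascade for note classification.

    The cheaper stages can only discard a note; the expensive model
    makes the final decision on whatever reaches it.
    """
    keyword_pattern: re.Pattern               # stage 1: cheap rule-based screen
    svm_prob: Callable[[str], float]          # stage 2: stand-in for an SVM probability
    bert_prob: Callable[[str], float]         # stage 3: stand-in for a BERT probability
    svm_cutoff: float = 0.2                   # assumed: notes scoring below this skip BERT
    bert_cutoff: float = 0.5                  # assumed final decision threshold
    seen: Dict[str, int] = field(
        default_factory=lambda: {"rules": 0, "svm": 0, "bert": 0}
    )                                         # how many notes reached each stage

    def classify_note(self, text: str) -> bool:
        self.seen["rules"] += 1
        if not self.keyword_pattern.search(text):
            return False                      # no candidate term: no model is run
        self.seen["svm"] += 1
        if self.svm_prob(text) < self.svm_cutoff:
            return False                      # SVM says clearly negative: GPU model skipped
        self.seen["bert"] += 1
        return self.bert_prob(text) >= self.bert_cutoff

    def classify_patient(self, notes: List[str]) -> bool:
        # Assumed patient-level rule: positive if any note is positive.
        # any() stops at the first positive note, so later notes are not processed.
        return any(self.classify_note(n) for n in notes)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy stand-ins; a real system would call trained SVM and BERT models here.
cascade = HybridCascade(
    keyword_pattern=re.compile(r"\b(dementia|alzheimer)", re.IGNORECASE),
    svm_prob=lambda t: 0.9 if "diagnosis" in t.lower() else 0.1,
    bert_prob=lambda t: 0.0 if "no evidence of" in t.lower() else 0.8,
)
notes = [
    "Follow-up for hypertension; BP controlled.",
    "Family history of Alzheimer disease in mother.",
    "Diagnosis: dementia, Alzheimer type.",
]
print(cascade.classify_patient(notes))  # True
print(cascade.seen)                     # {'rules': 3, 'svm': 2, 'bert': 1}
print(round(f1(0.90, 0.84), 2))         # 0.87
```

The ordering is what saves compute: in this toy run all three notes are screened by the regex, two reach the SVM, and only one reaches the transformer stage, which is where the GPU cost would sit. Whether the authors route notes this way, and with which thresholds, is not stated in the abstract.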

Related papers

A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis. Most medical LLMs are trained on data from a single institution, which limits their generalizability and safety in heterogeneous systems. We introduce a model-agnostic, parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z)
Local Obfuscation by GLINER for Impartial Context Aware Lineage: Development and evaluation of PII Removal system [3.823253824850948]
LOGICAL is an efficient, locally deployable PII removal system built on a fine-tuned GLiNER model. The fine-tuned GLiNER model achieved superior performance, with an overall micro-average F1-score of 0.980. LOGICAL correctly sanitised 95% of documents completely, compared to 64% for the next-best solution.
arXiv Detail & Related papers (2025-10-22T08:12:07Z)
Model Compression Engine for Wearable Devices Skin Cancer Diagnosis [0.04818215922729968]
Skin cancer is one of the most prevalent and preventable types of cancer, yet its early detection remains a challenge. This study proposes an AI-driven diagnostic tool optimized for embedded systems to address this gap.
arXiv Detail & Related papers (2025-07-23T02:02:24Z)
Predicting Length of Stay in Neurological ICU Patients Using Classical Machine Learning and Neural Network Models: A Benchmark Study on MIMIC-IV [49.1574468325115]
This study explores multiple ML approaches for predicting ICU length of stay (LOS) specifically for patients with neurological diseases, using the MIMIC-IV dataset. The evaluated models include classical ML algorithms (K-Nearest Neighbors, Random Forest, XGBoost, and CatBoost) and neural networks (LSTM, BERT, and Temporal Fusion Transformer).
arXiv Detail & Related papers (2025-05-23T14:06:42Z)
Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP [5.297964922424743]
We develop state-of-the-art assertion detection models. We evaluate these models against cloud-based commercial API solutions, the legacy rule-based NegEx approach, and GPT-4o.
arXiv Detail & Related papers (2025-03-21T10:18:47Z)
Can Zero-Shot Commercial APIs Deliver Regulatory-Grade Clinical Text DeIdentification? [4.769069757504856]
John Snow Labs' Medical Language Models solution achieves the highest accuracy. It is over 80% cheaper than Azure and GPT-4o, and is the only solution not priced by token.
arXiv Detail & Related papers (2025-03-21T10:05:04Z)
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases annotated with reasoning references. We propose a framework encompassing three critical stages: examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey. Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv Detail & Related papers (2025-03-06T18:35:39Z)
Efficient Brain Tumor Classification with Lightweight CNN Architecture: A Novel Approach [0.0]
Brain tumor classification using MRI images is critical in medical diagnostics, where early and accurate detection significantly impacts patient outcomes. Recent advancements in deep learning (DL) have shown promise, but many models struggle to balance accuracy and computational efficiency. We propose a novel model architecture integrating separable convolutions and squeeze-and-excitation (SE) blocks, designed to enhance feature extraction while maintaining computational efficiency.
arXiv Detail & Related papers (2025-02-01T21:06:42Z)
A Cascaded Dilated Convolution Approach for Mpox Lesion Classification [0.0]
Mpox virus presents significant diagnostic challenges due to its visual similarity to other skin lesion diseases. Deep learning-based approaches for skin lesion classification offer a promising alternative. This study introduces the Cascaded Atrous Group Attention framework to address these challenges.
arXiv Detail & Related papers (2024-12-13T12:47:30Z)
Towards Clinical Practice in CT-Based Pulmonary Disease Screening: An Efficient and Reliable Framework [16.98886836566185]
A Cluster-based Sub-Sampling (CSS) method efficiently selects a compact yet comprehensive subset of CT slices. A Hybrid Uncertainty Quantification (HUQ) mechanism assesses both Aleatoric Uncertainty (AU) and Epistemic Uncertainty (EU) with minimal computational overhead.
arXiv Detail & Related papers (2024-12-02T14:18:17Z)
Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology. For training, we assemble a large dataset of over 697 thousand radiology image-text pairs. For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation. LLaVA-Rad inference is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
Deep learning in computed tomography pulmonary angiography imaging: a dual-pronged approach for pulmonary embolism detection [0.0]
The aim of this study is to leverage deep learning techniques to enhance the Computer Assisted Diagnosis (CAD) of Pulmonary Embolism (PE). Our classification system includes an Attention-Guided Convolutional Neural Network (AG-CNN) that uses local context by employing an attention mechanism. With the Inception-v3 backbone architecture, AG-CNN achieves robust performance on the FUMPE dataset: an AUROC of 0.927, sensitivity of 0.862, specificity of 0.879, and an F1-score of 0.805.
arXiv Detail & Related papers (2023-11-09T08:23:44Z)
The Power Of Simplicity: Why Simple Linear Models Outperform Complex Machine Learning Techniques -- Case Of Breast Cancer Diagnosis [0.0]
This research paper investigates the effectiveness of simple linear models versus complex machine learning techniques in breast cancer diagnosis. We focus on Logistic Regression (LR), Decision Trees (DT), and Support Vector Machines (SVM) and optimize their performance using the UCI Machine Learning Repository dataset. Our findings demonstrate that the simpler linear model, LR, outperforms the more complex DT and SVM techniques, with a mean test score of 97.28%, a standard deviation of 1.62%, and a computation time of 35.56 ms.
arXiv Detail & Related papers (2023-06-04T19:43:54Z)
A Meta-GNN approach to personalized seizure detection and classification [53.906130332172324]
We propose a personalized seizure detection and classification framework that quickly adapts to a specific patient from limited seizure samples. We train a Meta-GNN-based classifier that learns a global model from a set of training patients. We show that our method outperforms the baselines, reaching 82.7% accuracy and an 82.08% F1 score after only 20 iterations on new, unseen patients.
arXiv Detail & Related papers (2022-11-01T14:12:58Z)
Dynamic Bank Learning for Semi-supervised Federated Image Diagnosis with Class Imbalance [65.61909544178603]
We study a practical yet challenging problem: class-imbalanced semi-supervised FL (imFed-Semi). This imFed-Semi problem is addressed by a novel dynamic bank learning scheme, which improves client training by exploiting class proportion information. We evaluate our approach on two public real-world medical datasets: intracranial hemorrhage diagnosis with 25,000 CT slices and skin lesion diagnosis with 10,015 dermoscopy images.
arXiv Detail & Related papers (2022-06-27T06:51:48Z)
Auto-FedRL: Federated Hyperparameter Optimization for Multi-institutional Medical Image Segmentation [48.821062916381685]
Federated learning (FL) is a distributed machine learning technique that enables collaborative model training while avoiding explicit data sharing. In this work, we propose an efficient reinforcement learning (RL)-based federated hyperparameter optimization algorithm, termed Auto-FedRL. The effectiveness of the proposed method is validated on a heterogeneous data split of the CIFAR-10 dataset and two real-world medical image segmentation datasets.
arXiv Detail & Related papers (2022-03-12T04:11:42Z)
An Integrated Optimization and Machine Learning Models to Predict the Admission Status of Emergency Patients [1.0323063834827415]
Three machine learning algorithms are proposed: T-XGB, T-ADAB, and T-MLP. The proposed framework can mitigate the crowding problem by proactively planning the patient boarding process. The results show that the newly proposed algorithms achieved high AUC and outperformed the traditional algorithms.
arXiv Detail & Related papers (2022-02-18T13:50:44Z)
Performance of Dual-Augmented Lagrangian Method and Common Spatial Patterns applied in classification of Motor-Imagery BCI [68.8204255655161]
Motor-imagery-based brain-computer interfaces (MI-BCI) have the potential to become ground-breaking technologies for neurorehabilitation. Due to the noisy nature of the EEG signal they use, reliable BCI systems require specialized procedures for feature optimization and extraction.
arXiv Detail & Related papers (2020-10-13T20:50:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.