Medicine on the Edge: Comparative Performance Analysis of On-Device LLMs for Clinical Reasoning
- URL: http://arxiv.org/abs/2502.08954v1
- Date: Thu, 13 Feb 2025 04:35:55 GMT
- Title: Medicine on the Edge: Comparative Performance Analysis of On-Device LLMs for Clinical Reasoning
- Authors: Leon Nissen, Philipp Zagar, Vishnu Ravi, Aydin Zahedivash, Lara Marie Reimer, Stephan Jonas, Oliver Aalami, Paul Schmiedmayer
- Abstract summary: We benchmark publicly available on-device Large Language Models (LLMs) using the AMEGA dataset.
Our results indicate that compact general-purpose models like Phi-3 Mini achieve a strong balance between speed and accuracy.
We emphasize the need for more efficient inference and models tailored to real-world clinical reasoning.
- Score: 1.6010529993238123
- Abstract: The deployment of Large Language Models (LLMs) on mobile devices offers significant potential for medical applications, enhancing privacy, security, and cost-efficiency by eliminating reliance on cloud-based services and keeping sensitive health data local. However, the performance and accuracy of on-device LLMs in real-world medical contexts remain underexplored. In this study, we benchmark publicly available on-device LLMs using the AMEGA dataset, evaluating accuracy, computational efficiency, and thermal limitations across various mobile devices. Our results indicate that compact general-purpose models like Phi-3 Mini achieve a strong balance between speed and accuracy, while medically fine-tuned models such as Med42 and Aloe attain the highest accuracy. Notably, deploying LLMs on older devices remains feasible, with memory constraints posing a greater challenge than raw processing power. Our study underscores the potential of on-device LLMs for healthcare while emphasizing the need for more efficient inference and models tailored to real-world clinical reasoning.
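To make the kind of measurement described in the abstract concrete, the sketch below times token generation for a small quantized model with llama-cpp-python and reports decode throughput in tokens per second. It is an illustrative assumption of a setup only: the GGUF file name, the prompt, and the runtime are placeholders, and it does not reproduce the authors' benchmark harness or the AMEGA grading procedure.

```python
# Hedged sketch: measure decode throughput (tokens/s) for a locally stored,
# quantized LLM. Model file, prompt, and runtime are illustrative placeholders,
# not the paper's actual benchmark setup.
import time
from llama_cpp import Llama  # pip install llama-cpp-python

MODEL_PATH = "phi-3-mini-4k-instruct-q4.gguf"  # hypothetical local model file

llm = Llama(model_path=MODEL_PATH, n_ctx=2048, verbose=False)

prompt = (
    "A 58-year-old patient presents with acute chest pain radiating to the "
    "left arm. List the immediate diagnostic steps."
)

start = time.perf_counter()
result = llm(prompt, max_tokens=256, temperature=0.0)
elapsed = time.perf_counter() - start

completion_tokens = result["usage"]["completion_tokens"]
print(f"generated {completion_tokens} tokens in {elapsed:.1f} s "
      f"({completion_tokens / elapsed:.1f} tokens/s)")
print(result["choices"][0]["text"].strip()[:300])
```

Repeating such a run back-to-back on a phone also exposes the thermal throttling the paper measures, since sustained generation typically lowers tokens per second over time.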
Related papers
- Efficient and Personalized Mobile Health Event Prediction via Small Language Models [14.032049217103024]
Small Language Models (SLMs) are potential candidates to solve privacy and computational issues.
This paper examines the capability of SLMs to accurately analyze health data, such as steps, calories, sleep minutes, and other vital statistics.
Our results indicate that SLMs could potentially be deployed on wearable or mobile devices for real-time health monitoring.
arXiv Detail & Related papers (2024-09-17T01:57:57Z)
- MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications [2.838746648891565]
We introduce MEDIC, a framework assessing Large Language Models (LLMs) across five critical dimensions of clinical competence.
We apply MEDIC to evaluate LLMs on medical question-answering, safety, summarization, note generation, and other tasks.
Results show performance disparities across model sizes and between baseline and medically fine-tuned models, with implications for model selection in applications requiring specific model strengths.
arXiv Detail & Related papers (2024-09-11T14:44:51Z)
- MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases [81.70591346986582]
We introduce MobileAIBench, a benchmarking framework for evaluating Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices.
MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices.
arXiv Detail & Related papers (2024-06-12T22:58:12Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor, as the player, and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- Large Language Model Distilling Medication Recommendation Model [58.94186280631342]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate the cost of deploying such a large model, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
- Filling the Missing: Exploring Generative AI for Enhanced Federated Learning over Heterogeneous Mobile Edge Devices [72.61177465035031]
We propose a generative AI-empowered federated learning approach that addresses the data scarcity and heterogeneity of mobile edge devices by leveraging the idea of FIlling the MIssing (FIMI) portion of local data.
Experiment results demonstrate that FIMI can save up to 50% of the device-side energy to achieve the target global test accuracy.
arXiv Detail & Related papers (2023-10-21T12:07:04Z)
- Redefining Digital Health Interfaces with Large Language Models [69.02059202720073]
Large Language Models (LLMs) have emerged as general-purpose models with the ability to process complex information.
We show how LLMs can provide a novel interface between clinicians and digital technologies.
We develop a new prognostic tool using automated machine learning.
arXiv Detail & Related papers (2023-10-05T14:18:40Z)
- Mixed-Integer Projections for Automated Data Correction of EMRs Improve Predictions of Sepsis among Hospitalized Patients [7.639610349097473]
We introduce an innovative projections-based method that seamlessly integrates clinical expertise as domain constraints.
We measure the distance of corrected data from the constraints defining a healthy range of patient data, resulting in a predictive metric we term "trust-scores".
We show an AUROC of 0.865 and a precision of 0.922, surpassing conventional ML models without such projections.
arXiv Detail & Related papers (2023-08-21T15:14:49Z)
- Assessing YOLACT++ for real time and robust instance segmentation of medical instruments in endoscopic procedures [0.5735035463793008]
Image-based tracking of laparoscopic instruments plays a fundamental role in computer and robotic-assisted surgeries.
To date, most existing models for instance segmentation of medical instruments have been based on two-stage detectors.
We propose the addition of attention mechanisms to the YOLACT architecture that allows real-time instance segmentation of instruments.
arXiv Detail & Related papers (2021-03-30T00:09:55Z)
- A Data and Compute Efficient Design for Limited-Resources Deep Learning [68.55415606184]
Equivariant neural networks have gained increasing interest in the deep learning community.
They have been successfully applied in the medical domain where symmetries in the data can be effectively exploited to build more accurate and robust models.
Mobile, on-device implementations of deep learning solutions have been developed for medical applications.
However, equivariant models are commonly implemented using large and computationally expensive architectures, not suitable to run on mobile devices.
In this work, we design and test an equivariant version of MobileNetV2 and further optimize it with model quantization to enable more efficient inference (see the generic quantization sketch after this entry).
arXiv Detail & Related papers (2020-04-21T00:49:11Z)
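As a rough illustration of the quantization step mentioned in the entry above (the authors' equivariant MobileNetV2 is not reproduced here), the following sketch builds a standard float32 MobileNetV2 and torchvision's 8-bit quantized counterpart and compares their CPU latency on a dummy input; backend availability and exact speedups vary by platform.

```python
# Hedged sketch: float32 MobileNetV2 vs. torchvision's 8-bit quantized variant.
# This only illustrates post-training quantization for cheaper on-device
# inference; it is not the paper's equivariant architecture.
import time
import torch
import torchvision

fp32_model = torchvision.models.mobilenet_v2(weights=None).eval()
int8_model = torchvision.models.quantization.mobilenet_v2(
    weights=None, quantize=True  # fuse, calibrate on dummy data, convert to int8
).eval()

x = torch.randn(1, 3, 224, 224)  # dummy image batch

def mean_latency_ms(model: torch.nn.Module, iters: int = 20) -> float:
    """Average single-image CPU inference latency in milliseconds."""
    with torch.no_grad():
        model(x)  # warm-up pass
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
    return (time.perf_counter() - start) / iters * 1e3

print(f"fp32 MobileNetV2: {mean_latency_ms(fp32_model):.1f} ms/image")
print(f"int8 MobileNetV2: {mean_latency_ms(int8_model):.1f} ms/image")
```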
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.