SM70: A Large Language Model for Medical Devices
- URL: http://arxiv.org/abs/2312.06974v1
- Date: Tue, 12 Dec 2023 04:25:26 GMT
- Title: SM70: A Large Language Model for Medical Devices
- Authors: Anubhav Bhatti, Surajsinh Parmar, San Lee
- Abstract summary: We introduce SM70, a large language model specifically designed for SpassMed's medical devices under the brand name 'JEE1' (pronounced 'G1', meaning 'Life').
To fine-tune SM70, we used around 800K data entries from the publicly available dataset MedAlpaca.
The evaluation is conducted across three benchmark datasets: MedQA-USMLE, PubMedQA, and USMLE.
- Score: 0.6906005491572401
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We are introducing SM70, a 70 billion-parameter Large Language Model that is
specifically designed for SpassMed's medical devices under the brand name
'JEE1' (pronounced 'G1', meaning 'Life'). This large language model provides
more accurate and safe responses to medical-domain questions. To fine-tune
SM70, we used around 800K data entries from the publicly available dataset
MedAlpaca. The Llama2 70B open-sourced model served as the foundation for SM70,
and we employed the QLoRA technique for fine-tuning. The evaluation is
conducted across three benchmark datasets - MedQA-USMLE, PubMedQA, and USMLE -
each representing a unique aspect of medical knowledge and reasoning. The
performance of SM70 is contrasted with other notable LLMs, including Llama2
70B, Clinical Camel 70 (CC70), GPT-3.5, GPT-4, and Med-PaLM, to provide a
comparative understanding of its capabilities within the medical domain. Our
results indicate that SM70 outperforms several established models in these
datasets, showcasing its proficiency in handling a range of medical queries,
from fact-based questions derived from PubMed abstracts to complex clinical
decision-making scenarios. The robust performance of SM70, particularly in the
USMLE and PubMedQA datasets, suggests its potential as an effective tool in
clinical decision support and medical information retrieval. Despite its
promising results, the paper also acknowledges the areas where SM70 lags behind
the most advanced model, GPT-4, thereby highlighting the need for further
development, especially in tasks demanding extensive medical knowledge and
intricate reasoning.
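The abstract's key technique - QLoRA fine-tuning of a frozen Llama2 70B base - rests on low-rank adaptation: the quantized base weights stay fixed, and only small low-rank adapter matrices are trained. The following is a minimal NumPy sketch of that adapter arithmetic with hypothetical toy dimensions; it is not SpassMed's training code, and real QLoRA additionally stores the base weights in 4-bit NormalFloat.

```python
# Minimal sketch of the low-rank adaptation (LoRA) idea underlying QLoRA.
# Dimensions are illustrative toys, not the shapes of Llama2 70B layers.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 64, 64, 8, 16       # r << d is the low-rank bottleneck

W = rng.normal(size=(d_out, d_in))          # frozen base weight (4-bit quantized in QLoRA)
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, initialized to zero

def forward(x):
    # Base output plus a scaled low-rank update; only A and B receive gradients.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B initialized to zero, the adapted layer exactly matches the frozen base layer,
# so fine-tuning starts from the pretrained model's behavior.
assert np.allclose(forward(x), W @ x)

full = W.size                # parameters in the frozen base weight
adapter = A.size + B.size    # parameters actually trained
print(f"trainable fraction: {adapter / full:.3%}")
```

At these toy sizes the adapter is a quarter of the base weight, but at realistic hidden dimensions (thousands) the trainable fraction falls well under 1%, which is what makes fine-tuning a 70B model feasible on modest hardware.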
Related papers
- MedMobile: A mobile-sized language model with expert-level clinical capabilities [0.8246494848934447]
We introduce a parsimonious adaptation of phi-3-mini, MedMobile, a 3.8 billion parameter LM capable of running on a mobile device, for medical applications.
We demonstrate that MedMobile scores 75.7% on the MedQA (USMLE), surpassing the passing mark for physicians (60%), and approaching the scores of models 100 times its size.
arXiv Detail & Related papers (2024-10-11T17:32:59Z)
- Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data [3.469567586411153]
Large language models (LLMs) have shown potential in biomedical applications, leading to efforts to fine-tune them on domain-specific data.
This study evaluates the performance of biomedically fine-tuned LLMs against their general-purpose counterparts on a variety of clinical tasks.
arXiv Detail & Related papers (2024-08-25T13:36:22Z)
- Towards Evaluating and Building Versatile Large Language Models for Medicine [57.49547766838095]
We present MedS-Bench, a benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts.
MedS-Bench spans 11 high-level clinical tasks, including clinical report summarization, treatment recommendations, diagnosis, named entity recognition, and medical concept explanation.
The accompanying instruction-tuning dataset, MedS-Ins, comprises 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks.
arXiv Detail & Related papers (2024-08-22T17:01:34Z)
- Assessing The Potential Of Mid-Sized Language Models For Clinical QA [24.116649037975762]
Large language models, such as GPT-4 and Med-PaLM, have shown impressive performance on clinical tasks.
Mid-sized models such as BioGPT-large, BioMedLM, LLaMA 2, and Mistral 7B avoid the cost and access drawbacks of such large models.
This study provides the first head-to-head assessment of open source mid-sized models on clinical tasks.
arXiv Detail & Related papers (2024-04-24T14:32:34Z)
- BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text [82.7001841679981]
BioMedLM is a 2.7 billion parameter GPT-style autoregressive model trained exclusively on PubMed abstracts and full articles.
When fine-tuned, BioMedLM can produce strong multiple-choice biomedical question-answering results competitive with larger models.
BioMedLM can also be fine-tuned to produce useful answers to patient questions on medical topics.
arXiv Detail & Related papers (2024-03-27T10:18:21Z)
- Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People [68.59917533894608]
We aim to develop medical LLMs across the six most widely spoken languages, encompassing a global population of 6.1 billion.
This effort culminates in the creation of the ApolloCorpora multilingual medical dataset and the XMedBench benchmark.
We will open-source training corpora, code, model weights and evaluation benchmark.
arXiv Detail & Related papers (2024-03-06T11:56:02Z)
- Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models [59.60384461302662]
We introduce Asclepius, a novel benchmark for evaluating Medical Multi-Modal Large Language Models (Med-MLLMs).
Asclepius rigorously and comprehensively assesses model capability in terms of distinct medical specialties and different diagnostic capacities.
We also provide an in-depth analysis of 6 Med-MLLMs and compare them with 5 human specialists.
arXiv Detail & Related papers (2024-02-17T08:04:23Z)
- MEDITRON-70B: Scaling Medical Pretraining for Large Language Models [91.25119823784705]
Large language models (LLMs) can potentially democratize access to medical knowledge.
We release MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain.
arXiv Detail & Related papers (2023-11-27T18:49:43Z)
- Towards Expert-Level Medical Question Answering with Large Language Models [16.882775912583355]
Large language models (LLMs) have catalyzed significant progress in medical question answering.
Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain fine-tuning, and prompting strategies.
We also observed performance approaching or exceeding state-of-the-art across the MedMCQA, PubMedQA, and MMLU clinical topics datasets.
arXiv Detail & Related papers (2023-05-16T17:11:29Z)
- PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
- Large Language Models Encode Clinical Knowledge [21.630872464930587]
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation.
We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias.
We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning.
arXiv Detail & Related papers (2022-12-26T14:28:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.