MedLM: Exploring Language Models for Medical Question Answering Systems
- URL: http://arxiv.org/abs/2401.11389v2
- Date: Wed, 6 Mar 2024 03:26:46 GMT
- Title: MedLM: Exploring Language Models for Medical Question Answering Systems
- Authors: Niraj Yagnik, Jay Jhaveri, Vivek Sharma, Gabriel Pila
- Abstract summary: Large Language Models (LLMs) with their advanced generative capabilities have shown promise in various NLP tasks.
This study aims to compare the performance of general and medical-specific distilled LMs for medical Q&A.
The findings will provide valuable insights into the suitability of different LMs for specific applications in the medical domain.
- Score: 2.84801080855027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the face of rapidly expanding online medical literature, automated systems
for aggregating and summarizing information are becoming increasingly crucial
for healthcare professionals and patients. Large Language Models (LLMs), with
their advanced generative capabilities, have shown promise in various NLP
tasks, and their potential in the healthcare domain, particularly for
Closed-Book Generative QnA, is significant. However, the performance of these
models in domain-specific tasks such as medical Q&A remains largely unexplored.
This study aims to fill this gap by comparing the performance of general and
medical-specific distilled LMs for medical Q&A. We aim to evaluate the
effectiveness of fine-tuning domain-specific LMs and compare the performance of
different families of Language Models. The study will address critical
questions about these models' reliability, comparative performance, and
effectiveness in the context of medical Q&A. The findings will provide valuable
insights into the suitability of different LMs for specific applications in the
medical domain.
Related papers
- STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical [58.79671189792399]
STLLaVA-Med is designed to train a policy model capable of auto-generating medical visual instruction data.
We validate the efficacy and data efficiency of STLLaVA-Med across three major medical Visual Question Answering (VQA) benchmarks.
arXiv Detail & Related papers (2024-06-28T15:01:23Z)
- M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering [14.198330378235632]
We use Multiple Choice and Abstractive Question Answering to conduct a large-scale empirical study on 22 datasets in three generalist and three specialist biomedical sub-domains.
Our multifaceted analysis of the performance of 15 LLMs uncovers success factors such as instruction tuning that lead to improved recall and comprehension.
We show that while recently proposed domain-adapted models may lack adequate knowledge, directly fine-tuning on our collected medical knowledge datasets shows encouraging results.
We complement the quantitative results with a skill-oriented manual error analysis, which reveals a significant gap between the models' capabilities to simply recall necessary knowledge and to integrate it with the presented
arXiv Detail & Related papers (2024-06-06T02:43:21Z)
- Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator [21.60103376506254]
Large Language Models (LLMs) have demonstrated remarkable proficiency in human interactions.
This paper introduces the Automated Interactive Evaluation (AIE) framework and the State-Aware Patient Simulator (SAPS).
AIE and SAPS provide a dynamic, realistic platform for assessing LLMs through multi-turn doctor-patient simulations.
arXiv Detail & Related papers (2024-03-13T13:04:58Z)
- RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning [14.366349078707263]
This work introduces RJUA-MedDQA, a comprehensive benchmark in the field of medical specialization.
arXiv Detail & Related papers (2024-02-19T06:57:02Z)
- Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models [59.60384461302662]
We introduce Asclepius, a novel benchmark for evaluating Medical Multi-Modal Large Language Models (Med-MLLMs).
Asclepius rigorously and comprehensively assesses model capability in terms of distinct medical specialties and different diagnostic capacities.
We also provide an in-depth analysis of 6 Med-MLLMs and compare them with 5 human specialists.
arXiv Detail & Related papers (2024-02-17T08:04:23Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between Doctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
- Large language models in healthcare and medical domain: A review [4.456243157307507]
Large language models (LLMs) provide proficient responses to free-text queries.
This review explores the potential of LLMs to amplify the efficiency and effectiveness of diverse healthcare applications.
arXiv Detail & Related papers (2023-12-12T20:54:51Z)
- Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review [16.008511195589925]
Large language models (LLMs) have shown promising capabilities in mimicking human-level language comprehension and reasoning.
This paper provides a comprehensive review on the applications and implications of LLMs in medicine.
arXiv Detail & Related papers (2023-11-03T13:51:36Z)
- PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
- SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.