Fine Tuning Large Language Models for Medicine: The Role and Importance of Direct Preference Optimization
- URL: http://arxiv.org/abs/2409.12741v2
- Date: Fri, 20 Sep 2024 19:30:42 GMT
- Title: Fine Tuning Large Language Models for Medicine: The Role and Importance of Direct Preference Optimization
- Authors: Thomas Savage, Stephen Ma, Abdessalem Boukil, Vishwesh Patel, Ekanath Rangan, Ivan Rodriguez, Jonathan H Chen,
- Abstract summary: Two of the most common methods of fine tuning are Supervised Fine Tuning (SFT) and Direct Preference Optimization (DPO)
We compare the performance of SFT and DPO for five common natural language tasks in medicine.
We find that SFT alone is sufficient for Classification with text data, whereas DPO improves performance for the more complex tasks of Clinical Reasoning, Summarization and Clinical Triage.
- Score: 2.096816583842973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Model (LLM) fine tuning is underutilized in the field of medicine. Two of the most common methods of fine tuning are Supervised Fine Tuning (SFT) and Direct Preference Optimization (DPO), but there is little guidance informing users when to use either technique. In this investigation, we compare the performance of SFT and DPO for five common natural language tasks in medicine: Classification with text data, Classification with numeric data, Clinical Reasoning, Summarization, and Clinical Triage. We find that SFT alone is sufficient for Classification with text data, whereas DPO improves performance for the more complex tasks of Clinical Reasoning, Summarization and Clinical Triage. Our results establish the role and importance of DPO fine tuning within medicine, and consequently call attention to current software gaps that prevent widespread deployment of this technique.
Related papers
- STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering [58.79671189792399]
STLLaVA-Med is designed to train a policy model capable of auto-generating medical visual instruction data.
We validate the efficacy and data efficiency of STLLaVA-Med across three major medical Visual Question Answering (VQA) benchmarks.
arXiv Detail & Related papers (2024-06-28T15:01:23Z) - Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs)
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - Enhancing Medical Specialty Assignment to Patients using NLP Techniques [0.0]
We propose an alternative approach that achieves superior performance while being computationally efficient.
Specifically, we utilize keywords to train a deep learning architecture that outperforms a language model pretrained on a large corpus of text.
Our results demonstrate that utilizing keywords for text classification significantly improves classification performance.
arXiv Detail & Related papers (2023-12-09T14:13:45Z) - ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [51.66185471742271]
We propose ChiMed-GPT, a benchmark LLM designed explicitly for Chinese medical domain.
ChiMed-GPT undergoes a comprehensive training regime with pre-training, SFT, and RLHF.
We analyze possible biases through prompting ChiMed-GPT to perform attitude scales regarding discrimination of patients.
arXiv Detail & Related papers (2023-11-10T12:25:32Z) - Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain [13.912870728383396]
Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters.
We propose a two-step PEFT framework and evaluate it in the clinical domain.
arXiv Detail & Related papers (2023-07-06T15:06:41Z) - ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data
and Comprehensive Evaluation [5.690250818139763]
Large language models have exhibited exceptional performance on various Natural Language Processing (NLP) tasks.
Despite these advances, their effectiveness in medical applications is limited, due to challenges such as factual inaccuracies, reasoning abilities, and lack grounding in real-world experience.
We present ClinicalGPT, a language model explicitly designed and optimized for clinical scenarios.
arXiv Detail & Related papers (2023-06-16T16:56:32Z) - An Empirical Analysis of Parameter-Efficient Methods for Debiasing
Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance.
We find that the parameter-efficient methods are effective in mitigating gender bias, where adapter tuning is consistently the most effective.
We also find that prompt tuning is more suitable for GPT-2 than BERT, and racial and religious bias is less effective when it comes to racial and religious bias.
arXiv Detail & Related papers (2023-06-06T23:56:18Z) - Cross-Lingual Knowledge Transfer for Clinical Phenotyping [55.92262310716537]
We investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
arXiv Detail & Related papers (2022-08-03T08:33:21Z) - Clinical Prompt Learning with Frozen Language Models [4.077071350659386]
Large but frozen pre-trained language models (PLMs) with prompt learning outperform smaller but fine-tuned models.
We investigated the viability of prompt learning on clinically meaningful decision tasks.
Results are partially in line with the prompt learning literature, with prompt learning able to match or improve on traditional fine-tuning.
arXiv Detail & Related papers (2022-05-11T14:25:13Z) - MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language
Understanding Pretraining [5.807159674193696]
We present MeDAL, a large medical text dataset curated for abbreviation disambiguation.
We pre-trained several models of common architectures on this dataset and empirically showed that such pre-training leads to improved performance and convergence speed when fine-tuning on downstream medical tasks.
arXiv Detail & Related papers (2020-12-27T17:17:39Z) - An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.