From RAGs to riches: Using large language models to write documents for
clinical trials
- URL: http://arxiv.org/abs/2402.16406v1
- Date: Mon, 26 Feb 2024 08:59:05 GMT
- Authors: Nigel Markey, Ilyass El-Mansouri, Gaetan Rensonnet, Casper van Langen,
Christoph Meier
- Abstract summary: Large language models (LLMs) offer the potential to rapidly generate first versions of clinical trial documents.
We report an evaluation of LLMs in generating parts of one such document, clinical trial protocols.
To improve performance, we used retrieval-augmented generation (RAG) to prompt an LLM with accurate up-to-date information.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Clinical trials require numerous documents to be written: protocols,
consent forms, clinical study reports, and others. Large language models (LLMs)
offer the potential to rapidly generate first versions of these documents;
however, there are concerns about the quality of their output. Here we report an
evaluation of LLMs in generating parts of one such document, clinical trial
protocols. We find that an off-the-shelf LLM delivers reasonable results,
especially when assessing content relevance and the correct use of terminology.
However, deficiencies remain, specifically in clinical thinking and logic and in
the appropriate use of references. To improve performance, we used
retrieval-augmented generation (RAG) to prompt an LLM with accurate up-to-date
information. As a result of using RAG, the writing quality of the LLM improves
substantially, which has implications for the practical usability of LLMs in
clinical trial-related writing.
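The abstract does not describe how the authors built their RAG pipeline, so the following is only a minimal sketch of the general idea: retrieve the passages most relevant to a writing task, then place them in the prompt so the LLM can ground its draft in them. Everything here is an assumption made for illustration: the bag-of-words cosine retriever stands in for whatever retriever a real system would use, the function names (`retrieve`, `build_prompt`) are hypothetical, and the passages are placeholders rather than real clinical content. No LLM is called; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query (toy retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(task, passages, k=2):
    """Assemble a prompt that puts the retrieved passages before the task."""
    sources = retrieve(task, passages, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(sources))
    return (
        "Use only the numbered sources below and cite them by number.\n"
        f"Sources:\n{context}\n\n"
        f"Task: {task}\n"
    )


# Hypothetical placeholder passages, not real clinical guidance.
PASSAGES = [
    "Placeholder passage about dosing guidance for drug X in adults.",
    "Placeholder passage about consent form layout.",
    "Placeholder passage about the safety profile of drug X.",
]

prompt = build_prompt("Draft the dosing rationale for drug X", PASSAGES, k=2)
print(prompt)
```

In this sketch the prompt would then be sent to an LLM of the reader's choice. Asking the model to cite sources by number is one simple way to target the reference-handling weakness the abstract mentions, though whether it helps in practice depends on the model and the retrieved material.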
Related papers
- CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios [50.032101237019205]
CliMedBench is a comprehensive benchmark with 14 expert-guided core clinical scenarios.
The reliability of this benchmark has been confirmed in several ways.
arXiv Detail & Related papers (2024-10-04T15:15:36Z)
- MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications [2.838746648891565]
We introduce MEDIC, a framework assessing Large Language Models (LLMs) across five critical dimensions of clinical competence.
We apply MEDIC to evaluate LLMs on medical question-answering, safety, summarization, note generation, and other tasks.
Results show performance disparities across model sizes, baseline vs medically finetuned models, and have implications on model selection for applications requiring specific model strengths.
arXiv Detail & Related papers (2024-09-11T14:44:51Z)
- XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare [16.79952669254101]
We develop a novel method for zero-shot/few-shot in-context learning (ICL) using a multi-layered structured prompt.
We also explore the efficacy of two communication styles between the user and Large Language Models (LLMs).
Our study systematically evaluates the diagnostic accuracy and risk factors, including gender bias and false negative rates.
arXiv Detail & Related papers (2024-05-10T06:52:44Z)
- Large Language Models in the Clinic: A Comprehensive Benchmark [63.21278434331952]
We build a benchmark ClinicBench to better understand large language models (LLMs) in the clinic.
We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks.
We then construct six novel datasets and clinical tasks that are complex but common in real-world practice.
We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings.
arXiv Detail & Related papers (2024-04-25T15:51:06Z)
- A Dataset and Benchmark for Hospital Course Summarization with Adapted Large Language Models [4.091402760759184]
Large language models (LLMs) depict remarkable capabilities in automating real-world tasks, but their capabilities for healthcare applications have not been shown.
We introduce a novel pre-processed dataset, the MIMIC-IV-BHC, encapsulating clinical note and brief hospital course (BHC) pairs to adapt LLMs for BHC.
Using clinical notes as input, we apply prompting-based (using in-context learning) and fine-tuning-based adaptation strategies to three open-source LLMs and two proprietary LLMs.
arXiv Detail & Related papers (2024-03-08T23:17:55Z)
- Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
- LongHealth: A Question Answering Benchmark with Long Clinical Documents [36.05587855811346]
We present the LongHealth benchmark, comprising 20 detailed fictional patient cases across various diseases.
The benchmark challenges LLMs with 400 multiple-choice questions in three categories: information extraction, negation, and sorting.
We evaluated nine open-source LLMs with a minimum of 16,000 tokens and also included OpenAI's proprietary and cost-efficient GPT-3.5 Turbo for comparison.
arXiv Detail & Related papers (2024-01-25T19:57:00Z)
- Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization [8.456700096020601]
Large language models (LLMs) have shown promise in natural language processing (NLP), but their effectiveness on a diverse range of clinical summarization tasks remains unproven.
In this study, we apply adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks.
A clinical reader study with ten physicians evaluates summary completeness, correctness, and conciseness; in a majority of cases, summaries from our best adapted LLMs are either equivalent (45%) or superior (36%) compared to summaries from medical experts.
arXiv Detail & Related papers (2023-09-14T05:15:01Z)
- CliniDigest: A Case Study in Large Language Model Based Large-Scale Summarization of Clinical Trial Descriptions [58.720142291102135]
In 2022, there were on average more than 100 clinical trials submitted to ClinicalTrials.gov every day.
CliniDigest is, to our knowledge, the first tool able to provide real-time, truthful, and comprehensive summaries of clinical trials.
For each field, CliniDigest generates summaries of $\mu=153, \sigma=69$ words, each of which utilizes $\mu=54\%, \sigma=30\%$ of the sources.
arXiv Detail & Related papers (2023-07-26T21:49:14Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- HuatuoGPT, towards Taming Language Model to Be a Doctor [67.96794664218318]
HuatuoGPT is a large language model (LLM) for medical consultation.
We leverage both distilled data from ChatGPT and real-world data from doctors in the supervised fine-tuned stage.
arXiv Detail & Related papers (2023-05-24T11:56:01Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re$^3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z)
- ICDBigBird: A Contextual Embedding Model for ICD Code Classification [71.58299917476195]
Contextual word embedding models have achieved state-of-the-art results in multiple NLP tasks.
ICDBigBird is a BigBird-based model which can integrate a Graph Convolutional Network (GCN).
Our experiments on a real-world clinical dataset demonstrate the effectiveness of our BigBird-based model on the ICD classification task.
arXiv Detail & Related papers (2022-04-21T20:59:56Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- Neural language models for text classification in evidence-based medicine [3.5770353345663044]
Evidence-based medicine (EBM) is being challenged as never before due to the high volume of research articles published and pre-prints posted daily.
In this article, we report the results of an applied research project to classify scientific articles to support Epistemonikos.
We test several methods, and the best one, based on the XLNet neural language model, improves the current approach by 93% on average F1-score.
arXiv Detail & Related papers (2020-12-01T15:53:44Z)
- A Multilingual Neural Machine Translation Model for Biomedical Data [84.17747489525794]
We release a multilingual neural machine translation model, which can be used to translate text in the biomedical domain.
The model can translate from 5 languages (French, German, Italian, Korean and Spanish) into English.
It is trained with large amounts of generic and biomedical data, using domain tags.
arXiv Detail & Related papers (2020-08-06T21:26:43Z)
- GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines [4.370297546680015]
GGPONC is a freely distributable German language corpus based on clinical practice guidelines for oncology.
GGPONC is the first corpus for the German language covering diverse conditions in a large medical subfield.
By applying and evaluating existing medical information extraction pipelines for German text, we are able to draw comparisons for the use of medical language.
arXiv Detail & Related papers (2020-07-13T14:25:49Z)
- Evidence Inference 2.0: More Data, Better Models [22.53884716373888]
The Evidence Inference dataset was recently released to facilitate research toward this end.
This paper collects additional annotations to expand the Evidence Inference dataset by 25%.
The updated corpus, documentation, and code for new baselines and evaluations are available at http://evidence-inference.ebm-nlp.com/.
arXiv Detail & Related papers (2020-05-08T17:16:35Z)
- CLARA: Clinical Report Auto-completion [56.206459591367405]
Clinical Report Auto-completion (CLARA) is an interactive method that generates reports in a sentence-by-sentence fashion based on doctors' anchor words and partially completed sentences.
In our experimental evaluation, CLARA achieved 0.393 CIDEr and 0.248 BLEU-4 on X-ray reports and 0.482 CIDEr and 0.491 BLEU-4 for EEG reports for sentence-level generation.
arXiv Detail & Related papers (2020-02-26T18:45:00Z)