CliniDigest: A Case Study in Large Language Model Based Large-Scale
Summarization of Clinical Trial Descriptions
- URL: http://arxiv.org/abs/2307.14522v2
- Date: Mon, 31 Jul 2023 19:00:05 GMT
- Title: CliniDigest: A Case Study in Large Language Model Based Large-Scale
Summarization of Clinical Trial Descriptions
- Authors: Renee D. White (1), Tristan Peng (1), Pann Sripitak (1), Alexander
Rosenberg Johansen (1), Michael Snyder (1) ((1) Stanford University)
- Abstract summary: In 2022, there were on average more than 100 clinical trials submitted to ClinicalTrials.gov every day.
CliniDigest is, to our knowledge, the first tool able to provide real-time, truthful, and comprehensive summaries of clinical trials.
For each field, CliniDigest generates summaries of $\mu=153,\ \sigma=69$ words, each of which utilizes $\mu=54\%,\ \sigma=30\%$ of the sources.
- Score: 58.720142291102135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A clinical trial is a study that evaluates new biomedical interventions. To
design new trials, researchers draw inspiration from those current and
completed. In 2022, there were on average more than 100 clinical trials
submitted to ClinicalTrials.gov every day, with each trial having a mean of
approximately 1500 words [1]. This makes it nearly impossible to keep up to
date. To mitigate this issue, we have created a batch clinical trial summarizer
called CliniDigest using GPT-3.5. CliniDigest is, to our knowledge, the first
tool able to provide real-time, truthful, and comprehensive summaries of
clinical trials. CliniDigest can reduce up to 85 clinical trial descriptions
(approximately 10,500 words) into a concise 200-word summary with references
and limited hallucinations. We have tested CliniDigest on its ability to
summarize 457 trials divided across 27 medical subdomains. For each field,
CliniDigest generates summaries of $\mu=153,\ \sigma=69 $ words, each of which
utilizes $\mu=54\%,\ \sigma=30\% $ of the sources. A more comprehensive
evaluation is planned and outlined in this paper.
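The two statistics reported in the abstract (summary length in words and the share of sources each summary uses) can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes, hypothetically, that each summary cites its sources with bracketed integers such as `[3]`, and it uses the population standard deviation; the paper may define both differently. The names `summary_stats` and `aggregate` are illustrative only.

```python
import re
from statistics import mean, pstdev


def summary_stats(summary: str, n_sources: int) -> tuple[int, float]:
    """Return (word count, fraction of the n_sources cited) for one summary."""
    word_count = len(summary.split())
    # Assumption: sources are cited as bracketed integers, e.g. "[3]".
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", summary)}
    # Ignore reference numbers that do not map to a real source.
    cited = {i for i in cited if 1 <= i <= n_sources}
    return word_count, len(cited) / n_sources


def aggregate(per_field: list[tuple[int, float]]) -> dict[str, float]:
    """Mean and (population) standard deviation across per-field summaries."""
    words = [w for w, _ in per_field]
    usage = [u for _, u in per_field]
    return {
        "words_mean": mean(words),
        "words_std": pstdev(words),
        "usage_mean": mean(usage),
        "usage_std": pstdev(usage),
    }


if __name__ == "__main__":
    # Toy input, not data from the paper.
    toy = "Drug A lowers risk [1]. Drug B is untested [3] [1]."
    print(summary_stats(toy, n_sources=4))
```

Running `aggregate` over one `(word count, usage)` pair per medical subdomain would give figures in the same form as the $\mu$ and $\sigma$ values quoted above, under the stated assumptions.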
Related papers
- Panacea: A foundation model for clinical trial search, summarization, design, and recruitment [29.099676641424384]
We propose a clinical trial foundation model named Panacea.
Panacea is designed to handle multiple tasks, including trial search, trial summarization, trial design, and patient-trial matching.
We also assemble a large-scale dataset, named TrialAlign, of 793,279 trial documents and 1,113,207 trial-related scientific papers.
arXiv Detail & Related papers (2024-06-25T21:29:25Z)
- Large Language Models in the Clinic: A Comprehensive Benchmark [63.21278434331952]
We build a benchmark ClinicBench to better understand large language models (LLMs) in the clinic.
We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks.
We then construct six novel datasets and clinical tasks that are complex but common in real-world practice.
We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings.
arXiv Detail & Related papers (2024-04-25T15:51:06Z)
- TrialDura: Hierarchical Attention Transformer for Interpretable Clinical Trial Duration Prediction [19.084936647082632]
We propose TrialDura, a machine learning-based method that estimates the duration of clinical trials using multimodal data.
We encode these data into Bio-BERT embeddings specifically tuned for biomedical contexts to provide a deeper and more relevant semantic understanding.
Our proposed model demonstrated superior performance with a mean absolute error (MAE) of 1.04 years and a root mean square error (RMSE) of 1.39 years compared to the other models.
arXiv Detail & Related papers (2024-04-20T02:12:59Z)
- AutoTrial: Prompting Language Models for Clinical Trial Design [53.630479619856516]
We present a method named AutoTrial to aid the design of clinical eligibility criteria using language models.
Experiments on over 70K clinical trials verify that AutoTrial generates high-quality criteria texts.
arXiv Detail & Related papers (2023-05-19T01:04:16Z)
- Exploring Optimal Granularity for Extractive Summarization of
Unstructured Health Records: Analysis of the Largest Multi-Institutional
Archive of Health Records in Japan [25.195233641408233]
"Discharge summaries" are one promising application of summarization.
It remains unclear how the summaries should be generated from the unstructured source.
This study aimed to identify the optimal granularity in summarization.
arXiv Detail & Related papers (2022-09-20T23:26:02Z)
- Trial2Vec: Zero-Shot Clinical Trial Document Similarity Search using
Self-Supervision [42.859662256134584]
We propose Trial2Vec, which learns through self-supervision without annotating similar clinical trials.
The meta-structure of trial documents (e.g., title, eligibility criteria, target disease), along with clinical knowledge, is leveraged to automatically generate contrastive samples.
We show that our method yields medically interpretable embeddings by visualization and it gets a 15% average improvement over the best baselines on precision/recall for trial retrieval.
arXiv Detail & Related papers (2022-06-29T15:37:11Z)
- ITTC @ TREC 2021 Clinical Trials Track [54.141379782822206]
The task focuses on the problem of matching eligible clinical trials to topics constituting a summary of a patient's admission notes.
We explore different ways of representing trials and topics using NLP techniques, and then use a common retrieval model to generate the ranked list of relevant trials for each topic.
The results from all our submitted runs are well above the median scores for all topics, but there is still plenty of scope for improvement.
arXiv Detail & Related papers (2022-02-16T04:56:47Z)
- CREATe: Clinical Report Extraction and Annotation Technology [53.731999072534876]
Clinical case reports are written descriptions of the unique aspects of a particular clinical case.
There has been no attempt to develop an end-to-end system to annotate, index, or otherwise curate these reports.
We propose a novel computational resource platform, CREATe, for extracting, indexing, and querying the contents of clinical case reports.
arXiv Detail & Related papers (2021-02-28T16:50:14Z)