Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3
(with Varying Success)
- URL: http://arxiv.org/abs/2305.06299v2
- Date: Thu, 11 May 2023 15:51:55 GMT
- Title: Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3
(with Varying Success)
- Authors: Chantal Shaib, Millicent L. Li, Sebastian Joseph, Iain J. Marshall,
Junyi Jessy Li, Byron C. Wallace
- Abstract summary: GPT-3 is able to produce high quality summaries of general domain news articles in few- and zero-shot settings.
We enlist domain experts (individuals with medical training) to evaluate summaries of biomedical articles generated by GPT-3, given zero supervision.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models, particularly GPT-3, are able to produce high quality
summaries of general domain news articles in few- and zero-shot settings.
However, it is unclear if such models are similarly capable in more
specialized, high-stakes domains such as biomedicine. In this paper, we enlist
domain experts (individuals with medical training) to evaluate summaries of
biomedical articles generated by GPT-3, given zero supervision. We consider
both single- and multi-document settings. In the former, GPT-3 is tasked with
generating regular and plain-language summaries of articles describing
randomized controlled trials; in the latter, we assess the degree to which
GPT-3 is able to synthesize evidence reported across a collection of
articles. We design an annotation scheme for evaluating model outputs, with an
emphasis on assessing the factual accuracy of generated summaries. We find that
while GPT-3 is able to summarize and simplify single biomedical articles
faithfully, it struggles to provide accurate aggregations of findings over
multiple documents. We release all data and annotations used in this work.
Related papers
- WisPerMed at BioLaySumm: Adapting Autoregressive Large Language Models for Lay Summarization of Scientific Articles
This paper details the efforts of the WisPerMed team in the BioLaySumm2024 Shared Task on automatic lay summarization.
Large language models (LLMs), specifically the BioMistral and Llama3 models, were fine-tuned and employed to create lay summaries.
Experiments demonstrated that fine-tuning generally led to the best performance across most evaluated metrics.
arXiv Detail & Related papers (2024-05-20T10:54:47Z)
- Optimal path for Biomedical Text Summarization Using Pointer GPT
GPT models have a tendency to generate factual errors, lack context, and oversimplify words.
To address these limitations, we replaced the attention mechanism in the GPT model with a pointer network.
The effectiveness of the Pointer-GPT model was evaluated using the ROUGE score.
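The Pointer-GPT summary above mentions evaluation with the ROUGE score. As background, ROUGE-1 measures unigram overlap between a generated summary and a human reference. The following is a minimal, self-contained sketch of ROUGE-1 F1 (not the authors' evaluation code; production work typically uses an established ROUGE implementation with stemming and tokenization):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a candidate summary and a reference summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Multiset intersection counts each shared unigram at most
    # min(reference count, candidate count) times.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# 4 of 5 candidate unigrams match the reference, and vice versa.
print(rouge1_f1("the model summarizes the trial",
                "the model summarizes a trial"))  # 0.8
```

Whitespace tokenization and lowercasing are simplifications; published ROUGE scores also depend on the exact tokenizer and stemmer used.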
arXiv Detail & Related papers (2024-03-22T02:13:23Z)
- Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains
We evaluate zero-shot generated summaries across specialized domains including biomedical articles, and legal bills.
We acquire annotations from domain experts to identify inconsistencies in summaries and systematically categorize these errors.
We release all collected annotations to facilitate additional research toward measuring and realizing factually accurate summarization, beyond news articles.
arXiv Detail & Related papers (2024-02-05T20:51:11Z)
- Evaluation of ChatGPT on Biomedical Tasks: A Zero-Shot Comparison with Fine-Tuned Generative Transformers
ChatGPT is a large language model developed by OpenAI.
This paper aims to evaluate the performance of ChatGPT on various benchmark biomedical tasks.
arXiv Detail & Related papers (2023-06-07T15:11:26Z)
- BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z)
- News Summarization and Evaluation in the Era of GPT-3
We study how GPT-3 compares against fine-tuned models trained on large summarization datasets.
We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality.
arXiv Detail & Related papers (2022-09-26T01:04:52Z)
- Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again
We present the first systematic and comprehensive study to compare the few-shot performance of GPT-3 in-context learning with fine-tuning smaller (i.e., BERT-sized) PLMs.
Our results show that GPT-3 still significantly underperforms compared with simply fine-tuning a smaller PLM using the same small training set.
arXiv Detail & Related papers (2022-03-16T05:56:08Z)
- Fine-tuning GPT-3 for Russian Text Summarization
This paper showcases ruGPT3's ability to summarize texts by fine-tuning it on a corpus of Russian news articles paired with human-written summaries.
We evaluate the resulting texts with a set of metrics, showing that our solution can surpass the state-of-the-art model's performance without additional changes in architecture or loss function.
arXiv Detail & Related papers (2021-08-07T19:01:40Z)
- What's New? Summarizing Contributions in Scientific Literature
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.