Translating Radiology Reports into Plain Language using ChatGPT and
GPT-4 with Prompt Learning: Promising Results, Limitations, and Potential
- URL: http://arxiv.org/abs/2303.09038v3
- Date: Wed, 29 Mar 2023 03:22:52 GMT
- Title: Translating Radiology Reports into Plain Language using ChatGPT and
GPT-4 with Prompt Learning: Promising Results, Limitations, and Potential
- Authors: Qing Lyu, Josh Tan, Michael E. Zapadka, Janardhana Ponnatapura, Chuang
Niu, Kyle J. Myers, Ge Wang, Christopher T. Whitlow
- Abstract summary: ChatGPT can successfully translate radiology reports into plain language, with an average score of 4.27 on a five-point scale.
ChatGPT presents some randomness in its responses with occasionally over-simplified or neglected information.
Results are compared with the newly released GPT-4 model, showing that GPT-4 can significantly improve the quality of translated reports.
- Score: 6.127537348178505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The large language model ChatGPT has drawn extensive attention
because of its human-like expression and reasoning abilities. In this study, we
investigate the feasibility of using ChatGPT to translate radiology reports into
plain language for patients and healthcare providers, so that they are better
informed for improved healthcare. Radiology reports
from 62 low-dose chest CT lung cancer screening scans and 76 brain MRI
metastases screening scans were collected in the first half of February for
this study. According to the evaluation by radiologists, ChatGPT can
successfully translate radiology reports into plain language with an average
score of 4.27 on a five-point scale, along with an average of 0.08 instances of
missing information and 0.07 instances of misinformation. The suggestions
provided by ChatGPT are generally relevant, such as keeping follow-up
appointments with doctors and closely monitoring any symptoms, and for about
37% of the 138 cases in total, ChatGPT offers specific suggestions based on
findings in the report. ChatGPT
also presents some randomness in its responses with occasionally
over-simplified or neglected information, which can be mitigated using a more
detailed prompt. Furthermore, ChatGPT results are compared with those of the
newly released GPT-4 model, showing that GPT-4 can significantly improve the
quality of translated reports. Our results show that it is feasible to utilize
large language models in clinical education, and further efforts are needed to
address limitations and maximize their potential.
Related papers
- Enhancing Medical Support in the Arabic Language Through Personalized ChatGPT Assistance [1.174020933567308]
ChatGPT provides real-time, personalized medical diagnosis at no cost.
The study involved compiling a dataset of disease information and generating multiple messages for each disease.
ChatGPT's performance was assessed by measuring the similarity between its responses and the actual diseases.
arXiv Detail & Related papers (2024-03-21T21:28:07Z) - Evaluation of ChatGPT-Generated Medical Responses: A Systematic Review
and Meta-Analysis [7.587141771901865]
Large language models such as ChatGPT are increasingly explored in medical domains.
This study aims to summarize the available evidence on evaluating ChatGPT's performance in medicine.
arXiv Detail & Related papers (2023-10-12T15:26:26Z) - Evaluating ChatGPT text-mining of clinical records for obesity
monitoring [0.0]
We compare the ability of a large language model (ChatGPT) and a previously developed regular expression (RegexT) to identify overweight body condition scores (BCS) in anonymised veterinary narratives.
arXiv Detail & Related papers (2023-08-03T10:11:42Z) - Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect
ChatGPT-Generated Text [48.36706154871577]
We introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts)
It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts.
We also propose the "Polish Ratio" method, an innovative measure of the degree of modification made by ChatGPT compared to the original human-written text.
arXiv Detail & Related papers (2023-07-21T06:38:37Z) - Performance of ChatGPT on USMLE: Unlocking the Potential of Large
Language Models for AI-Assisted Medical Education [0.0]
This study determined how reliable ChatGPT can be for answering complex medical and clinical questions.
The paper evaluated the obtained results using a 2-way ANOVA and posthoc analysis.
ChatGPT-generated answers were found to be more context-oriented than regular Google search results.
arXiv Detail & Related papers (2023-06-30T19:53:23Z) - ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large
Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP).
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Compared to the performance of previous models, our extensive experimental results demonstrate that ChatGPT performs worse across different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z) - To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z) - Can ChatGPT Understand Too? A Comparative Study on ChatGPT and
Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves comparable performance compared with BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z) - Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z) - Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine [97.8609714773255]
We evaluate ChatGPT for machine translation, including translation prompt, multilingual translation, and translation robustness.
ChatGPT performs competitively with commercial translation products but lags behind significantly on low-resource or distant languages.
With the launch of the GPT-4 engine, the translation performance of ChatGPT is significantly boosted.
arXiv Detail & Related papers (2023-01-20T08:51:36Z) - ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on
Simplified Radiology Reports [0.4194454151396506]
ChatGPT is a language model capable of generating text that appears human-like and authentic.
We asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT.
Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient.
arXiv Detail & Related papers (2022-12-30T18:55:16Z)