Evaluating Large Language Models for Radiology Natural Language
Processing
- URL: http://arxiv.org/abs/2307.13693v2
- Date: Thu, 27 Jul 2023 12:58:59 GMT
- Title: Evaluating Large Language Models for Radiology Natural Language
Processing
- Authors: Zhengliang Liu, Tianyang Zhong, Yiwei Li, Yutong Zhang, Yi Pan, Zihao
Zhao, Peixin Dong, Chao Cao, Yuxiao Liu, Peng Shu, Yaonai Wei, Zihao Wu,
Chong Ma, Jiaqi Wang, Sheng Wang, Mengyue Zhou, Zuowei Jiang, Chunlin Li,
Jason Holmes, Shaochen Xu, Lu Zhang, Haixing Dai, Kai Zhang, Lin Zhao,
Yuanhao Chen, Xu Liu, Peilong Wang, Pingkun Yan, Jun Liu, Bao Ge, Lichao Sun,
Dajiang Zhu, Xiang Li, Wei Liu, Xiaoyan Cai, Xintao Hu, Xi Jiang, Shu Zhang,
Xin Zhang, Tuo Zhang, Shijie Zhao, Quanzheng Li, Hongtu Zhu, Dinggang Shen,
Tianming Liu
- Abstract summary: The rise of large language models (LLMs) has marked a pivotal shift in the field of natural language processing (NLP).
This study seeks to bridge this gap by critically evaluating thirty-two LLMs in interpreting radiology reports.
- Score: 68.98847776913381
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of large language models (LLMs) has marked a pivotal shift in the
field of natural language processing (NLP). LLMs have revolutionized a
multitude of domains, and they have made a significant impact in the medical
field. Large language models are now more abundant than ever, and many of these
models are bilingual, proficient in both English and Chinese.
However, a comprehensive evaluation of these models remains to be conducted.
This lack of assessment is especially apparent within the context of radiology
NLP. This study seeks to bridge this gap by critically evaluating thirty-two
LLMs in interpreting radiology reports, a crucial component of radiology NLP.
Specifically, the ability to derive impressions from radiologic findings is
assessed. The outcomes of this evaluation provide key insights into the
performance, strengths, and weaknesses of these LLMs, informing their practical
applications within the medical domain.
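To make the evaluated task concrete, the sketch below shows one way the findings-to-impression setup can be framed: each candidate LLM is wrapped as a callable that maps a prompt string to generated text, and its output is paired with the radiologist-written impression for later comparison. This is a minimal illustration under assumed names (PROMPT_TEMPLATE, query_llm, the report fields), not the paper's actual evaluation harness or prompts.

```python
# Minimal sketch (illustrative only): derive an impression from report findings
# with an arbitrary LLM exposed as a prompt -> text callable.
from typing import Callable, Dict, List

# Hypothetical prompt wording; the study's real prompts are not reproduced here.
PROMPT_TEMPLATE = (
    "You are an experienced radiologist. Given the findings section of a "
    "radiology report, write a concise impression.\n\n"
    "Findings:\n{findings}\n\nImpression:"
)

def derive_impression(findings: str, query_llm: Callable[[str], str]) -> str:
    """Ask one candidate LLM to summarize the findings into an impression."""
    return query_llm(PROMPT_TEMPLATE.format(findings=findings)).strip()

def collect_outputs(reports: List[Dict[str, str]],
                    models: Dict[str, Callable[[str], str]]) -> List[Dict[str, str]]:
    """Pair each model-generated impression with its reference impression."""
    rows = []
    for model_name, query_llm in models.items():
        for report in reports:
            rows.append({
                "model": model_name,
                "generated": derive_impression(report["findings"], query_llm),
                "reference": report["impression"],
            })
    return rows
```

Keeping every model behind the same callable interface is what makes a comparison across thirty-two systems tractable: only the generation backend changes between runs, while the prompt and the downstream scoring stay fixed.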
Related papers
- The current status of large language models in summarizing radiology report impressions [13.402769727597812]
The effectiveness of large language models (LLMs) in summarizing radiology report impressions remains unclear.
Three types of radiology reports, i.e., CT, PET-CT, and Ultrasound reports, are collected from Peking University Cancer Hospital and Institute.
We use the report findings to construct the zero-shot, one-shot, and three-shot prompts with complete example reports to generate the impressions (a prompt-construction and scoring sketch appears after this list).
arXiv Detail & Related papers (2024-06-04T09:23:30Z) - Lessons from the Trenches on Reproducible Evaluation of Language Models [60.522749986793094]
We draw on three years of experience in evaluating large language models to provide guidance and lessons for researchers.
We present the Language Model Evaluation Harness (lm-eval), an open source library for independent, reproducible, and extensible evaluation of language models.
arXiv Detail & Related papers (2024-05-23T16:50:49Z) - D-NLP at SemEval-2024 Task 2: Evaluating Clinical Inference Capabilities of Large Language Models [5.439020425819001]
Large language models (LLMs) have garnered significant attention and widespread usage due to their impressive performance in various tasks.
However, they are not without their own set of challenges, including issues such as hallucinations, factual inconsistencies, and limitations in numerical-quantitative reasoning.
arXiv Detail & Related papers (2024-05-07T10:11:14Z) - Can Large Language Models abstract Medical Coded Language? [0.0]
This study evaluates whether large language models (LLMs) are aware of medical codes and can accurately generate names from these codes.
arXiv Detail & Related papers (2024-03-16T06:18:15Z) - Unveiling Linguistic Regions in Large Language Models [49.298360366468934]
Large Language Models (LLMs) have demonstrated considerable cross-lingual alignment and generalization ability.
This paper conducts several investigations on the linguistic competence of LLMs.
arXiv Detail & Related papers (2024-02-22T16:56:13Z) - DAEDRA: A language model for predicting outcomes in passive
pharmacovigilance reporting [0.0]
DAEDRA is a large language model designed to detect regulatory-relevant outcomes in adverse event reports.
This paper details the conception, design, training and evaluation of DAEDRA.
arXiv Detail & Related papers (2024-02-10T16:48:45Z) - Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for furthering the field of dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z) - Multilingual Natural Language Processing Model for Radiology Reports --
The Summary is all you need! [2.4910932804601855]
The generation of radiology impressions was automated by fine-tuning a model based on a multilingual text-to-text Transformer.
In a blind test, two board-certified radiologists indicated that for at least 70% of the system-generated summaries, the quality matched or exceeded the corresponding human-written summaries.
This study showed that the multilingual model outperformed other models that specialized in summarizing radiology reports in only one language, as well as models that were not specifically designed for summarizing radiology reports.
arXiv Detail & Related papers (2023-09-29T19:20:27Z) - Radiology-Llama2: Best-in-Class Large Language Model for Radiology [71.27700230067168]
This paper introduces Radiology-Llama2, a large language model specialized for radiology through a process known as instruction tuning.
Quantitative evaluations using ROUGE metrics on the MIMIC-CXR and OpenI datasets demonstrate that Radiology-Llama2 achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-29T17:44:28Z) - Radiology-GPT: A Large Language Model for Radiology [74.07944784968372]
We introduce Radiology-GPT, a large language model for radiology.
It demonstrates superior performance compared to general language models such as StableLM, Dolly and LLaMA.
It exhibits significant versatility in radiological diagnosis, research, and communication.
arXiv Detail & Related papers (2023-06-14T17:57:24Z)
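Two technical ingredients recur in the related work above: constructing zero-shot, one-shot, or few-shot prompts from complete example reports (as in the radiology impression summarization study) and scoring generated impressions against references with ROUGE (as in Radiology-Llama2). The sketch below illustrates both under stated assumptions: the example-report format is invented for illustration, and scoring uses the open source rouge-score package rather than whichever tooling the cited papers used.

```python
# Illustrative sketch: build a k-shot prompt from complete example reports and
# score a generated impression against its reference with ROUGE.
# Requires: pip install rouge-score
from rouge_score import rouge_scorer

def build_few_shot_prompt(examples, findings, k=3):
    """Prepend k complete (findings, impression) examples before the query findings."""
    parts = []
    for ex in examples[:k]:
        parts.append(f"Findings:\n{ex['findings']}\nImpression:\n{ex['impression']}\n")
    parts.append(f"Findings:\n{findings}\nImpression:")
    return "\n".join(parts)

def rouge_f1(reference: str, generated: str):
    """Return ROUGE-1/2/L F1 scores for one (reference, generated) pair."""
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    return {name: s.fmeasure for name, s in scorer.score(reference, generated).items()}

if __name__ == "__main__":
    # With k=0 the prompt degenerates to the zero-shot case.
    print(build_few_shot_prompt(examples=[], findings="No focal consolidation.", k=0))
    print(rouge_f1("No acute cardiopulmonary process.",
                   "No acute cardiopulmonary abnormality."))
```

ROUGE rewards n-gram overlap with the reference impression, so it is a convenient but imperfect proxy for clinical correctness, which is why human review (as in the blind radiologist test reported above) remains a common complement.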
This list is automatically generated from the titles and abstracts of the papers on this site.