Consensus, dissensus and synergy between clinicians and specialist
foundation models in radiology report generation
- URL: http://arxiv.org/abs/2311.18260v3
- Date: Wed, 20 Dec 2023 23:08:32 GMT
- Title: Consensus, dissensus and synergy between clinicians and specialist
foundation models in radiology report generation
- Authors: Ryutaro Tanno, David G.T. Barrett, Andrew Sellergren, Sumedh Ghaisas,
Sumanth Dathathri, Abigail See, Johannes Welbl, Karan Singhal, Shekoofeh
Azizi, Tao Tu, Mike Schaekermann, Rhys May, Roy Lee, SiWai Man, Zahra Ahmed,
Sara Mahdavi, Yossi Matias, Joelle Barral, Ali Eslami, Danielle Belgrave,
Vivek Natarajan, Shravya Shetty, Pushmeet Kohli, Po-Sen Huang, Alan
Karthikesalingam, Ira Ktena
- Abstract summary: The worldwide shortage of radiologists restricts access to expert care and imposes heavy workloads.
Recent progress in automated report generation with vision-language models offers clear potential for ameliorating the situation.
We build a state-of-the-art report generation system for chest radiographs, $\textit{Flamingo-CXR}$, by fine-tuning a well-known vision-language foundation model on radiology data.
- Score: 32.26270073540666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Radiology reports are an instrumental part of modern medicine, informing key
clinical decisions such as diagnosis and treatment. The worldwide shortage of
radiologists, however, restricts access to expert care and imposes heavy
workloads, contributing to avoidable errors and delays in report delivery.
While recent progress in automated report generation with vision-language
models offers clear potential for ameliorating the situation, the path to
real-world adoption has been stymied by the challenge of evaluating the
clinical quality of AI-generated reports. In this study, we build a
state-of-the-art report generation system for chest radiographs,
$\textit{Flamingo-CXR}$, by fine-tuning a well-known vision-language foundation
model on radiology data. To evaluate the quality of the AI-generated reports, a
group of 16 certified radiologists provide detailed evaluations of AI-generated
and human written reports for chest X-rays from an intensive care setting in
the United States and an inpatient setting in India. At least one radiologist
(out of two per case) preferred the AI report to the ground truth report in
over 60$\%$ of cases for both datasets. Amongst the subset of AI-generated
reports that contain errors, the most frequently cited reasons were related to
the location and finding, whereas for human-written reports, most mistakes were
related to severity and finding. This disparity suggested potential
complementarity between our AI system and human experts, prompting us to
develop an assistive scenario in which Flamingo-CXR generates a first-draft
report, which is subsequently revised by a clinician. This is the first
demonstration of clinician-AI collaboration for report writing, and the
resultant reports are assessed by at least one radiologist to be equivalent or
preferable to reports written by experts alone in 80$\%$ of inpatient cases and
60$\%$ of intensive care cases.
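The headline numbers above aggregate pairwise preferences from two radiologists per case. A minimal sketch of how such an "at least one of two raters" statistic can be computed, using made-up field names rather than the paper's actual evaluation code:

```python
from collections import defaultdict

# Each record is one radiologist's pairwise judgement for one case: "ai" if the
# AI-generated report was preferred, "human" if the original report was preferred,
# "tie" if the two were judged equivalent. Field names are illustrative only.
ratings = [
    {"case_id": "c1", "rater": "r1", "preference": "ai"},
    {"case_id": "c1", "rater": "r2", "preference": "human"},
    {"case_id": "c2", "rater": "r3", "preference": "tie"},
    {"case_id": "c2", "rater": "r4", "preference": "ai"},
]

def at_least_one_prefers_ai(records, accept=frozenset({"ai"})):
    """Fraction of cases where at least one rater gave a verdict in `accept`."""
    by_case = defaultdict(list)
    for r in records:
        by_case[r["case_id"]].append(r["preference"])
    hits = sum(any(p in accept for p in prefs) for prefs in by_case.values())
    return hits / len(by_case)

print(at_least_one_prefers_ai(ratings))                        # AI strictly preferred
print(at_least_one_prefers_ai(ratings, accept={"ai", "tie"}))  # equivalent or preferred
```

The second call corresponds to the "equivalent or preferred by at least one radiologist" criterion quoted for the clinician-AI collaboration setting.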
Related papers
- ReXErr: Synthesizing Clinically Meaningful Errors in Diagnostic Radiology Reports [1.9106067578277455]
We introduce ReXErr, a methodology that leverages Large Language Models to generate representative errors within chest X-ray reports.
We developed error categories that capture common mistakes in both human and AI-generated reports.
Our approach uses a novel sampling scheme to inject diverse errors while maintaining clinical plausibility (a rough illustrative sketch of this error-injection pattern appears after this list).
arXiv Detail & Related papers (2024-09-17T01:42:39Z)
- RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel entity-aware metric, Radiological Report (Text) Evaluation (RaTEScore).
RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions.
Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark (a simplified illustrative sketch of the entity-matching idea appears after this list).
arXiv Detail & Related papers (2024-06-24T17:49:28Z)
- Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning [16.849933628738277]
Radiology report generation (RRG) has attracted significant attention due to its potential to reduce the workload of radiologists.
This paper introduces a novel RRG method, $\textbf{LM-RRG}$, that integrates large models (LMs) with clinical quality reinforcement learning.
Experiments on the MIMIC-CXR and IU-Xray datasets demonstrate the superiority of our method over the state of the art.
arXiv Detail & Related papers (2024-03-11T13:47:11Z)
- Estimating the severity of dental and oral problems via sentiment classification over clinical reports [0.8287206589886879]
Analyzing authors' sentiments in texts can be practical and useful in various fields, including medicine and dentistry.
A deep learning model based on a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) network architecture, known as CNN-LSTM, was developed to detect the severity level of a patient's problem (a minimal sketch of such a CNN-LSTM classifier appears after this list).
arXiv Detail & Related papers (2024-01-17T14:33:13Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information (a rough sketch of this fusion pattern appears after this list).
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
- Multilingual Natural Language Processing Model for Radiology Reports -- The Summary is all you need! [2.4910932804601855]
The generation of radiology impressions was automated by fine-tuning a model based on a multilingual text-to-text Transformer.
In a blind test, two board-certified radiologists indicated that for at least 70% of the system-generated summaries, the quality matched or exceeded the corresponding human-written summaries.
This study showed that the multilingual model outperformed other models that specialized in summarizing radiology reports in only one language, as well as models that were not specifically designed for summarizing radiology reports.
arXiv Detail & Related papers (2023-09-29T19:20:27Z)
- Act Like a Radiologist: Radiology Report Generation across Anatomical Regions [50.13206214694885]
X-RGen is a radiologist-minded report generation framework across six anatomical regions.
In X-RGen, we seek to mimic the behaviour of human radiologists, breaking their workflow down into four principal phases.
We enhance the recognition capacity of the image encoder by analysing images and reports across various regions.
arXiv Detail & Related papers (2023-05-26T07:12:35Z)
- Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation [55.00308939833555]
The PPKED includes three modules: Posterior Knowledge Explorer (PoKE), Prior Knowledge Explorer (PrKE) and Multi-domain Knowledge Distiller (MKD).
PoKE explores the posterior knowledge, which provides explicit abnormal visual regions to alleviate visual data bias.
PrKE explores the prior knowledge from the prior medical knowledge graph (medical knowledge) and prior radiology reports (working experience) to alleviate textual data bias.
arXiv Detail & Related papers (2021-06-13T11:10:02Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)
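The ReXErr entry above describes sampling error categories and prompting a Large Language Model to inject clinically plausible errors into chest X-ray reports. A rough sketch of that sampling-plus-prompting pattern, assuming a placeholder call_llm function and hypothetical categories and weights (none of which come from the paper):

```python
import random

# Hypothetical error categories and sampling weights; ReXErr defines its own
# taxonomy and sampling scheme, which are not reproduced here.
ERROR_CATEGORIES = {
    "false_finding": 0.3,     # add a finding that is not actually present
    "omitted_finding": 0.3,   # drop a reported finding
    "wrong_location": 0.2,    # change the anatomical location of a finding
    "wrong_severity": 0.2,    # change the severity qualifier of a finding
}

def call_llm(prompt: str) -> str:
    """Placeholder for a call to any text-generation model or API."""
    raise NotImplementedError

def inject_error(report: str, rng: random.Random) -> tuple[str, str]:
    """Sample one error category and ask the model to rewrite the report with it."""
    cats, weights = zip(*ERROR_CATEGORIES.items())
    category = rng.choices(cats, weights=weights, k=1)[0]
    prompt = (
        "Rewrite the following chest X-ray report so that it contains exactly one "
        f"error of type '{category}', keeping the rest clinically plausible and "
        "stylistically unchanged.\n\nReport:\n" + report
    )
    return category, call_llm(prompt)
```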
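The RaTEScore entry describes an entity-aware metric that is robust to medical synonyms and sensitive to negation. The following is a deliberately simplified illustration of that idea, not RaTEScore itself: it compares (normalized entity, polarity) pairs between a candidate and a reference report and returns an F1-style overlap; the synonym map, entity list, and negation cues are toy placeholders.

```python
import re

# Toy synonym normalisation; a real metric would use a medical NER model and a
# proper ontology rather than this hand-written map.
SYNONYMS = {"cardiac enlargement": "cardiomegaly", "collapsed lung": "pneumothorax"}
NEGATION_CUES = ("no ", "without ", "negative for ")
ENTITY_TERMS = ("cardiomegaly", "pneumothorax", "effusion", "consolidation")

def extract_entities(report: str) -> set[tuple[str, bool]]:
    """Return (normalised entity, is_present) pairs from a report (toy version)."""
    entities = set()
    for sentence in re.split(r"[.;]\s*", report.lower()):
        if not sentence:
            continue
        negated = any(cue in sentence for cue in NEGATION_CUES)
        for phrase, canon in SYNONYMS.items():
            sentence = sentence.replace(phrase, canon)
        for term in ENTITY_TERMS:
            if term in sentence:
                entities.add((term, not negated))
    return entities

def entity_f1(candidate: str, reference: str) -> float:
    cand, ref = extract_entities(candidate), extract_entities(reference)
    if not cand or not ref:
        return float(cand == ref)
    tp = len(cand & ref)
    precision, recall = tp / len(cand), tp / len(ref)
    return 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)

print(entity_f1("No pneumothorax. Cardiac enlargement is noted.",
                "Cardiomegaly is present. No pneumothorax."))  # 1.0
```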
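The dental and oral severity entry names a CNN-LSTM text classifier. Below is a minimal PyTorch sketch of that generic architecture (token embedding, 1D convolution, LSTM, linear head); the vocabulary size, layer dimensions, and number of severity levels are placeholders rather than values from the paper.

```python
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    """Generic CNN-LSTM text classifier: Conv1d over token embeddings, then an LSTM."""
    def __init__(self, vocab_size=20_000, embed_dim=128, conv_channels=64,
                 hidden_dim=64, num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(conv_channels, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embed(token_ids)                      # (batch, seq_len, embed_dim)
        x = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, channels, seq_len)
        x, _ = self.lstm(x.transpose(1, 2))            # (batch, seq_len, hidden_dim)
        return self.head(x[:, -1])                     # logits over severity levels

model = CNNLSTMClassifier()
logits = model(torch.randint(1, 20_000, (2, 50)))      # two reports of 50 tokens each
print(logits.shape)                                    # torch.Size([2, 4])
```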
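The non-imaging-data entry describes extracting visual features from chest X-rays with a convolutional network and combining them with text embeddings of patient demographics in a transformer encoder-decoder. A rough PyTorch sketch of that fusion pattern follows; the backbone, dimensions, and token handling are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class DemographicsConditionedReportModel(nn.Module):
    """CNN visual features plus demographic text embeddings feed a transformer decoder."""
    def __init__(self, vocab_size=10_000, d_model=256):
        super().__init__()
        cnn = resnet18(weights=None)
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])  # keep the spatial map
        self.visual_proj = nn.Linear(512, d_model)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=8, num_encoder_layers=2,
                                          num_decoder_layers=2, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, images, demo_tokens, report_tokens):
        # images: (B, 3, H, W); demo_tokens, report_tokens: (B, L) integer token ids
        feats = self.backbone(images)                    # (B, 512, h, w)
        feats = feats.flatten(2).transpose(1, 2)         # (B, h*w, 512)
        visual = self.visual_proj(feats)                 # (B, h*w, d_model)
        demo = self.token_embed(demo_tokens)             # (B, L_demo, d_model)
        src = torch.cat([visual, demo], dim=1)           # fused encoder input
        tgt = self.token_embed(report_tokens)            # (B, L_report, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(report_tokens.size(1))
        out = self.transformer(src, tgt, tgt_mask=mask)  # (B, L_report, d_model)
        return self.lm_head(out)                         # logits over the vocabulary

model = DemographicsConditionedReportModel()
logits = model(torch.randn(2, 3, 224, 224),
               torch.randint(0, 10_000, (2, 8)),        # demographic "sentence"
               torch.randint(0, 10_000, (2, 32)))       # shifted report tokens
print(logits.shape)                                     # torch.Size([2, 32, 10000])
```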