Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation
- URL: http://arxiv.org/abs/2412.04606v2
- Date: Sun, 16 Mar 2025 19:19:05 GMT
- Title: Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation
- Authors: Chenyu Wang, Weichao Zhou, Shantanu Ghosh, Kayhan Batmanghelich, Wenchao Li
- Abstract summary: Generative medical Vision Large Language Models (VLLMs) are prone to hallucinations and can produce inaccurate diagnostic information. We introduce a novel Semantic Consistency-Based Uncertainty Quantification framework that provides both report-level and sentence-level uncertainties. Our approach improves factuality scores by 10%, achieved by rejecting 20% of reports using the Radialog model on the MIMIC-CXR dataset.
- Score: 20.173287130474797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Radiology report generation (RRG) has shown great potential in assisting radiologists by automating the labor-intensive task of report writing. While recent advancements have improved the quality and coherence of generated reports, ensuring their factual correctness remains a critical challenge. Although generative medical Vision Large Language Models (VLLMs) have been proposed to address this issue, these models are prone to hallucinations and can produce inaccurate diagnostic information. To address these concerns, we introduce a novel Semantic Consistency-Based Uncertainty Quantification framework that provides both report-level and sentence-level uncertainties. Unlike existing approaches, our method does not require modifications to the underlying model or access to its inner state, such as output token logits, thus serving as a plug-and-play module that can be seamlessly integrated with state-of-the-art models. Extensive experiments demonstrate the efficacy of our method in detecting hallucinations and enhancing the factual accuracy of automatically generated radiology reports. By abstaining from high-uncertainty reports, our approach improves factuality scores by $10$\%, achieved by rejecting $20$\% of reports using the \texttt{Radialog} model on the MIMIC-CXR dataset. Furthermore, sentence-level uncertainty flags the lowest-precision sentence in each report with an $82.9$\% success rate. Our implementation is open-source and available at https://github.com/BU-DEPEND-Lab/SCUQ-RRG.
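The abstract states the recipe only at a high level, so the following is a minimal, hypothetical sketch of semantic-consistency uncertainty with abstention: sample several candidate reports for the same study, measure how consistent they are with one another, and withhold the most uncertain cases. The `similarity_fn` backend, the function names, and the reuse of the 20% rejection setting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def report_uncertainty(candidates, similarity_fn):
    """Report-level uncertainty from the semantic consistency of sampled reports.

    candidates:    reports sampled from the same model for one study.
    similarity_fn: callable (text_a, text_b) -> similarity in [0, 1], e.g. an
                   embedding cosine or an NLI entailment score (an assumption;
                   the paper's exact consistency measure is not given here).
    """
    n = len(candidates)
    if n < 2:
        return 0.0
    pairwise = [
        similarity_fn(candidates[i], candidates[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    # Low average pairwise consistency -> high uncertainty.
    return 1.0 - float(np.mean(pairwise))

def abstain(reports, uncertainties, reject_fraction=0.20):
    """Selective prediction: withhold the most uncertain fraction of reports,
    mirroring the 20% rejection setting quoted in the abstract."""
    cutoff = np.quantile(uncertainties, 1.0 - reject_fraction)
    return [r for r, u in zip(reports, uncertainties) if u <= cutoff]
```

A sentence-level analogue would score each sentence of the final report against the sampled candidates and flag the least consistent one, which is roughly the behavior quantified by the 82.9% success rate above.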
Related papers
- MedAutoCorrect: Image-Conditioned Autocorrection in Medical Reporting [31.710972402763527]
In medical reporting, the accuracy of radiological reports, whether generated by humans or machine learning algorithms, is critical. We tackle a new task in this paper: image-conditioned autocorrection of inaccuracies within these reports. We propose a two-stage framework capable of pinpointing these errors and then making corrections, simulating an autocorrection process.
arXiv Detail & Related papers (2024-12-04T02:32:53Z)
- Anatomically-Grounded Fact Checking of Automated Chest X-ray Reports [0.0]
We propose a novel model for explainable fact-checking that identifies errors in findings and their locations as indicated in the reports.
We evaluate the resulting fact-checking model and its utility in correcting reports generated by several SOTA automated reporting tools.
arXiv Detail & Related papers (2024-12-03T05:21:42Z)
- ReXErr: Synthesizing Clinically Meaningful Errors in Diagnostic Radiology Reports [1.9106067578277455]
We introduce ReXErr, a methodology that leverages Large Language Models to generate representative errors within chest X-ray reports.
We developed error categories that capture common mistakes in both human and AI-generated reports.
Our approach uses a novel sampling scheme to inject diverse errors while maintaining clinical plausibility (a rough sketch of such a sampling step follows this entry).
arXiv Detail & Related papers (2024-09-17T01:42:39Z)
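The summary names error categories and a sampling scheme but gives no specifics, so the following is only a rough, hypothetical sketch of how such an injection step could be organized in Python; the category names, weights, and the stubbed `rewrite_with_error` helper are assumptions, and the actual corruption would be delegated to an LLM prompt.

```python
import random

# Hypothetical error categories and sampling weights; the paper defines its own taxonomy.
ERROR_CATEGORIES = {
    "false_finding": 0.3,    # hallucinated abnormality
    "omitted_finding": 0.3,  # dropped true abnormality
    "wrong_location": 0.2,   # laterality / anatomical location swap
    "wrong_severity": 0.2,   # e.g. "mild" described as "severe"
}

def rewrite_with_error(sentence: str, category: str) -> str:
    """Placeholder: in practice an LLM would rewrite the sentence so that it
    contains the requested, clinically plausible error."""
    return f"[{category}] {sentence}"

def inject_errors(report_sentences, n_errors=1, seed=0):
    """Pick sentences and error categories at random, then corrupt those sentences."""
    rng = random.Random(seed)
    targets = rng.sample(range(len(report_sentences)), k=n_errors)
    corrupted, labels = list(report_sentences), []
    for idx in targets:
        category = rng.choices(
            list(ERROR_CATEGORIES), weights=list(ERROR_CATEGORIES.values())
        )[0]
        corrupted[idx] = rewrite_with_error(corrupted[idx], category)
        labels.append((idx, category))
    return corrupted, labels
```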
- Contrastive Learning with Counterfactual Explanations for Radiology Report Generation [83.30609465252441]
We propose a CounterFactual Explanations-based framework (CoFE) for radiology report generation.
Counterfactual explanations serve as a potent tool for understanding how decisions made by algorithms can be changed by exploring "what if" scenarios.
Experiments on two benchmarks demonstrate that leveraging the counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports.
arXiv Detail & Related papers (2024-07-19T17:24:25Z)
- X-ray Made Simple: Lay Radiology Report Generation and Robust Evaluation [22.09740244042415]
Radiology Report Generation (RRG) has advanced considerably with the development of multimodal generative models.
RRG systems that score highly on existing lexical metrics might be more of a mirage: a model can get a high BLEU simply by learning the template of reports.
We propose a semantics-based evaluation method, which is effective in mitigating the inflated BLEU numbers and provides a more robust evaluation (a toy comparison follows this entry).
arXiv Detail & Related papers (2024-06-25T19:52:01Z)
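As a toy illustration of the template-inflated-BLEU problem (not the paper's actual metric), the snippet below scores a templated-but-incorrect report and a paraphrased-but-correct report against a reference, once with n-gram BLEU and once with an off-the-shelf embedding similarity. The example sentences and the general-purpose `all-MiniLM-L6-v2` encoder are assumptions; a radiology-specific semantic scorer would normally be used instead.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from sentence_transformers import SentenceTransformer, util

reference = "No focal consolidation. Mild cardiomegaly is unchanged."
templated = "No focal consolidation. No cardiomegaly."              # fluent template, wrong finding
paraphrase = "The heart remains mildly enlarged; lungs are clear."  # different wording, correct finding

def bleu(hyp, ref):
    # Token-overlap metric; rewards copying the reference's surface form.
    return sentence_bleu([ref.split()], hyp.split(),
                         smoothing_function=SmoothingFunction().method1)

# General-purpose sentence encoder, used here purely for illustration.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_similarity(hyp, ref):
    emb = encoder.encode([hyp, ref], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1]))

for name, hyp in [("templated", templated), ("paraphrase", paraphrase)]:
    print(name, round(bleu(hyp, reference), 3),
          round(semantic_similarity(hyp, reference), 3))
```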
- ICON: Improving Inter-Report Consistency in Radiology Report Generation via Lesion-aware Mixup Augmentation [14.479606737135045]
We propose ICON, which improves the inter-report consistency of radiology report generation.
Our approach first involves extracting lesions from input images and examining their characteristics.
Then, we introduce a lesion-aware mixup technique to ensure that the representations of semantically equivalent lesions align with the same attributes (a generic mixup sketch follows this entry).
arXiv Detail & Related papers (2024-02-20T09:13:15Z)
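The summary only names a lesion-aware mixup step, so the snippet below shows just the generic mixup interpolation it presumably builds on, applied to two lesion feature vectors; the Beta prior and the restriction to semantically equivalent lesions are stated as assumptions rather than the paper's exact procedure.

```python
import numpy as np

def mixup_lesion_features(z_a, z_b, alpha=0.4, rng=None):
    """Generic mixup: convexly interpolate two lesion representations.

    In a lesion-aware variant, z_a and z_b would come from lesions judged
    semantically equivalent (same finding and attributes), so the mixed point
    should preserve those attributes while smoothing the representation space.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing coefficient sampled from Beta(alpha, alpha)
    return lam * z_a + (1.0 - lam) * z_b, lam

# Toy usage with random 128-dimensional "lesion" embeddings.
rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=128), rng.normal(size=128)
z_mix, lam = mixup_lesion_features(z1, z2, rng=rng)
```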
- It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition [70.77292069313154]
Large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.
In this work, we aim to overcome the limitation that GER operates on text alone by infusing acoustic information before generating the predicted transcription, through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF).
arXiv Detail & Related papers (2024-02-08T07:21:45Z)
- Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
- Variational Topic Inference for Chest X-Ray Report Generation [102.04931207504173]
Report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice.
Recent work has shown that deep learning models can successfully caption natural images.
We propose variational topic inference for automatic report generation.
arXiv Detail & Related papers (2021-07-15T13:34:38Z)
- Confidence-Guided Radiology Report Generation [24.714303916431078]
We propose a novel method to quantify both the visual uncertainty and the textual uncertainty for the task of radiology report generation.
Experiments demonstrate that the proposed uncertainty characterization and estimation method provides more reliable confidence scores for radiology report generation.
arXiv Detail & Related papers (2021-06-21T07:02:12Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.