Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation
- URL: http://arxiv.org/abs/2503.07032v1
- Date: Mon, 10 Mar 2025 08:16:18 GMT
- Title: Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation
- Authors: Zhi Qin, Qianhui Gui, Mouxiao Bian, Rui Wang, Hong Ge, Dandan Yao, Ziying Sun, Yuan Zhao, Yu Zhang, Hui Shi, Dongdong Wang, Chenxin Song, Shenghong Ju, Lihao Liu, Junjun He, Jie Xu, Yuan-Cheng Wang
- Abstract summary: We evaluate large language models (LLMs) in image quality assessment and report standardization. Gemini 2.0-Flash achieved a Macro F1 score of 90 in CXR tasks, demonstrating strong generalization but limited fine-grained performance. DeepSeek-R1 excelled in CT report auditing with a 62.23% recall rate, outperforming other models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical imaging quality control (QC) is essential for accurate diagnosis, yet traditional QC methods remain labor-intensive and subjective. To address this challenge, we establish a standardized dataset and evaluation framework for medical imaging QC, systematically assessing large language models (LLMs) on image quality assessment and report standardization. We first constructed and anonymized a dataset of 161 chest X-ray (CXR) radiographs and 219 CT reports for evaluation. Multiple LLMs, including Gemini 2.0-Flash, GPT-4o, and DeepSeek-R1, were then evaluated on recall, precision, and F1 score for detecting technical errors and inconsistencies. Experimental results show that Gemini 2.0-Flash achieved a Macro F1 score of 90 on CXR tasks, demonstrating strong generalization but limited fine-grained performance. DeepSeek-R1 excelled in CT report auditing with a 62.23% recall rate, outperforming the other models. However, its distilled variants performed poorly, while InternLM2.5-7B-chat exhibited the highest additional-discovery rate, indicating broader but less precise error detection. These findings highlight the potential of LLMs in medical imaging QC, with DeepSeek-R1 and Gemini 2.0-Flash demonstrating superior performance.
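The abstract evaluates models on recall, precision, and F1 per class, aggregated as a Macro F1 (the unweighted mean of per-class F1 scores). A minimal sketch of that metric, using hypothetical label lists standing in for the paper's actual QC annotations:

```python
def macro_f1(y_true, y_pred):
    """Macro F1: unweighted mean of per-class F1 scores.

    y_true, y_pred: equal-length sequences of class labels
    (e.g. QC error categories assigned by raters vs. a model).
    """
    classes = sorted(set(y_true) | set(y_pred), key=str)
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)
```

Unlike a micro-average, this weights every class equally, so rare error categories count as much as common ones.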
Related papers
- AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images [58.87047247313503]
We introduce AGHI-QA, the first large-scale benchmark specifically designed for quality assessment of AI-generated human images (AGHIs).
The dataset comprises 4,000 images generated from 400 carefully crafted text prompts using 10 state-of-the-art T2I models.
We conduct a systematic subjective study to collect multidimensional annotations, including perceptual quality scores, text-image correspondence scores, and visible and distorted body part labels.
arXiv Detail & Related papers (2025-04-30T04:36:56Z) - Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective [14.39502951611029]
We propose a fusion quality loss function that combines Structural Similarity Index Measure (SSIM) loss with L1 loss, offering a more comprehensive evaluation of reconstruction quality.
We also introduce a data pre-processing strategy that enhances the average intensity ratio (AIR) between normal and abnormal regions, further improving the distinction of anomalies.
The proposed IQA approach achieves significant improvements (>10%) in terms of Dice coefficient (DICE) and Area Under the Precision-Recall Curve (AUPRC) on the BraTS21 (T2, FLAIR) and MSULB datasets.
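The fusion quality loss above combines an SSIM-based term with an L1 term. A minimal sketch under stated assumptions: `simplified_ssim` here uses global image statistics rather than the standard sliding-window SSIM, and the `alpha` weighting is a hypothetical parameter, not taken from the paper:

```python
import numpy as np

def simplified_ssim(x, y, c1=1e-4, c2=9e-4):
    # Global-statistics SSIM, for illustration only; the standard
    # formulation averages over local sliding windows.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def fusion_quality_loss(recon, target, alpha=0.5):
    # Weighted sum of a structural (SSIM) term and a pixelwise (L1) term.
    ssim_loss = 1.0 - simplified_ssim(recon, target)
    l1_loss = np.abs(recon - target).mean()
    return alpha * ssim_loss + (1 - alpha) * l1_loss
```

The L1 term penalizes per-pixel intensity error, while the SSIM term penalizes loss of luminance, contrast, and structural agreement, which pure L1 can miss.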
arXiv Detail & Related papers (2024-08-15T15:55:07Z) - Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images [41.002573031087856]
We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT).
FMUE achieved a higher F1 score of 96.76% than two state-of-the-art algorithms, RETFound and UIOS, improving further to 98.44% with a thresholding strategy.
Our model is superior to two ophthalmologists with a higher F1 score (95.17% vs. 61.93% & 71.72%).
arXiv Detail & Related papers (2024-06-18T03:04:52Z) - Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z) - CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images [3.0757789554622597]
This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs).
For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities.
The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists.
arXiv Detail & Related papers (2023-10-22T06:22:37Z) - Reconstruction of Patient-Specific Confounders in AI-based Radiologic Image Interpretation using Generative Pretraining [12.656718786788758]
We propose a self-conditioned diffusion model termed DiffChest and train it on a dataset of chest radiographs.
DiffChest explains classifications on a patient-specific level and visualizes the confounding factors that may mislead the model.
Our findings highlight the potential of pretraining based on diffusion models in medical image classification.
arXiv Detail & Related papers (2023-09-29T10:38:08Z) - Revisiting Computer-Aided Tuberculosis Diagnosis [56.80999479735375]
Tuberculosis (TB) is a major global health threat, causing millions of deaths annually.
Computer-aided tuberculosis diagnosis (CTD) using deep learning has shown promise, but progress is hindered by limited training data.
We establish a large-scale dataset, namely the Tuberculosis X-ray (TBX11K) dataset, which contains 11,200 chest X-ray (CXR) images with corresponding bounding box annotations for TB areas.
This dataset enables the training of sophisticated detectors for high-quality CTD.
arXiv Detail & Related papers (2023-07-06T08:27:48Z) - Domain Transfer Through Image-to-Image Translation for Uncertainty-Aware Prostate Cancer Classification [42.75911994044675]
We present a novel approach for unpaired image-to-image translation of prostate MRIs and an uncertainty-aware training approach for classifying clinically significant prostate cancer (PCa).
Our approach involves a novel pipeline for translating unpaired 3.0T multi-parametric prostate MRIs to 1.5T, thereby augmenting the available training data.
Our experiments demonstrate that the proposed method significantly improves the Area Under ROC Curve (AUC) by over 20% compared to the previous work.
arXiv Detail & Related papers (2023-07-02T05:26:54Z) - Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders [50.689585476660554]
We propose a new fine-tuning strategy that includes positive-pair loss relaxation and random sentence sampling.
Our approach consistently improves overall zero-shot pathology classification across four chest X-ray datasets and three pre-trained models.
arXiv Detail & Related papers (2022-12-14T06:04:18Z) - Automated SSIM Regression for Detection and Quantification of Motion Artefacts in Brain MR Images [54.739076152240024]
Motion artefacts in magnetic resonance brain images are a crucial issue.
The assessment of MR image quality is fundamental before proceeding with the clinical diagnosis.
We propose an automated image quality assessment based on structural similarity index (SSIM) regression.
arXiv Detail & Related papers (2022-06-14T10:16:54Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - RCoNet: Deformable Mutual Information Maximization and High-order Uncertainty-aware Learning for Robust COVID-19 Detection [12.790651338952005]
The novel 2019 coronavirus (COVID-19) infection has spread worldwide and is currently a major healthcare challenge.
Due to faster imaging time and considerably lower cost than CT, detecting COVID-19 in chest X-ray (CXR) images is preferred for efficient diagnosis, assessment and treatment.
We propose a novel deep network named RCoNet$^k_s$ for robust COVID-19 detection, which employs Deformable Mutual Information Maximization (DeIM), Mixed High-order Moment Feature (MHMF) and Multi-
arXiv Detail & Related papers (2021-02-22T15:13:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.