Beyond the First Read: AI-Assisted Perceptual Error Detection in Chest Radiography Accounting for Interobserver Variability
- URL: http://arxiv.org/abs/2506.13049v1
- Date: Mon, 16 Jun 2025 02:36:38 GMT
- Title: Beyond the First Read: AI-Assisted Perceptual Error Detection in Chest Radiography Accounting for Interobserver Variability
- Authors: Adhrith Vutukuri, Akash Awasthi, David Yang, Carol C. Wu, Hien Van Nguyen
- Abstract summary: We introduce RADAR, a post-interpretation companion system. RADAR ingests radiologist annotations and CXR images, then performs regional-level analysis. RADAR achieved a recall of 0.78, precision of 0.44, and an F1 score of 0.56 in detecting missed abnormalities.
- Score: 3.32947201358052
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chest radiography is widely used in diagnostic imaging. However, perceptual errors -- especially overlooked but visible abnormalities -- remain common and clinically significant. Current workflows and AI systems provide limited support for detecting such errors after interpretation and often lack meaningful human--AI collaboration. We introduce RADAR (Radiologist--AI Diagnostic Assistance and Review), a post-interpretation companion system. RADAR ingests finalized radiologist annotations and CXR images, then performs regional-level analysis to detect and refer potentially missed abnormal regions. The system supports a "second-look" workflow and offers suggested regions of interest (ROIs) rather than fixed labels to accommodate inter-observer variation. We evaluated RADAR on a simulated perceptual-error dataset derived from de-identified CXR cases, using F1 score and Intersection over Union (IoU) as primary metrics. RADAR achieved a recall of 0.78, precision of 0.44, and an F1 score of 0.56 in detecting missed abnormalities in the simulated perceptual-error dataset. Although precision is moderate, this reduces over-reliance on AI by encouraging radiologist oversight in human--AI collaboration. The median IoU was 0.78, with more than 90% of referrals exceeding 0.5 IoU, indicating accurate regional localization. RADAR effectively complements radiologist judgment, providing valuable post-read support for perceptual-error detection in CXR interpretation. Its flexible ROI suggestions and non-intrusive integration position it as a promising tool in real-world radiology workflows. To facilitate reproducibility and further evaluation, we release a fully open-source web implementation alongside a simulated error dataset. All code, data, demonstration videos, and the application are publicly available at https://github.com/avutukuri01/RADAR.
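The two metrics named in the abstract can be sanity-checked with a few lines of Python. This is a minimal sketch, not the RADAR implementation: the function names `f1_score` and `box_iou` are hypothetical, and the box format `(x_min, y_min, x_max, y_max)` and the example boxes are assumptions made for illustration. It shows that the reported precision and recall are consistent with the reported F1 score, and how IoU is computed for a pair of axis-aligned regions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def box_iou(a, b) -> float:
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max).

    The box format is an assumption for this sketch, not taken from the paper.
    """
    # Corners of the intersection rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Reported precision 0.44 and recall 0.78 give an F1 of about 0.56.
print(round(f1_score(0.44, 0.78), 2))

# Hypothetical referred region vs. hypothetical reference region.
print(box_iou((0, 0, 10, 10), (0, 0, 10, 8)))
```

With the abstract's figures, 2 × 0.44 × 0.78 / (0.44 + 0.78) ≈ 0.563, which rounds to the reported 0.56. A referral counts as exceeding the 0.5 IoU threshold mentioned in the abstract when `box_iou` returns a value above 0.5.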
Related papers
- XAI-Driven Diagnosis of Generalization Failure in State-Space Cerebrovascular Segmentation Models: A Case Study on Domain Shift Between RSNA and TopCoW Datasets [0.5735035463793009]
We present a rigorous, two-phase approach to diagnose the generalization failure of state-of-the-art State-Space Models (SSMs).
We quantified the model's focus by measuring the overlap between its attention maps and the Ground Truth segmentations.
Our analysis proves the model failed to generalize because its attention mechanism abandoned true anatomical features in the Target domain.
arXiv Detail & Related papers (2025-12-16T00:34:32Z) - ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy [1.2106870940376342]
Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of α-synucleinopathies.
Wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts.
This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings.
arXiv Detail & Related papers (2025-11-07T13:18:20Z) - U2AD: Uncertainty-based Unsupervised Anomaly Detection Framework for Detecting T2 Hyperintensity in MRI Spinal Cord [7.811634659561162]
T2 hyperintensities in spinal cord MR images are crucial biomarkers for conditions such as degenerative cervical myelopathy.
Deep learning methods have shown promise in lesion detection, but most supervised approaches are heavily dependent on large, annotated datasets.
We propose an Uncertainty-based Unsupervised Anomaly Detection framework, termed U2AD, to address these limitations.
arXiv Detail & Related papers (2025-03-17T17:33:32Z) - DDxT: Deep Generative Transformer Models for Differential Diagnosis [51.25660111437394]
We show that a generative approach trained with simpler supervised and self-supervised learning signals can achieve superior results on the current benchmark.
The proposed Transformer-based generative network, named DDxT, autoregressively produces a set of possible pathologies, i.e., DDx, and predicts the actual pathology using a neural network.
arXiv Detail & Related papers (2023-12-02T22:57:25Z) - StenUNet: Automatic Stenosis Detection from X-ray Coronary Angiography [5.430434855741553]
The severity of coronary artery disease (CAD) is quantified by the location, degree of narrowing (stenosis) and number of arteries involved.
The MICCAI grand challenge: Automatic Region-based Coronary Artery Disease diagnostics using the X-ray angiography imagEs (ARCADE) curated a dataset with stenosis annotations.
We propose the architecture and algorithm StenUNet to accurately detect stenosis from X-ray Coronary Angiography.
arXiv Detail & Related papers (2023-10-23T14:04:18Z) - I-AI: A Controllable & Interpretable AI System for Decoding Radiologists' Intense Focus for Accurate CXR Diagnoses [9.260958560874812]
Interpretable Artificial Intelligence (I-AI) is a novel and unified controllable interpretable pipeline.
Our I-AI addresses three key questions: where a radiologist looks, how long they focus on specific areas, and what findings they diagnose.
arXiv Detail & Related papers (2023-09-24T04:48:44Z) - Cross-Modal Causal Intervention for Medical Report Generation [107.76649943399168]
Radiology Report Generation (RRG) is essential for computer-aided diagnosis and medication guidance.
Generating accurate lesion descriptions remains challenging due to spurious correlations from visual-linguistic biases.
We propose a two-stage framework named CrossModal Causal Representation Learning (CMCRL).
Experiments on IU-Xray and MIMIC-CXR show that our CMCRL pipeline significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-03-16T07:23:55Z) - Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of the cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z) - Are we certain it's anomalous? [57.729669157989235]
Anomaly detection in time series is a complex task since anomalies are rare due to highly non-linear temporal correlations.
Here we propose the novel use of Hyperbolic uncertainty for Anomaly Detection (HypAD).
HypAD learns self-supervisedly to reconstruct the input signal.
arXiv Detail & Related papers (2022-11-16T21:31:39Z) - Radiomics-Guided Global-Local Transformer for Weakly Supervised Pathology Localization in Chest X-Rays [65.88435151891369]
Radiomics-Guided Transformer (RGT) fuses global image information with local knowledge-guided radiomics information.
RGT consists of an image Transformer branch, a radiomics Transformer branch, and fusion layers that aggregate image and radiomic information.
arXiv Detail & Related papers (2022-07-10T06:32:56Z) - StRegA: Unsupervised Anomaly Detection in Brain MRIs using a Compact Context-encoding Variational Autoencoder [48.2010192865749]
Unsupervised anomaly detection (UAD) can learn a data distribution from an unlabelled dataset of healthy subjects and then be applied to detect out of distribution samples.
This research proposes a compact version of the "context-encoding" VAE (ceVAE) model, combined with pre- and post-processing steps, creating a UAD pipeline (StRegA).
The proposed pipeline achieved a Dice score of 0.642 ± 0.101 while detecting tumours in T2w images of the BraTS dataset and 0.859 ± 0.112 while detecting artificially induced anomalies.
arXiv Detail & Related papers (2022-01-31T14:27:35Z) - Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z) - A clinical validation of VinDr-CXR, an AI system for detecting abnormal chest radiographs [0.0]
We demonstrate a mechanism for validating an AI-based system for detecting abnormalities on X-ray scans.
The system achieves an F1 score (the harmonic mean of recall and precision) of 0.653 (CI 0.635, 0.671) for detecting any abnormalities on chest X-rays.
arXiv Detail & Related papers (2021-04-06T02:53:35Z)