Related papers: AT-CXR: Uncertainty-Aware Agentic Triage for Chest X-rays

URL: http://arxiv.org/abs/2508.19322v1
Date: Tue, 26 Aug 2025 14:33:09 GMT
Title: AT-CXR: Uncertainty-Aware Agentic Triage for Chest X-rays
Authors: Xueyang Li, Mingze Jiang, Gelei Xu, Jun Xia, Mengzhao Jia, Danny Chen, Yiyu Shi
Abstract summary: We introduce AT-CXR, an uncertainty-aware agent for chest X-rays. The system estimates per-case confidence and distributional fit, then follows a stepwise policy to issue an automated decision. We evaluate two router designs that share the same inputs and actions.
Score: 12.843444405498404
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Agentic AI is advancing rapidly, yet truly autonomous medical-imaging triage, where a system decides when to stop, escalate, or defer under real constraints, remains relatively underexplored. To address this gap, we introduce AT-CXR, an uncertainty-aware agent for chest X-rays. The system estimates per-case confidence and distributional fit, then follows a stepwise policy to issue an automated decision or abstain with a suggested label for human intervention. We evaluate two router designs that share the same inputs and actions: a deterministic rule-based router and an LLM-decided router. Across five-fold evaluation on a balanced subset of the NIH ChestX-ray14 dataset, both variants outperform strong zero-shot vision-language models and state-of-the-art supervised classifiers, achieving higher full-coverage accuracy and superior selective-prediction performance, evidenced by a lower area under the risk-coverage curve (AURC) and a lower error rate at high coverage, while operating with lower latency that meets practical clinical constraints. The two routers provide complementary operating points, enabling deployments to prioritize maximal throughput or maximal accuracy. Our code is available at https://github.com/XLIAaron/uncertainty-aware-cxr-agent.
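
The abstract reports selective-prediction quality as the area under the risk-coverage curve (AURC) and the error rate at high coverage, and describes a stepwise stop/escalate/defer policy. The Python sketch below illustrates one common way to compute these metrics, together with a toy rule-based router. It is not taken from the AT-CXR code: the function names, the thresholds `tau_conf` and `tau_ood`, and the routing order are all hypothetical, chosen only to make the ideas concrete.

```python
import numpy as np


def risk_coverage_curve(confidence, correct):
    """Return (coverage, risk) when cases are accepted in order of
    decreasing confidence. risk[k] is the error rate among the k+1
    most confident cases."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(-confidence, kind="stable")
    errors = (~correct[order]).astype(float)
    accepted = np.arange(1, len(errors) + 1)
    coverage = accepted / len(errors)
    risk = np.cumsum(errors) / accepted
    return coverage, risk


def aurc(confidence, correct):
    """AURC approximated as the mean risk over all coverage levels
    (lower is better)."""
    _, risk = risk_coverage_curve(confidence, correct)
    return float(risk.mean())


def error_at_coverage(confidence, correct, target):
    """Error rate at the smallest coverage level >= target."""
    coverage, risk = risk_coverage_curve(confidence, correct)
    idx = min(int(np.searchsorted(coverage, target, side="left")), len(risk) - 1)
    return float(risk[idx])


def route(confidence, ood_score, tau_conf=0.8, tau_ood=0.5, can_escalate=True):
    """Toy deterministic router (hypothetical thresholds and ordering).

    - defer to a human when the case looks out of distribution,
    - stop and issue the automated decision when confidence is high,
    - otherwise escalate to a costlier step if one is still available.
    """
    if ood_score > tau_ood:
        return "defer"
    if confidence >= tau_conf:
        return "stop"
    return "escalate" if can_escalate else "defer"


if __name__ == "__main__":
    conf = [0.9, 0.8, 0.7, 0.6]
    ok = [True, True, False, True]
    print("AURC:", aurc(conf, ok))
    print("error @ 75% coverage:", error_at_coverage(conf, ok, 0.75))
    print("decision:", route(confidence=0.95, ood_score=0.1))
```

A lower AURC means that errors concentrate among the low-confidence cases, which is the property a triage system needs if abstaining on uncertain cases is to reduce risk.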

Related papers

TRACER: Trajectory Risk Aggregation for Critical Episodes in Agentic Reasoning [4.928838343487574]
Existing uncertainty proxies focus on single-shot text generation. We introduce TRACER, a trajectory-level uncertainty metric for dual-control Tool-Agent-User interaction.
arXiv Detail & Related papers (2026-02-11T22:23:56Z)
Conformal Thinking: Risk Control for Reasoning on a Compute Budget [60.65072883773352]
Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases. We re-frame the budget setting problem as risk control, limiting the error rate while minimizing compute. Our framework introduces an upper threshold that stops reasoning when the model is confident and a novel lower threshold that preemptively stops unsolvable instances.
arXiv Detail & Related papers (2026-02-03T18:17:22Z)
Agentic Uncertainty Quantification [76.94013626702183]
We propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals. Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which utilizes these explanations as rational cues to trigger targeted inference-time resolution only when necessary.
arXiv Detail & Related papers (2026-01-22T07:16:26Z)
BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search [72.87861928940929]
Boundary-Aware Policy Optimization (BAPO) is a novel RL framework designed to cultivate reliable boundary awareness without compromising accuracy. BAPO introduces two key components: (i) a group-based boundary-aware reward that encourages an IDK response only when the reasoning reaches its limit, and (ii) an adaptive reward modulator that strategically suspends this reward during early exploration, preventing the model from exploiting IDK as a shortcut.
arXiv Detail & Related papers (2026-01-16T07:06:58Z)
LEC: Linear Expectation Constraints for False-Discovery Control in Selective Prediction and Routing Systems [95.35293543918762]
Large language models (LLMs) often generate unreliable answers, while uncertainty methods fail to fully distinguish correct from incorrect predictions. We address this issue through the lens of false discovery rate (FDR) control, ensuring that among all accepted predictions, the proportion of errors does not exceed a target risk level. We propose LEC, which reinterprets selective prediction as a constrained decision problem by enforcing a Linear Expectation Constraint.
arXiv Detail & Related papers (2025-12-01T11:27:09Z)
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data [76.18834864749606]
While LLM agents can plan multi-step tasks, intervening at the planning stage, before any action is executed, is often the safest way to prevent harm. Existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. We introduce AuraGen, a controllable engine that synthesizes benign trajectories, injects category-labeled risks with difficulty, and filters outputs via an automated reward model.
arXiv Detail & Related papers (2025-10-10T18:42:32Z)
Uncovering Overconfident Failures in CXR Models via Augmentation-Sensitivity Risk Scoring [1.9837702647603577]
We propose an augmentation-sensitivity risk scoring (ASRS) framework to identify error-prone chest radiograph (CXR) cases. ASRS scores stratify samples into stability quartiles, where highly sensitive cases show substantially lower recall. ASRS provides a label-free means for selective prediction and clinician review, improving fairness and safety in medical AI.
arXiv Detail & Related papers (2025-10-02T05:15:40Z)
PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning [31.42306351491176]
PASS (Probabilistic Agentic Supernet Sampling) is the first multimodal framework to address these challenges in the context of Chest X-Ray (CXR) reasoning. PASS adaptively samples agentic workflows over a multi-tool graph, yielding decision paths annotated with interpretable probabilities.
arXiv Detail & Related papers (2025-08-14T10:03:47Z)
A Framework for Uncertainty Quantification Based on Nearest Neighbors Across Layers [0.24578723416255746]
Neural Networks have high accuracy in solving problems where it is difficult to detect patterns or create a logical model. One strategy to detect and mitigate these errors is the measurement of the uncertainty over neural network decisions. We present a novel post-hoc framework for measuring the uncertainty of a decision based on retrieved training cases.
arXiv Detail & Related papers (2025-06-24T11:10:41Z)
U2AD: Uncertainty-based Unsupervised Anomaly Detection Framework for Detecting T2 Hyperintensity in MRI Spinal Cord [7.811634659561162]
T2 hyperintensities in spinal cord MR images are crucial biomarkers for conditions such as degenerative cervical myelopathy. Deep learning methods have shown promise in lesion detection, but most supervised approaches are heavily dependent on large, annotated datasets. We propose an Uncertainty-based Unsupervised Anomaly Detection framework, termed U2AD, to address these limitations.
arXiv Detail & Related papers (2025-03-17T17:33:32Z)
Quality assurance of organs-at-risk delineation in radiotherapy [7.698565355235687]
The delineation of tumor target and organs-at-risk is critical in the radiotherapy treatment planning. The quality assurance of the automatic segmentation is still an unmet need in clinical practice. Our proposed model, which introduces residual network and attention mechanism in the one-class classification framework, was able to detect the various types of OAR contour errors with high accuracy.
arXiv Detail & Related papers (2024-05-20T02:32:46Z)
Industrial Anomaly Detection and Localization Using Weakly-Supervised Residual Transformers [44.344548601242444]
We introduce a novel framework, Weakly-supervised RESidual Transformer (WeakREST), to achieve high anomaly detection accuracy. We reformulate the pixel-wise anomaly localization task into a block-wise classification problem. We develop a novel ResMixMatch algorithm, capable of handling the interplay between weak labels and residual-based representations.
arXiv Detail & Related papers (2023-06-06T08:19:30Z)
Robust-by-Design Classification via Unitary-Gradient Neural Networks [66.17379946402859]
The use of neural networks in safety-critical systems requires safe and robust models, due to the existence of adversarial attacks. Knowing the minimal adversarial perturbation of any input x, or, equivalently, the distance of x from the classification boundary, allows evaluating the classification robustness, providing certifiable predictions. A novel network architecture named Unitary-Gradient Neural Network is presented. Experimental results show that the proposed architecture approximates a signed distance, hence allowing an online certifiable classification of x at the cost of a single inference.
arXiv Detail & Related papers (2022-09-09T13:34:51Z)
Ultra-Reliable Indoor Millimeter Wave Communications using Multiple Artificial Intelligence-Powered Intelligent Surfaces [115.85072043481414]
We propose a novel framework for guaranteeing ultra-reliable millimeter wave (mmW) communications using multiple artificial intelligence (AI)-enabled reconfigurable intelligent surfaces (RISs). The use of multiple AI-powered RISs allows changing the propagation direction of the signals transmitted from a mmW access point (AP). Two centralized and distributed controllers are proposed to control the policies of the mmW AP and RISs.
arXiv Detail & Related papers (2021-03-31T19:15:49Z)
Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
Optimally Displaced Threshold Detection for Discriminating Binary Coherent States Using Imperfect Devices [50.09039506170243]
We analytically study the performance of the generalized Kennedy receiver having optimally displaced threshold detection (ODTD) in a realistic situation with noises and imperfect devices. We show that the proposed greedy search algorithm can obtain a lower and smoother error probability than the existing works.
arXiv Detail & Related papers (2020-07-21T21:52:29Z)
SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples. We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.