How Well Do Multimodal Models Reason on ECG Signals?
- URL: http://arxiv.org/abs/2603.00312v1
- Date: Fri, 27 Feb 2026 21:04:12 GMT
- Title: How Well Do Multimodal Models Reason on ECG Signals?
- Authors: Maxwell A. Xu, Harish Haresumadram, Catherine W. Liu, Patrick Langer, Jathurshan Pradeepkumar, Wanting Mao, Sunita J. Ferns, Aradhana Verma, Jimeng Sun, Paul Schmiedmayer, Xin Liu, Daniel McDuff, Emily B. Fox, James M. Rehg,
- Abstract summary: We introduce a reproducible framework for evaluating reasoning in ECG signals. We employ an agentic framework that generates code to empirically verify the temporal structures described in the reasoning trace. This dual-verification method enables the scalable assessment of "true" reasoning capabilities.
- Score: 36.281141199783825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While multimodal large language models offer a promising solution to the "black box" nature of health AI by generating interpretable reasoning traces, verifying the validity of these traces remains a critical challenge. Existing evaluation methods are either unscalable, relying on manual clinician review, or superficial, utilizing proxy metrics (e.g., QA accuracy) that fail to capture the semantic correctness of clinical logic. In this work, we introduce a reproducible framework for evaluating reasoning in ECG signals. We propose decomposing reasoning into two distinct components: (i) Perception, the accurate identification of patterns within the raw signal, and (ii) Deduction, the logical application of domain knowledge to those patterns. To evaluate Perception, we employ an agentic framework that generates code to empirically verify the temporal structures described in the reasoning trace. To evaluate Deduction, we measure the alignment of the model's logic against a structured database of established clinical criteria in a retrieval-based approach. This dual-verification method enables the scalable assessment of "true" reasoning capabilities.
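The Perception-verification step described in the abstract — generating code that empirically checks a temporal claim against the raw signal — can be illustrated with a minimal sketch. The function names, the toy R-peak detector, and the 5 bpm tolerance below are illustrative assumptions, not the paper's actual implementation: a reasoning trace claims a heart rate, and we verify it by measuring R-R intervals directly in the signal.

```python
# Hypothetical sketch of empirically verifying a "Perception" claim.
# Suppose a model's reasoning trace asserts "heart rate ~75 bpm"; we
# check that claim against the raw signal rather than trusting the text.

def detect_r_peaks(signal, threshold):
    """Return indices of local maxima above `threshold` (toy R-peak detector)."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i] > threshold and signal[i] >= signal[i - 1] and signal[i] > signal[i + 1]:
            peaks.append(i)
    return peaks

def verify_heart_rate_claim(signal, fs_hz, claimed_bpm, tol_bpm=5.0):
    """Check whether a claimed heart rate matches the measured R-R intervals."""
    peaks = detect_r_peaks(signal, threshold=0.5)
    if len(peaks) < 2:
        return False  # not enough beats to measure an interval
    rr_seconds = [(b - a) / fs_hz for a, b in zip(peaks, peaks[1:])]
    measured_bpm = 60.0 / (sum(rr_seconds) / len(rr_seconds))
    return abs(measured_bpm - claimed_bpm) <= tol_bpm

# Toy 250 Hz signal: a unit spike every 200 samples (0.8 s R-R -> 75 bpm).
fs = 250
signal = [0.0] * 2500
for i in range(0, len(signal), 200):
    signal[i] = 1.0

print(verify_heart_rate_claim(signal, fs, claimed_bpm=75.0))   # True
print(verify_heart_rate_claim(signal, fs, claimed_bpm=120.0))  # False
```

In the paper's framework the verification code is generated by an agent and tailored to each claim in the trace; the fixed checker above only conveys the shape of that empirical check.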
Related papers
- NeuroSymb-MRG: Differentiable Abductive Reasoning with Active Uncertainty Minimization for Radiology Report Generation [17.916502111955456]
We present NeuroSymb-MRG, a unified framework that integrates NeuroSymbolic abductive reasoning with active uncertainty minimization to produce structured, clinically grounded reports. The system maps image features to probabilistic clinical concepts, composes differentiable logic-based reasoning chains, decodes those chains into templated clauses, and refines the textual output via retrieval and constrained language-model editing.
arXiv Detail & Related papers (2026-03-02T11:31:30Z) - Diagnosing Pathological Chain-of-Thought in Reasoning Models [2.8521161475937675]
Chain-of-thought (CoT) reasoning is fundamental to modern LLM architectures. We identify three distinct pathologies: post-hoc rationalization, encoded reasoning, and internalized reasoning. Our work provides a practical toolkit for assessing CoT pathologies, with direct implications for training-time monitoring.
arXiv Detail & Related papers (2026-02-14T21:53:47Z) - Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification [49.506412445511934]
Large Language Models (LLMs) show remarkable capabilities, yet their next-token prediction creates logical inconsistencies and reward hacking. We introduce a formal logic verification-guided framework that dynamically interleaves formal symbolic verification with the natural language generation process. We operationalize this framework via a novel two-stage training pipeline that synergizes formal logic verification-guided supervised fine-tuning and policy optimization.
arXiv Detail & Related papers (2026-01-30T07:01:25Z) - AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning [73.50200033931148]
We introduce AgentsEval, a multi-agent stream reasoning framework that emulates the collaborative diagnostic workflow of radiologists. By dividing the evaluation process into interpretable steps including criteria definition, evidence extraction, alignment, and consistency scoring, AgentsEval provides explicit reasoning traces and structured clinical feedback. Experimental results demonstrate that AgentsEval delivers clinically aligned, semantically faithful, and interpretable evaluations that remain robust under paraphrastic, semantic, and stylistic perturbations.
arXiv Detail & Related papers (2026-01-23T11:59:13Z) - MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph [17.320322032287894]
We propose MedCEG, a framework that augments medical language models with clinically valid reasoning pathways. To guide the reasoning process, we introduce a Clinical Reasoning Procedure Reward. Experimental results show that MedCEG surpasses existing methods in performance while producing clinically valid reasoning chains.
arXiv Detail & Related papers (2025-12-15T16:38:46Z) - Adaptive Diagnostic Reasoning Framework for Pathology with Multimodal Large Language Models [34.28963665009494]
We present RECAP-PATH, an interpretable framework that establishes a self-learning paradigm. It shifts off-the-shelf multimodal large language models from passive pattern recognition to evidence-linked diagnostic reasoning. This self-learning approach requires only small labeled sets and no white-box access or weight updates to generate cancer diagnoses.
arXiv Detail & Related papers (2025-11-15T03:06:59Z) - AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs [0.0]
Gait impairment plays an important role in early diagnosis, disease monitoring, and treatment evaluation for neurodegenerative diseases. Recent deep learning-based approaches have consistently improved classification accuracies, but they often lack interpretability. We introduce AGIR, a novel pipeline consisting of a pre-trained VQ-VAE motion tokenizer and a Large Language Model (LLM) fine-tuned over pairs of motion tokens.
arXiv Detail & Related papers (2025-03-23T17:12:16Z) - Pitfalls of topology-aware image segmentation [81.19923502845441]
We identify critical pitfalls in model evaluation that include inadequate connectivity choices, overlooked topological artifacts, and inappropriate use of evaluation metrics. We propose a set of actionable recommendations to establish fair and robust evaluation standards for topology-aware medical image segmentation methods.
arXiv Detail & Related papers (2024-12-19T08:11:42Z) - Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z) - Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.