AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization
- URL: http://arxiv.org/abs/2601.13918v1
- Date: Tue, 20 Jan 2026 12:48:04 GMT
- Title: AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization
- Authors: Yusheng Liao, Chuan Xuan, Yutong Cai, Lina Yang, Zhe Chen, Yanfeng Wang, Yu Wang,
- Abstract summary: RetroSum is a framework that unifies a retrospective summarization mechanism with an evolving experience strategy.<n>It achieves performance gains of up to 29.16% over competitive baselines, while significantly decreasing total interaction errors by up to 92.3%.
- Score: 28.032865974227875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models have demonstrated profound utility in the medical domain. However, their application to autonomous Electronic Health Records~(EHRs) navigation remains constrained by a reliance on curated inputs and simplified retrieval tasks. To bridge the gap between idealized experimental settings and realistic clinical environments, we present AgentEHR. This benchmark challenges agents to execute complex decision-making tasks, such as diagnosis and treatment planning, requiring long-range interactive reasoning directly within raw and high-noise databases. In tackling these tasks, we identify that existing summarization methods inevitably suffer from critical information loss and fractured reasoning continuity. To address this, we propose RetroSum, a novel framework that unifies a retrospective summarization mechanism with an evolving experience strategy. By dynamically re-evaluating interaction history, the retrospective mechanism prevents long-context information loss and ensures unbroken logical coherence. Additionally, the evolving strategy bridges the domain gap by retrieving accumulated experience from a memory bank. Extensive empirical evaluations demonstrate that RetroSum achieves performance gains of up to 29.16% over competitive baselines, while significantly decreasing total interaction errors by up to 92.3%.
Related papers
- AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts [78.33143446024485]
We introduce textbfAgentLongBench, which evaluates agents through simulated environment rollouts based on Lateral Thinking Puzzles.<n>This framework generates rigorous interaction trajectories across knowledge-intensive and knowledge-free scenarios.
arXiv Detail & Related papers (2026-01-28T16:05:44Z) - Strong Reasoning Isn't Enough: Evaluating Evidence Elicitation in Interactive Diagnosis [29.630872344186873]
Interactive medical consultation requires an agent to proactively elicit missing clinical evidence under uncertainty.<n>Existing evaluations largely remain static or outcome-centric, neglecting the evidence-gathering process.<n>We propose an interactive evaluation framework that explicitly models the consultation process using a simulated patient and a revsimulated reporter grounded in atomic evidences.
arXiv Detail & Related papers (2026-01-27T16:36:35Z) - Chunking, Retrieval, and Re-ranking: An Empirical Evaluation of RAG Architectures for Policy Document Question Answering [0.0]
The integration of Large Language Models (LLMs) into the public health policy sector offers a transformative approach to navigating the vast repositories of regulatory guidance maintained by agencies such as the Centers for Disease Control and Prevention (CDC)<n>The propensity for LLMs to generate hallucinations, defined as plausible but factually incorrect assertions, presents a critical barrier to the adoption of these technologies in high-stakes environments where information integrity is non-negotiable.<n>This empirical evaluation explores the effectiveness of Retrieval-Augmented Generation (RAG) architectures in mitigating these risks by grounding generative outputs in authoritative document context.
arXiv Detail & Related papers (2026-01-21T20:52:48Z) - Bio-inspired Agentic Self-healing Framework for Resilient Distributed Computing Continuum Systems [4.003029907200818]
ReCiSt is a bio-inspired agentic self-healing framework designed to achieve resilience in Distributed Computing Continuum Systems (DCCS)<n>ReCiSt reconstructs the biological phases of Hemostasis, Inflammation, Proliferation, and Remodeling into the computational layers Containment, Diagnosis, Meta-Cognitive, and Knowledge for DCCS.<n>These four layers perform autonomous fault isolation, causal diagnosis, adaptive recovery, and long-term knowledge consolidation through Language Model (LM)-powered agents.
arXiv Detail & Related papers (2026-01-01T13:30:38Z) - Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning [137.33138614095435]
Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models.<n>Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval.<n>We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions.
arXiv Detail & Related papers (2025-11-12T08:29:39Z) - IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction [107.49922328855025]
IterResearch is a novel iterative deep-research paradigm that reformulates long-horizon research as a Markov Decision Process.<n>It achieves substantial improvements over existing open-source agents with average +14.5pp across six benchmarks.<n>It serves as an effective prompting strategy, improving frontier models by up to 19.2pp over ReAct on long-horizon tasks.
arXiv Detail & Related papers (2025-11-10T17:30:08Z) - MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
We develop MedAlign, a framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA)<n>We first propose a multimodal Direct Preference Optimization (mDPO) objective to align preference learning with visual context.<n>We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM.
arXiv Detail & Related papers (2025-10-24T02:11:05Z) - AgentMental: An Interactive Multi-Agent Framework for Explainable and Adaptive Mental Health Assessment [31.920800599579906]
Mental health assessment is crucial for early intervention and effective treatment, yet traditional clinician-based approaches are limited by the shortage of qualified professionals.<n>Recent advances in artificial intelligence have sparked growing interest in automated psychological assessment, yet most existing approaches are constrained by their reliance on static text analysis.<n>We propose a multi-agent framework for mental health evaluation that simulates clinical doctor-patient dialogues.
arXiv Detail & Related papers (2025-08-15T16:20:45Z) - mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation [15.991125806837386]
Large Vision-Language Models (LVLMs) have made remarkable strides in multimodal tasks such as visual question answering, visual grounding, and complex reasoning.<n>Retrieval-Augmented Generation (RAG) offers a practical solution to mitigate these challenges by allowing the LVLMs to access large-scale knowledge databases via retrieval mechanisms.
arXiv Detail & Related papers (2025-05-29T23:32:03Z) - Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z) - Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.