Demo: Healthcare Agent Orchestrator (HAO) for Patient Summarization in Molecular Tumor Boards
- URL: http://arxiv.org/abs/2509.06602v1
- Date: Mon, 08 Sep 2025 12:15:53 GMT
- Title: Demo: Healthcare Agent Orchestrator (HAO) for Patient Summarization in Molecular Tumor Boards
- Authors: Noel Codella, Sam Preston, Hao Qiu, Leonardo Schettini, Wen-wai Yim, Mert Öz, Shrey Jain, Matthew P. Lungren, Thomas Osborne,
- Abstract summary: We introduce the Healthcare Agent Orchestrator (HAO) to generate accurate and comprehensive patient summaries for Molecular Tumor Boards (MTBs)<n> evaluating predicted patient summaries against ground truth presents additional challenges due to stylistic variation, ordering, synonym usage, and phrasing differences.<n>We propose TBFact, a model-as-a-judge'' framework designed to assess the comprehensiveness and succinctness of generated summaries.
- Score: 8.410395208222537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Molecular Tumor Boards (MTBs) are multidisciplinary forums where oncology specialists collaboratively assess complex patient cases to determine optimal treatment strategies. A central element of this process is the patient summary, typically compiled by a medical oncologist, radiation oncologist, or surgeon, or their trained medical assistant, who distills heterogeneous medical records into a concise narrative to facilitate discussion. This manual approach is often labor-intensive, subjective, and prone to omissions of critical information. To address these limitations, we introduce the Healthcare Agent Orchestrator (HAO), a Large Language Model (LLM)-driven AI agent that coordinates a multi-agent clinical workflow to generate accurate and comprehensive patient summaries for MTBs. Evaluating predicted patient summaries against ground truth presents additional challenges due to stylistic variation, ordering, synonym usage, and phrasing differences, which complicate the measurement of both succinctness and completeness. To overcome these evaluation hurdles, we propose TBFact, a ``model-as-a-judge'' framework designed to assess the comprehensiveness and succinctness of generated summaries. Using a benchmark dataset derived from de-identified tumor board discussions, we applied TBFact to evaluate our Patient History agent. Results show that the agent captured 94% of high-importance information (including partial entailments) and achieved a TBFact recall of 0.84 under strict entailment criteria. We further demonstrate that TBFact enables a data-free evaluation framework that institutions can deploy locally without sharing sensitive clinical data. Together, HAO and TBFact establish a robust foundation for delivering reliable and scalable support to MTBs.
Related papers
- MedDialogRubrics: A Comprehensive Benchmark and Evaluation Framework for Multi-turn Medical Consultations in Large Language Models [15.91764739198419]
We present MedDialogRubrics, a novel benchmark comprising 5,200 synthetically constructed patient cases and over 60,000 fine-grained evaluation rubrics.<n>Our framework employs a multi-agent system to synthesize realistic patient records and chief complaints without accessing real-world electronic health records.
arXiv Detail & Related papers (2026-01-06T13:56:33Z) - DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services [49.70819009392778]
Large Language Models (LLMs) and Multi-Agent Systems (MAS) offer opportunities to augment dispatchers.<n>This study aimed to develop and evaluate a taxonomy-grounded, multi-agent system for simulating realistic scenarios.
arXiv Detail & Related papers (2025-10-24T08:01:21Z) - MedKGEval: A Knowledge Graph-Based Multi-Turn Evaluation Framework for Open-Ended Patient Interactions with Clinical LLMs [19.12790150016383]
We present MedKGEval, a novel multi-turn evaluation framework for clinical large language models.<n>A knowledge graph-driven patient simulation mechanism retrieves relevant medical facts from a curated knowledge graph.<n>A turn-level evaluation framework assesses each model response for clinical appropriateness, factual correctness, and safety.
arXiv Detail & Related papers (2025-10-14T07:22:26Z) - Automated Clinical Problem Detection from SOAP Notes using a Collaborative Multi-Agent LLM Architecture [8.072932739333309]
We introduce a collaborative multi-agent system (MAS) that models a clinical consultation team to address this gap.<n>The system is tasked with identifying clinical problems by analyzing only the Subjective (S) and Objective (O) sections of SOAP notes.<n>A Manager agent orchestrates a dynamically assigned team of specialist agents who engage in a hierarchical, iterative debate to reach a consensus.
arXiv Detail & Related papers (2025-08-29T17:31:24Z) - From EMR Data to Clinical Insight: An LLM-Driven Framework for Automated Pre-Consultation Questionnaire Generation [9.269061009613033]
We propose a novel framework for generating pre-consultation questionnaires from complex Electronic Medical Records (EMRs)<n>This framework overcomes limitations of direct methods by building explicit clinical knowledge.<n> Evaluated on a real-world EMR dataset and validated by clinical experts, our method demonstrates superior performance in information coverage, diagnostic relevance, understandability, and generation time.
arXiv Detail & Related papers (2025-08-01T12:24:49Z) - Comparing representations of long clinical texts for the task of patient note-identification [4.552065156611815]
Patient-note identification involves accurately matching an anonymized clinical note to its corresponding patient, represented by a set of related notes.<n>We explore various embedding methods, including BERT-based models, to process mediumto-long clinical texts effectively.<n>Our results indicate that BERT-based embeddings outperform traditional and hierarchical models, particularly in processing lengthy clinical notes.
arXiv Detail & Related papers (2025-03-31T12:31:44Z) - TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews [54.35097932763878]
Thematic analysis (TA) is a widely used qualitative approach for uncovering latent meanings in unstructured text data.<n>Here, we propose TAMA: A Human-AI Collaborative Thematic Analysis framework using Multi-Agent LLMs for clinical interviews.<n>We demonstrate that TAMA outperforms existing LLM-assisted TA approaches, achieving higher thematic hit rate, coverage, and distinctiveness.
arXiv Detail & Related papers (2025-03-26T15:58:16Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce textbfAI Hospital, a framework simulating dynamic medical interactions between emphDoctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data.
We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions.
Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z) - Towards Unifying Anatomy Segmentation: Automated Generation of a
Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines [113.08940153125616]
We generate a dataset of whole-body CT scans with $142$ voxel-level labels for 533 volumes providing comprehensive anatomical coverage.
Our proposed procedure does not rely on manual annotation during the label aggregation stage.
We release our trained unified anatomical segmentation model capable of predicting $142$ anatomical structures on CT data.
arXiv Detail & Related papers (2023-07-25T09:48:13Z) - Generating medically-accurate summaries of patient-provider dialogue: A
multi-stage approach using large language models [6.252236971703546]
An effective summary is required to be coherent and accurately capture all the medically relevant information in the dialogue.
This paper tackles the problem of medical conversation summarization by discretizing the task into several smaller dialogue-understanding tasks.
arXiv Detail & Related papers (2023-05-10T08:48:53Z) - Transformer-based unsupervised patient representation learning based on
medical claims for risk stratification and analysis [3.5492837081144204]
Transformer-based Multimodal AutoEncoder (TMAE) can learn efficient patient representation by encoding meaningful information from the claims data.
We trained TMAE using a real-world pediatric claims dataset containing more than 600,000 patients.
arXiv Detail & Related papers (2021-06-23T21:29:50Z) - DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment
Prediction [67.91606509226132]
Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment.
DeepEnroll is a cross-modal inference learning model to jointly encode enrollment criteria (tabular data) into a shared latent space for matching inference.
arXiv Detail & Related papers (2020-01-22T17:51:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.