Related papers: HARMON-E: Hierarchical Agentic Reasoning for Multimodal Oncology Notes to Extract Structured Data

URL: http://arxiv.org/abs/2512.19864v2
Date: Fri, 26 Dec 2025 11:32:02 GMT
Title: HARMON-E: Hierarchical Agentic Reasoning for Multimodal Oncology Notes to Extract Structured Data
Authors: Shashi Kant Gupta, Arijeet Pramanik, Jerrin John Thomas, Regina Schwind, Lauren Wiener, Avi Raju, Jeremy Kornbluth, Yanshan Wang, Zhaohui Su, Hrituraj Singh,
Abstract summary: We propose an agentic framework that decomposes complex oncology data extraction into modular, adaptive tasks. Evaluated on a large-scale dataset of over 400,000 unstructured clinical notes and scanned PDF reports spanning 2,250 cancer patients, our method achieves an average F1-score of 0.93.
Score: 4.776184995012808
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Unstructured notes within the electronic health record (EHR) contain rich clinical information vital for cancer treatment decision making and research, yet reliably extracting structured oncology data remains challenging due to extensive variability, specialized terminology, and inconsistent document formats. Manual abstraction, although accurate, is prohibitively costly and unscalable. Existing automated approaches typically address narrow scenarios (using synthetic datasets, restricting focus to document-level extraction, or isolating specific clinical variables such as staging, biomarkers, or histology) and do not adequately handle patient-level synthesis across large numbers of clinical documents containing contradictory information. In this study, we propose an agentic framework that systematically decomposes complex oncology data extraction into modular, adaptive tasks. Specifically, we use large language models (LLMs) as reasoning agents, equipped with context-sensitive retrieval and iterative synthesis capabilities, to exhaustively extract structured clinical variables from real-world oncology notes. Evaluated on a large-scale dataset of over 400,000 unstructured clinical notes and scanned PDF reports spanning 2,250 cancer patients, our method achieves an average F1-score of 0.93, with 100 out of 103 oncology-specific clinical variables exceeding 0.85 and critical variables (e.g., biomarkers and medications) surpassing 0.95. Moreover, integrating the agentic system into a data curation workflow yielded a direct manual approval rate of 0.94, significantly reducing annotation costs. To our knowledge, this constitutes the first exhaustive, end-to-end application of LLM-based agents for structured oncology data extraction at scale.
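The abstract highlights patient-level synthesis across documents that contradict each other, but it does not say how conflicts are resolved. The sketch below is a hypothetical illustration of that step only, not the HARMON-E method: it assumes each document has already yielded a candidate value for one variable (in the paper this is done by LLM agents with retrieval), and it resolves disagreements with a simple assumed rule (most recent document date first, then majority vote, then alphabetical order). The names `Candidate` and `synthesize_patient_value`, the schema, and the toy data are all invented for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    """One document-level extraction of a single clinical variable (hypothetical schema)."""
    doc_id: str
    doc_date: date
    value: str
    evidence: str  # supporting text span, kept so a reviewer can audit the choice


def synthesize_patient_value(
    candidates: list[Candidate],
) -> tuple[str | None, list[Candidate]]:
    """Fold document-level candidates into one patient-level value.

    Assumed rule (not taken from HARMON-E): keep only candidates from the most
    recent document date, pick the most frequent value among them, and break
    remaining ties alphabetically so the result is deterministic. Candidates that
    disagree with the chosen value are returned for human review.
    """
    if not candidates:
        return None, []
    latest = max(c.doc_date for c in candidates)
    counts = Counter(c.value for c in candidates if c.doc_date == latest)
    top = max(counts.values())
    chosen = min(v for v, n in counts.items() if n == top)
    conflicts = [c for c in candidates if c.value != chosen]
    return chosen, conflicts


# Invented toy data: an earlier note and two later reports disagree on ER status.
cands = [
    Candidate("note_001", date(2023, 1, 5), "ER-positive", "ER 90% positive"),
    Candidate("path_007", date(2023, 3, 2), "ER-negative", "ER negative (<1%)"),
    Candidate("note_014", date(2023, 3, 2), "ER-negative", "ER-negative on repeat biopsy"),
]
value, conflicts = synthesize_patient_value(cands)
print(value)  # ER-negative
print([c.doc_id for c in conflicts])  # ['note_001']
```

Returning the conflicting candidates rather than discarding them is one way to support a review step like the curation workflow the abstract describes, where abstractors approve or correct the system's output. A pure recency rule is a simplification: some variables (for example, an initial diagnosis date) would more plausibly favor the earliest document.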

Related papers

CNSight: Evaluation of Clinical Note Segmentation Tools [3.673249612734457]
We evaluate rule-based baselines, domain-specific transformer models, and large language models for clinical note segmentation using a curated dataset of 1,000 notes from MIMIC-IV. Our experiments show that large API-based models achieve the best overall performance, with GPT-5-mini reaching a best average F1 of 72.4 across sentence-level and free-text segmentation.
arXiv Detail & Related papers (2025-12-28T05:40:15Z)
Improving Cardiac Risk Prediction Using Data Generation Techniques [37.94487163156369]
This work proposes an architecture for the synthesis of realistic clinical records that are coherent with real-world observations. The primary objective is to increase the size and diversity of the available datasets in order to enhance the performance of cardiac risk prediction models.
arXiv Detail & Related papers (2025-12-19T10:17:00Z)
Leveraging LLMs for Structured Data Extraction from Unstructured Patient Records [0.0]
Manual chart review remains an extremely time-consuming and resource-intensive component of clinical research. We present a framework for automated structured feature extraction from clinical notes leveraging locally deployed large language models (LLMs). This framework demonstrates the potential of LLM systems to reduce the burden of manual chart review and increase consistency in data capture.
arXiv Detail & Related papers (2025-12-03T14:10:12Z)
Enhancing Lung Cancer Treatment Outcome Prediction through Semantic Feature Engineering Using Large Language Models [5.778370321351782]
We introduce a framework that uses Large Language Models (LLMs) as Goal-oriented Knowledge Curators (GKC). GKC converts laboratory, genomic, and medication data into high-fidelity, task-aligned features. We benchmarked GKC against expert-engineered features, direct text embeddings, and an end-to-end transformer.
arXiv Detail & Related papers (2025-12-01T23:56:45Z)
Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference [89.5628648718851]
Causal inference is essential for developing and evaluating medical interventions. Real-world medical datasets are often difficult to access due to regulatory barriers. We present STEAM: a novel method for generating Synthetic data for Treatment Effect Analysis in Medicine.
arXiv Detail & Related papers (2025-10-21T16:16:00Z)
Clinically-guided Data Synthesis for Laryngeal Lesion Detection [2.573786844054239]
This study introduces a novel approach that exploits a Latent Diffusion Model (LDM) coupled with a ControlNet adapter to generate laryngeal endoscopic image-annotation pairs. The proposed approach can be leveraged to expand training datasets for CADx/e models, empowering the assessment process in laryngology.
arXiv Detail & Related papers (2025-08-08T09:55:54Z)
Clinical NLP with Attention-Based Deep Learning for Multi-Disease Prediction [44.0876796031468]
This paper addresses the challenges posed by the unstructured nature and high-dimensional semantic complexity of electronic health record texts. A deep learning method based on attention mechanisms is proposed to achieve unified modeling for information extraction and multi-label disease prediction.
arXiv Detail & Related papers (2025-07-02T07:45:22Z)
TrialMatchAI: An End-to-End AI-powered Clinical Trial Recommendation System to Streamline Patient-to-Trial Matching [0.0]
We present TrialMatchAI, an AI-powered recommendation system that automates patient-to-trial matching. Built on fine-tuned, open-source large language models, TrialMatchAI ensures transparency and maintains a lightweight deployment footprint. In real-world validation, 92 percent of oncology patients had at least one relevant trial retrieved within the top 20 recommendations.
arXiv Detail & Related papers (2025-05-13T12:39:06Z)
PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks [39.97710183184273]
We present PathOrchestra, a versatile pathology foundation model trained via self-supervised learning on a dataset comprising 300K pathological slides. The model was rigorously evaluated on 112 clinical tasks using a combination of 61 private and 51 public datasets. PathOrchestra demonstrated exceptional performance across 27,755 WSIs and 9,415,729 ROIs, achieving over 0.950 accuracy in 47 tasks.
arXiv Detail & Related papers (2025-03-31T17:28:02Z)
Towards Scalable and Cross-Lingual Specialist Language Models for Oncology [4.824906329042275]
General-purpose large language models (LLMs) struggle with challenges such as clinical terminology, context-dependent interpretations, and multi-modal data integration. We develop an oncology-specialized, efficient, and adaptable NLP framework that combines instruction tuning, retrieval-augmented generation (RAG), and graph-based knowledge integration.
arXiv Detail & Related papers (2025-03-11T11:34:57Z)
Multimodal Pretraining of Medical Time Series and Notes [45.89025874396911]
Deep learning models show promise in extracting meaningful patterns, but they require extensive labeled data. We propose a novel approach employing self-supervised pretraining, focusing on the alignment of clinical measurements and notes. In downstream tasks, including in-hospital mortality prediction and phenotyping, our model outperforms baselines in settings where only a fraction of the data is labeled.
arXiv Detail & Related papers (2023-12-11T21:53:40Z)
Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines [113.08940153125616]
We generate a dataset of whole-body CT scans with 142 voxel-level labels for 533 volumes, providing comprehensive anatomical coverage. Our proposed procedure does not rely on manual annotation during the label aggregation stage. We release our trained unified anatomical segmentation model, capable of predicting 142 anatomical structures on CT data.
arXiv Detail & Related papers (2023-07-25T09:48:13Z)
A Multimodal Transformer: Fusing Clinical Notes with Structured EHR Data for Interpretable In-Hospital Mortality Prediction [8.625186194860696]
We provide a novel multimodal transformer to fuse clinical notes and structured EHR data for better prediction of in-hospital mortality. To improve interpretability, we propose an integrated gradients (IG) method to select important words in clinical notes. We also investigate the significance of domain adaptive pretraining and task adaptive fine-tuning on the Clinical BERT.
arXiv Detail & Related papers (2022-08-09T03:49:52Z)
CODE-AE: A Coherent De-confounding Autoencoder for Predicting Patient-Specific Drug Response From Cell Line Transcriptomics [35.67979269269178]
We develop a Coherent Deconfounding Autoencoder (CODE-AE) that can extract both common biological signals shared by incoherent samples and private representations unique to each data set. CODE-AE significantly improves the accuracy and robustness over state-of-the-art methods in both predicting patient drug response and de-confounding biological signals.
arXiv Detail & Related papers (2021-01-31T21:17:44Z)
Collaborative residual learners for automatic icd10 prediction using prescribed medications [45.82374977939355]
We propose a novel collaborative residual learning based model to automatically predict ICD10 codes employing only prescriptions data. For the inpatient and outpatient datasets, respectively, we obtain multi-label classification average precision of 0.71 and 0.57, F1-scores of 0.57 and 0.38, and principal-diagnosis prediction accuracy of 0.73 and 0.44.
arXiv Detail & Related papers (2020-12-16T07:07:27Z)
Ensemble model for pre-discharge icd10 coding prediction [45.82374977939355]
We propose an ensemble model incorporating multiple clinical data sources for accurate code predictions. For the inpatient and outpatient datasets, respectively, we obtain multi-label classification average precision of 0.73 and 0.58, F1-scores of 0.56 and 0.35, and principal-diagnosis prediction accuracy of 0.71 and 0.4.
arXiv Detail & Related papers (2020-12-16T07:02:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.