Related papers: Prior-informed optimization of treatment recommendation via bandit algorithms trained on large language model-processed historical records

URL: http://arxiv.org/abs/2510.19014v1
Date: Tue, 21 Oct 2025 18:57:00 GMT
Title: Prior-informed optimization of treatment recommendation via bandit algorithms trained on large language model-processed historical records
Authors: Saman Nessari, Ali Bozorgi-Amiri
Abstract summary: Current medical practice depends on standardized treatment frameworks and empirical methodologies that neglect individual patient variations. We develop a comprehensive system integrating Large Language Models (LLMs), Conditional Tabular Generative Adversarial Networks (CTGAN), T-learner counterfactual models, and contextual bandit approaches.
Score: 0.6875312133832079
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Current medical practice depends on standardized treatment frameworks and empirical methodologies that neglect individual patient variations, leading to suboptimal health outcomes. We develop a comprehensive system integrating Large Language Models (LLMs), Conditional Tabular Generative Adversarial Networks (CTGAN), T-learner counterfactual models, and contextual bandit approaches to provide customized, data-informed clinical recommendations. The approach utilizes LLMs to process unstructured medical narratives into structured datasets (93.2% accuracy), uses CTGANs to produce realistic synthetic patient data (55% accuracy via two-sample verification), deploys T-learners to forecast patient-specific treatment responses (84.3% accuracy), and integrates prior-informed contextual bandits to enhance online therapeutic selection by effectively balancing exploration of new possibilities with exploitation of existing knowledge. Testing on stage III colon cancer datasets revealed that our KernelUCB approach obtained 0.60-0.61 average reward scores across 5,000 rounds, exceeding other reference methods. This comprehensive system overcomes cold-start limitations in online learning environments, improves computational effectiveness, and constitutes notable progress toward individualized medicine adapted to specific patient characteristics.
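
As a rough illustration of the T-learner step named in the abstract, the sketch below fits one outcome model per treatment arm and reports the difference of their predictions as the estimated patient-specific effect. It is a minimal toy under stated assumptions, not the authors' implementation: the single numeric feature, the one-variable least-squares base model, the noiseless toy records, and every function name (`fit_line`, `fit_t_learner`, `predict_effect`) are hypothetical choices made for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x on one feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def fit_t_learner(records):
    """Fit one outcome model per arm.

    records: list of (feature, treated_flag, outcome) tuples, where
    treated_flag is 0 (control) or 1 (treated). Hypothetical data layout.
    """
    models = {}
    for arm in (0, 1):
        xs = [x for x, t, _ in records if t == arm]
        ys = [y for _, t, y in records if t == arm]
        models[arm] = fit_line(xs, ys)
    return models


def predict_effect(models, x):
    """Estimated effect at x: treated prediction minus control prediction."""
    (a0, b0), (a1, b1) = models[0], models[1]
    return (a1 + b1 * x) - (a0 + b0 * x)


# Noiseless toy data (an assumption for illustration only):
# control outcome is x, treated outcome is 2x + 1, so the true effect is x + 1.
features = (0.0, 1.0, 2.0, 3.0)
records = [(x, 0, 1.0 * x) for x in features]
records += [(x, 1, 2.0 * x + 1.0) for x in features]

models = fit_t_learner(records)
effect_at_3 = predict_effect(models, 3.0)
print(effect_at_3)
```

In a pipeline like the one described, per-arm predictions of this kind could plausibly supply the counterfactual rewards a contextual bandit needs when only one treatment was observed per patient; the abstract does not spell out that wiring, so treat it as an assumption.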

Related papers

Adaptive Identification and Modeling of Clinical Pathways with Process Mining [4.810514867998534]
Clinical pathways are specialized healthcare plans that model patient treatment procedures. We propose a two-phase modeling method using process mining. We demonstrate our approach using Synthea, a benchmark dataset simulating patient treatments for SARS-CoV-2 infections.
arXiv Detail & Related papers (2025-12-03T13:37:37Z)
Enhancing Lung Cancer Treatment Outcome Prediction through Semantic Feature Engineering Using Large Language Models [5.778370321351782]
We introduce a framework that uses Large Language Models (LLMs) as Goal-oriented Knowledge Curators (GKC). GKC converts laboratory, genomic, and medication data into high-fidelity, task-aligned features. We benchmarked GKC against expert-engineered features, direct text embeddings, and an end-to-end transformer.
arXiv Detail & Related papers (2025-12-01T23:56:45Z)
Evolving Diagnostic Agents in a Virtual Clinical Environment [75.59389103511559]
We present a framework for training large language models (LLMs) as diagnostic agents with reinforcement learning. Our method acquires diagnostic strategies through interactive exploration and outcome-based feedback. DiagAgent significantly outperforms 10 state-of-the-art LLMs, including DeepSeek-v3 and GPT-4o.
arXiv Detail & Related papers (2025-10-28T17:19:47Z)
Timely Clinical Diagnosis through Active Test Selection [49.091903570068155]
We propose ACTMED (Adaptive Clinical Test selection via Model-based Experimental Design) to better emulate real-world diagnostic reasoning. LLMs act as flexible simulators, generating plausible patient state distributions and supporting belief updates without requiring structured, task-specific training data. We evaluate ACTMED on real-world datasets and show it can optimize test selection to improve diagnostic accuracy, interpretability, and resource use.
arXiv Detail & Related papers (2025-10-21T18:10:45Z)
CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation [18.396334867873307]
National Comprehensive Cancer Network (NCCN) provides evidence-based guidelines for cancer treatment. Translating complex patient presentations into guideline-compliant treatment recommendations is time-intensive, requires specialized expertise, and is prone to error. We present an agent-based approach to automatically generate guideline-concordant treatment trajectories for patients with non-small cell lung cancer.
arXiv Detail & Related papers (2025-09-09T01:49:29Z)
Censoring-Aware Tree-Based Reinforcement Learning for Estimating Dynamic Treatment Regimes with Censored Outcomes [4.877686100899469]
Censoring-Aware Tree-Based Reinforcement Learning (CA-TRL) is a novel framework to address the complexities associated with censored data. We demonstrate its effectiveness through extensive simulations and real-world applications using the SANAD epilepsy dataset.
arXiv Detail & Related papers (2025-03-09T16:53:09Z)
Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z)
Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning [59.11519451499754]
Direct Preference Optimization (DPO) has emerged as a de-facto approach for aligning language models with human preferences. Recent work has shown DPO's effectiveness relies on training data quality. We discover that reference model probability space naturally detects high-quality training samples.
arXiv Detail & Related papers (2025-01-25T07:21:50Z)
Clinical information extraction for Low-resource languages with Few-shot learning using Pre-trained language models and Prompting [12.166472806042592]
Automatic extraction of medical information from clinical documents poses several challenges. Recent advances in domain-adaptation and prompting methods showed promising results with minimal training data. We demonstrate that a lightweight, domain-adapted pretrained model, prompted with just 20 shots, outperforms a traditional classification model by 30.5% accuracy.
arXiv Detail & Related papers (2024-03-20T08:01:33Z)
FineEHR: Refine Clinical Note Representations to Improve Mortality Prediction [3.9026461169566673]
Large-scale electronic health records provide machine learning models with an abundance of clinical text and vital sign data. Despite the emergence of advanced Natural Language Processing (NLP) algorithms for clinical note analysis, the complex textual structure and noise present in raw clinical data have posed significant challenges. We propose FINEEHR, a system that utilizes two representation learning techniques, namely metric learning and fine-tuning, to refine clinical note embeddings.
arXiv Detail & Related papers (2023-04-24T02:42:52Z)
Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM). Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks. Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets. We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret [59.81290762273153]
Dynamic treatment regimes (DTRs) are personalized, adaptive, multi-stage treatment plans that adapt treatment decisions to an individual's initial features and to intermediate outcomes and features at each subsequent stage. We propose a novel algorithm that, by carefully balancing exploration and exploitation, is guaranteed to achieve rate-optimal regret when the transition and reward models are linear.
arXiv Detail & Related papers (2020-05-06T13:03:42Z)
Language Models Are An Effective Patient Representation Learning Technique For Electronic Health Record Data [7.260199064831896]
We show that patient representation schemes inspired from techniques in natural language processing can increase the accuracy of clinical prediction models. Such patient representation schemes enable a 3.5% mean improvement in AUROC on five prediction tasks compared to standard baselines.
arXiv Detail & Related papers (2020-01-06T22:24:59Z)
