A Causal Machine Learning Framework for Treatment Personalization in Clinical Trials: Application to Ulcerative Colitis
- URL: http://arxiv.org/abs/2602.08171v1
- Date: Mon, 09 Feb 2026 00:26:30 GMT
- Title: A Causal Machine Learning Framework for Treatment Personalization in Clinical Trials: Application to Ulcerative Colitis
- Authors: Cristian Minoccheri, Sophia Tesic, Kayvan Najarian, Ryan Stidham
- Abstract summary: We present a modular causal machine learning framework that evaluates each question separately. We apply this framework to patient-level data from the UNIFI maintenance trial of ustekinumab in ulcerative colitis.
- Score: 0.7799711162530713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Randomized controlled trials estimate average treatment effects, but treatment response heterogeneity motivates personalized approaches. A critical question is whether statistically detectable heterogeneity translates into improved treatment decisions -- these are distinct questions that can yield contradictory answers. We present a modular causal machine learning framework that evaluates each question separately: permutation importance identifies which features predict heterogeneity, best linear predictor (BLP) testing assesses statistical significance, and doubly robust policy evaluation measures whether acting on the heterogeneity improves patient outcomes. We apply this framework to patient-level data from the UNIFI maintenance trial of ustekinumab in ulcerative colitis, comparing placebo, standard-dose ustekinumab every 12 weeks, and dose-intensified ustekinumab every 8 weeks, using cross-fitted X-learner models with baseline demographics, medication history, week-8 clinical scores, laboratory biomarkers, and video-derived endoscopic features. BLP testing identified strong associations between endoscopic features and treatment effect heterogeneity for ustekinumab versus placebo, yet doubly robust policy evaluation showed no improvement in expected remission from incorporating endoscopic features, and out-of-fold multi-arm evaluation showed worse performance. Diagnostic comparison of prognostic contribution against policy value revealed that endoscopic scores behaved as disease severity markers -- improving outcome prediction in untreated patients but adding noise to treatment selection -- while clinical variables (fecal calprotectin, age, CRP) captured the decision-relevant variation. These results demonstrate that causal machine learning applications to clinical trials should include policy-level evaluation alongside heterogeneity testing.
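The doubly robust policy evaluation step named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data are simulated, there are two arms instead of three, the outcome model is the true (oracle) remission probability instead of a cross-fitted X-learner, and the names `remission_prob` and `dr_policy_value` are hypothetical. The estimator shown is the standard augmented inverse-probability-weighted (AIPW) policy value with known randomization probabilities.

```python
import numpy as np


def dr_policy_value(y, a, policy, mu_hat, propensity):
    """AIPW estimate of the mean outcome if every patient followed `policy`.

    y:          (n,) observed outcomes (e.g. 1 = remission)
    a:          (n,) observed arm index for each patient
    policy:     (n,) arm index the policy would recommend
    mu_hat:     (n, k) predicted outcome under each of the k arms
    propensity: (n, k) probability of assignment to each arm
    Returns (estimate, standard error).
    """
    n = len(y)
    idx = np.arange(n)
    # Model-based prediction under the recommended arm.
    direct = mu_hat[idx, policy]
    # Residual correction, applied only where the observed arm matches the policy.
    match = (a == policy).astype(float)
    correction = match / propensity[idx, a] * (y - mu_hat[idx, a])
    scores = direct + correction
    return scores.mean(), scores.std(ddof=1) / np.sqrt(n)


def remission_prob(x, arm):
    """Hypothetical data-generating model: treatment helps only when x > 0."""
    return 0.3 + 0.2 * arm * (x > 0)


# Simulated 1:1 randomized trial (illustrative only, not UNIFI data).
rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=n)                    # a single baseline covariate
a = rng.integers(0, 2, size=n)            # 0 = placebo, 1 = active arm
y = rng.binomial(1, remission_prob(x, a))

# Assumption: oracle outcome model; in practice this would be cross-fitted.
mu_hat = np.column_stack([remission_prob(x, 0), remission_prob(x, 1)])
propensity = np.full((n, 2), 0.5)         # known by design in a randomized trial

v_none, se_none = dr_policy_value(y, a, np.zeros(n, dtype=int), mu_hat, propensity)
v_all, se_all = dr_policy_value(y, a, np.ones(n, dtype=int), mu_hat, propensity)
v_rule, se_rule = dr_policy_value(y, a, (x > 0).astype(int), mu_hat, propensity)

print(f"treat none:     {v_none:.3f} (SE {se_none:.3f})")
print(f"treat all:      {v_all:.3f} (SE {se_all:.3f})")
print(f"treat if x > 0: {v_rule:.3f} (SE {se_rule:.3f})")
```

In this toy setup the treatment effect is heterogeneous (it is zero when x <= 0), yet the personalized rule and the treat-everyone rule have the same true expected remission rate, because treating a non-responder costs nothing here. This mirrors, in a deliberately simplified form, the abstract's point that detectable heterogeneity need not translate into a better treatment policy.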
Related papers
- Identifying and Characterising Response in Clinical Trials: Development and Validation of a Machine Learning Approach in Colorectal Cancer [0.45835414225547183]
Precision medicine promises to transform health care by offering individualised treatments that dramatically improve clinical outcomes. Current approaches are limited to static measures of treatment success, neglecting the repeated measures found in most clinical trials. Our approach combines the concept of partly conditional modelling with treatment effect estimation based on the Virtual Twins method. Performance was evaluated using synthetic data and applied to clinical trials examining the effectiveness of panitumumab to treat metastatic colorectal cancer.
arXiv Detail & Related papers (2026-02-28T18:00:26Z)
- Testing Effect Homogeneity and Confounding in High-Dimensional Experimental and Observational Studies [0.552480439325792]
We propose a framework for testing the homogeneity of conditional average treatment effects (CATEs) across multiple experimental and observational studies. Our approach leverages multiple randomized trials to assess whether treatment effects vary with unobserved heterogeneity that differs across trials.
arXiv Detail & Related papers (2026-02-23T10:52:39Z)
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z)
- Assessing the robustness of heterogeneous treatment effects in survival analysis under informative censoring [50.164756034797136]
Dropout is common in clinical studies, with up to half of patients leaving early due to side effects or other reasons. When dropout is informative, it introduces censoring bias, which in turn biases treatment effect estimates. We propose an assumption-lean framework to assess the robustness of conditional average treatment effect estimates in survival analysis when facing censoring bias.
arXiv Detail & Related papers (2025-10-15T10:51:17Z)
- The Framework That Survives Bad Models: Human-AI Collaboration For Clinical Trials [2.6377299508948746]
Using AI as a supporting reader (AI-SR) is the most suitable approach for clinical trials, as it meets all criteria across various model types, even with bad models. This method consistently provides reliable disease estimation, preserves clinical trial treatment effect estimates and conclusions, and retains these advantages when applied to different populations.
arXiv Detail & Related papers (2025-10-08T01:40:41Z)
- An Oversampling-enhanced Multi-class Imbalanced Classification Framework for Patient Health Status Prediction Using Patient-reported Outcomes [6.075416560330067]
Patient-reported outcomes (PROs) directly collected from cancer patients being treated with radiation therapy play a vital role in assisting clinicians in counseling patients regarding likely toxicities.
We explore various machine learning techniques to predict patient outcomes related to health status using PROBoost from a cancer photon/proton therapy center.
arXiv Detail & Related papers (2024-11-16T14:54:18Z)
- Detecting critical treatment effect bias in small subgroups [11.437076464287822]
We propose a novel strategy to benchmark observational studies beyond the average treatment effect.
First, we design a statistical test for the null hypothesis that the treatment effects estimated from the two studies, conditioned on a set of relevant features, differ up to some tolerance.
We then estimate an asymptotically valid lower bound on the maximum bias strength for any subgroup in the observational study.
arXiv Detail & Related papers (2024-04-29T17:44:28Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
- HINT: Hierarchical Interaction Network for Trial Outcome Prediction Leveraging Web Data [56.53715632642495]
Clinical trials face uncertain outcomes due to issues with efficacy, safety, or problems with patient recruitment.
In this paper, we propose Hierarchical INteraction Network (HINT) for more general, clinical trial outcome predictions.
arXiv Detail & Related papers (2021-02-08T15:09:07Z)
- A standardized framework for risk-based assessment of treatment effect heterogeneity in observational healthcare databases [60.07352590494571]
The aim of this study was to extend this approach to the observational setting using a standardized scalable framework.
We demonstrate our framework by evaluating the effect of angiotensin-converting enzyme (ACE) inhibitors versus beta blockers on three efficacy and six safety outcomes.
arXiv Detail & Related papers (2020-10-13T14:48:31Z)
- Survival Analysis Using a 5-Step Stratified Testing and Amalgamation Routine in Randomized Clinical Trials [0.0]
Increased patient heterogeneity can weaken the ability of common statistical approaches to detect treatment differences.
A list of baseline covariates that have the potential to be prognostic for survival under either treatment is pre-specified.
A conditional inference tree algorithm is used to segment the heterogeneous trial population into subpopulations of prognostically homogeneous patients.
The impressive power-boosting performance of our proposed 5-step stratified testing and amalgamation routine (5-STAR) is illustrated.
arXiv Detail & Related papers (2020-04-28T15:44:41Z)
- Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects [61.03579766573421]
We study estimation of individual-level causal effects, such as a single patient's response to alternative medication.
We devise representation learning algorithms that minimize our bound, by regularizing the representation's induced treatment group distance.
We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances.
arXiv Detail & Related papers (2020-01-21T10:16:33Z)