Developing Fairness-Aware Task Decomposition to Improve Equity in Post-Spinal Fusion Complication Prediction
- URL: http://arxiv.org/abs/2512.00598v1
- Date: Sat, 29 Nov 2025 19:06:07 GMT
- Title: Developing Fairness-Aware Task Decomposition to Improve Equity in Post-Spinal Fusion Complication Prediction
- Authors: Yining Yuan, J. Ben Tamo, Wenqi Shi, Yishan Zhong, Micky C. Nnamdi, B. Randall Brenn, Steven W. Hwang, May D. Wang
- Abstract summary: We propose a fairness-aware multitask learning framework for postoperative complication prediction. FAIR-MTL employs a data-driven subgroup inference mechanism. It achieves an AUC of 0.86 and an accuracy of 75%, outperforming single-task baselines while substantially reducing bias.
- Score: 3.860970992977915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fairness in clinical prediction models remains a persistent challenge, particularly in high-stakes applications such as spinal fusion surgery for scoliosis, where patient outcomes exhibit substantial heterogeneity. Many existing fairness approaches rely on coarse demographic adjustments or post-hoc corrections, which fail to capture the latent structure of clinical populations and may unintentionally reinforce bias. We propose FAIR-MTL, a fairness-aware multitask learning framework designed to provide equitable and fine-grained prediction of postoperative complication severity. Instead of relying on explicit sensitive attributes during model training, FAIR-MTL employs a data-driven subgroup inference mechanism. We extract a compact demographic embedding, and apply k-means clustering to uncover latent patient subgroups that may be differentially affected by traditional models. These inferred subgroup labels determine task routing within a shared multitask architecture. During training, subgroup imbalance is mitigated through inverse-frequency weighting, and regularization prevents overfitting to smaller groups. Applied to postoperative complication prediction with four severity levels, FAIR-MTL achieves an AUC of 0.86 and an accuracy of 75%, outperforming single-task baselines while substantially reducing bias. For gender, the demographic parity difference decreases to 0.055 and equalized odds to 0.094; for age, these values reduce to 0.056 and 0.148, respectively. Model interpretability is ensured through SHAP and Gini importance analyses, which consistently highlight clinically meaningful predictors such as hemoglobin, hematocrit, and patient weight. Our findings show that incorporating unsupervised subgroup discovery into a multitask framework enables more equitable, interpretable, and clinically actionable predictions for surgical risk stratification.
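The abstract mentions inverse-frequency weighting of subgroups and two fairness metrics (demographic parity difference and equalized odds) without giving formulas. The following is a minimal sketch of how such quantities are commonly computed, not the authors' implementation. It makes several assumptions: the weight formula N / (K * n_g) is one common "balanced" convention; the metrics are shown for a binary outcome, whereas the paper predicts four severity levels; and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def inverse_frequency_weights(subgroups):
    """Per-subgroup weights N / (K * n_g): smaller groups get larger weights.

    Assumed convention; the paper does not state its exact formula.
    """
    counts = Counter(subgroups)
    n, k = len(subgroups), len(counts)
    return {g: n / (k * c) for g, c in counts.items()}


def _rate_gap(flags, groups):
    """Largest gap in the mean of binary flags across groups."""
    by_group = defaultdict(list)
    for flag, group in zip(flags, groups):
        by_group[group].append(flag)
    rates = [sum(v) / len(v) for v in by_group.values()]
    return max(rates) - min(rates) if rates else 0.0


def demographic_parity_difference(y_pred, groups):
    """Gap in positive-prediction rate between the most and least favored group."""
    return _rate_gap(y_pred, groups)


def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the true-positive-rate gap and the false-positive-rate gap."""
    pos = [(p, g) for t, p, g in zip(y_true, y_pred, groups) if t == 1]
    neg = [(p, g) for t, p, g in zip(y_true, y_pred, groups) if t == 0]
    tpr_gap = _rate_gap([p for p, _ in pos], [g for _, g in pos])
    fpr_gap = _rate_gap([p for p, _ in neg], [g for _, g in neg])
    return max(tpr_gap, fpr_gap)


if __name__ == "__main__":
    # Toy data only; not taken from the paper.
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
    print(inverse_frequency_weights([0, 0, 0, 1]))
    print(demographic_parity_difference(y_pred, groups))
    print(equalized_odds_difference(y_true, y_pred, groups))
```

A value of 0 for either metric would mean the groups are treated identically under that criterion; the reported values (for example 0.055 and 0.094 for gender) are gaps of this kind.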
Related papers
- A Data-Driven Approach to Support Clinical Renal Replacement Therapy [1.7666791716676549]
This study investigates a data-driven machine learning approach to predict membrane fouling in critically ill patients undergoing Continuous Renal Replacement Therapy (CRRT). Using time-series data from an ICU, 16 clinically selected features were identified to train predictive models. Results remained robust across different forecasting horizons.
arXiv Detail & Related papers (2026-02-26T11:47:22Z)
- Detecting and Mitigating Group Bias in Heterogeneous Treatment Effects [28.4891545570248]
We develop a statistical framework to detect and mitigate group bias in randomized experiments. For mitigation, we propose a shrinkage-based bias-correction, and show that the theoretically optimal and empirically feasible solutions have closed-form expressions. We analyze the economic implications of mitigating detected group bias for profit-maximizing personalized targeting.
arXiv Detail & Related papers (2026-02-23T21:47:01Z)
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z)
- Case Prompting to Mitigate Large Language Model Bias for ICU Mortality Prediction [17.91443453604627]
Large language models (LLMs) show promise in predicting outcomes from structured medical data. LLMs may exhibit demographic biases related to sex, age, and race, limiting their trustworthy use in clinical practice. We propose a training-free, clinically adaptive prompting framework to simultaneously improve fairness and performance.
arXiv Detail & Related papers (2025-12-17T12:29:53Z)
- Overlap-weighted orthogonal meta-learner for treatment effect estimation over time [90.46786193198744]
We introduce a novel overlap-weighted meta-learner for estimating heterogeneous treatment effects (HTEs). Our WO-learner has the favorable property of Neyman-orthogonality, meaning that it is robust against misspecification in the nuisance functions. We show that our WO-learner is fully model-agnostic and can be applied to any machine learning model.
arXiv Detail & Related papers (2025-10-22T14:47:57Z)
- G-computation for increasing performances of clinical trials with individual randomization and binary response [0.43541492802373877]
In a clinical trial, the random allocation aims to balance prognostic factors between arms, preventing true confounders.
Adjusting for prognostic factors is therefore recommended, especially because of the related increase in power.
In this paper, we hypothesized that G-computation associated with machine learning could be a suitable method for randomized clinical trials even with small sample sizes.
arXiv Detail & Related papers (2024-11-15T10:18:38Z)
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
- How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance [64.1656365676171]
Group imbalance has been a known problem in empirical risk minimization.
This paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level testing performance.
arXiv Detail & Related papers (2024-03-12T04:38:05Z)
- FERI: A Multitask-based Fairness Achieving Algorithm with Applications to Fair Organ Transplantation [15.481475313958219]
We introduce Fairness through the Equitable Rate of Improvement in Multitask Learning (FERI) algorithm for fair predictions of graft failure risk in liver transplant patients.
FERI constrains subgroup loss by balancing learning rates and preventing subgroup dominance in the training process.
arXiv Detail & Related papers (2023-10-20T21:14:07Z)
- Segmentation-Consistent Probabilistic Lesion Counting [3.6513059119482145]
Lesion counts are important indicators of disease severity, patient prognosis, and treatment efficacy, yet counting as a task in medical imaging is often overlooked in favor of segmentation.
This work introduces a novel continuously differentiable function that maps lesion segmentation predictions to lesion count probability distributions in a consistent manner.
arXiv Detail & Related papers (2022-04-11T17:26:49Z)
- Two-Stage TMLE to Reduce Bias and Improve Efficiency in Cluster Randomized Trials [0.0]
Cluster randomized trials (CRTs) randomly assign an intervention to groups of individuals, and measure outcomes on individuals in those groups.
Findings are often missing for some individuals within clusters.
CRTs often randomize limited numbers of clusters, resulting in chance imbalances on baseline outcome predictors between arms.
arXiv Detail & Related papers (2021-06-29T21:47:30Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias of the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.