MeCaMIL: Causality-Aware Multiple Instance Learning for Fair and Interpretable Whole Slide Image Diagnosis
- URL: http://arxiv.org/abs/2511.11004v1
- Date: Fri, 14 Nov 2025 06:47:21 GMT
- Title: MeCaMIL: Causality-Aware Multiple Instance Learning for Fair and Interpretable Whole Slide Image Diagnosis
- Authors: Yiran Song, Yikai Zhang, Shuang Zhou, Guojun Xiong, Xiaofeng Yang, Nian Wang, Fenglong Ma, Rui Zhang, Mingquan Lin,
- Abstract summary: Multiple instance learning (MIL) has emerged as the dominant paradigm for whole slide image (WSI) analysis in computational pathology. MeCaMIL, a causality-aware MIL framework, explicitly models demographic confounders through structured causal graphs. MeCaMIL achieves superior fairness: demographic disparity variance drops by over 65% on average across attributes.
- Score: 40.3028468133626
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multiple instance learning (MIL) has emerged as the dominant paradigm for whole slide image (WSI) analysis in computational pathology, achieving strong diagnostic performance through patch-level feature aggregation. However, existing MIL methods face critical limitations: (1) they rely on attention mechanisms that lack causal interpretability, and (2) they fail to integrate patient demographics (age, gender, race), leading to fairness concerns across diverse populations. These shortcomings hinder clinical translation, where algorithmic bias can exacerbate health disparities. We introduce \textbf{MeCaMIL}, a causality-aware MIL framework that explicitly models demographic confounders through structured causal graphs. Unlike prior approaches that treat demographics as auxiliary features, MeCaMIL employs principled causal inference -- leveraging do-calculus and collider structures -- to disentangle disease-relevant signals from spurious demographic correlations. Extensive evaluation on three benchmarks demonstrates state-of-the-art performance across CAMELYON16 (ACC/AUC/F1: 0.939/0.983/0.946), TCGA-Lung (0.935/0.979/0.931), and TCGA-Multi (0.977/0.993/0.970, five cancer types). Critically, MeCaMIL achieves superior fairness: demographic disparity variance drops by over 65% on average across attributes, with notable improvements for underserved populations. The framework generalizes to survival prediction (mean C-index: 0.653, +0.017 over the best baseline across five cancer types). Ablation studies confirm the causal graph structure is essential -- alternative designs yield 0.048 lower accuracy and 4.2x worse fairness. These results establish MeCaMIL as a principled framework for fair, interpretable, and clinically actionable AI in digital pathology. Code will be released upon acceptance.
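The abstract's central idea -- pooling patch features with attention, then using a backdoor-style adjustment over demographic confounders rather than conditioning on a single patient's demographics -- can be sketched in a toy form. This is an illustrative sketch, not the authors' released code: the attention pooling, the linear head `toy_classifier`, and the three demographic strata are all hypothetical stand-ins for the adjustment formula p(y | do(x)) = Σ_z p(y | x, z) p(z).

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(patch_feats, w):
    """Attention-based MIL pooling: softmax scores over patches,
    then a weighted sum gives one slide-level embedding."""
    scores = patch_feats @ w
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()
    return alpha @ patch_feats

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def toy_classifier(slide_emb, z, w_y, w_z):
    """Hypothetical linear head: p(y=1 | x, z), where z indexes a
    demographic stratum with its own additive offset."""
    return sigmoid(slide_emb @ w_y + w_z[z])

def backdoor_adjusted(slide_emb, z_values, p_z, w_y, w_z):
    """p(y=1 | do(x)) = sum_z p(y=1 | x, z) p(z): average the
    conditional prediction over strata, weighted by the population
    marginal p(z) instead of the patient's own z."""
    return sum(p * toy_classifier(slide_emb, z, w_y, w_z)
               for z, p in zip(z_values, p_z))

patches = rng.normal(size=(50, 8))   # 50 patch embeddings, dim 8
w_attn = rng.normal(size=8)
w_y = rng.normal(size=8)
w_z = {0: -0.5, 1: 0.0, 2: 0.5}      # stratum-specific offsets

emb = attention_pool(patches, w_attn)
p_adj = backdoor_adjusted(emb, [0, 1, 2], [0.3, 0.5, 0.2], w_y, w_z)
```

Because the adjusted score is a convex combination of the per-stratum conditionals, it always lies between the most and least favorable stratum predictions, which is the mechanism that dampens demographic disparity in this toy setting.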
Related papers
- ClinNet: Evidential Ordinal Regression with Bilateral Asymmetry and Prototype Memory for Knee Osteoarthritis Grading [3.337151338735509]
Knee osteoarthritis (KOA) grading based on radiographic images is a critical yet challenging task. In this work, we propose ClinNet, a novel trustworthy framework that addresses KOA grading as an evidential ordinal regression problem.
arXiv Detail & Related papers (2026-01-24T05:49:41Z) - Case Prompting to Mitigate Large Language Model Bias for ICU Mortality Prediction [17.91443453604627]
Large language models (LLMs) show promise in predicting outcomes from structured medical data. LLMs may exhibit demographic biases related to sex, age, and race, limiting their trustworthy use in clinical practice. We propose a training-free, clinically adaptive prompting framework to simultaneously improve fairness and performance.
arXiv Detail & Related papers (2025-12-17T12:29:53Z) - Intersectional Fairness in Vision-Language Models for Medical Image Disease Classification [25.30858592524878]
Cross-Modal Alignment Consistency (CMAC-MMD) is a training framework that standardises diagnostic certainty across intersectional patient subgroups. In the dermatology cohort, the proposed method reduced the overall intersectional missed diagnosis gap (difference in True Positive Rate, ΔTPR) from 0.50 to 0.26. For glaucoma screening, the method reduced ΔTPR from 0.41 to 0.31, achieving a better AUC of 0.72 (vs. 0.71 baseline).
arXiv Detail & Related papers (2025-12-17T09:47:29Z) - Contrastive Integrated Gradients: A Feature Attribution-Based Method for Explaining Whole Slide Image Classification [16.637098984977055]
Interpretability is essential in Whole Slide Image (WSI) analysis for computational pathology. We introduce Contrastive Integrated Gradients (CIG), a novel attribution method that enhances interpretability by computing contrastive gradients in logit space.
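The attribution family this entry builds on can be illustrated with plain integrated gradients. This is a minimal sketch of the standard method (path integral of gradients from a baseline to the input), not the paper's contrastive CIG variant; the toy linear logit `f` and numerical differentiation are assumptions for self-containment.

```python
import numpy as np

def integrated_gradients(f, x, baseline, steps=64):
    """Standard integrated gradients via a midpoint Riemann sum.
    Gradients of the scalar function f are taken numerically here,
    since this toy has no autodiff framework."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    eps = 1e-5
    for a in alphas:
        point = baseline + a * (x - baseline)
        grad = np.zeros_like(x)
        for i in range(x.size):
            d = np.zeros_like(x)
            d[i] = eps
            grad[i] = (f(point + d) - f(point - d)) / (2 * eps)
        total += grad
    # scale the accumulated gradients by the input-baseline difference
    return (x - baseline) * total / steps

w = np.array([1.0, -2.0, 0.5])
f = lambda v: float(v @ w)           # linear logit: attributions are exact
x = np.array([0.2, 0.4, -0.1])
baseline = np.zeros(3)
attr = integrated_gradients(f, x, baseline)
```

For a linear logit the attributions reduce to the input-times-weight product, and the completeness axiom holds: the attributions sum to f(x) - f(baseline).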
arXiv Detail & Related papers (2025-11-11T17:07:32Z) - Bridging Accuracy and Interpretability: Deep Learning with XAI for Breast Cancer Detection [0.0]
We present an interpretable deep learning framework for the early detection of breast cancer using quantitative features extracted from digitized fine needle aspirate (FNA) images of breast masses. Our deep neural network, using ReLU activations, the Adam optimizer, and a binary cross-entropy loss, delivers state-of-the-art classification performance.
arXiv Detail & Related papers (2025-10-18T07:47:26Z) - Comparative assessment of fairness definitions and bias mitigation strategies in machine learning-based diagnosis of Alzheimer's disease from MR images [4.569587135821805]
The present study performs a fairness analysis of machine learning (ML) models for the diagnosis of Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD) from MRI-derived neuroimaging features. Biases associated with age, race, and gender in a multi-cohort dataset are investigated. Results reveal the existence of biases related to age and race, while no significant gender bias is observed.
arXiv Detail & Related papers (2025-05-29T15:07:19Z) - CRTRE: Causal Rule Generation with Target Trial Emulation Framework [47.2836994469923]
We introduce a novel method called causal rule generation with target trial emulation framework (CRTRE).
CRTRE applies randomized trial design principles to estimate the causal effect of association rules.
We then incorporate such association rules for the downstream applications such as prediction of disease onsets.
arXiv Detail & Related papers (2024-11-10T02:40:06Z) - Establishing Causal Relationship Between Whole Slide Image Predictions and Diagnostic Evidence Subregions in Deep Learning [3.5504159526793924]
Causal Inference Multiple Instance Learning (CI-MIL) uses out-of-distribution generalization to reduce the recognition confusion of sub-images. CI-MIL exhibits superior interpretability, as its selected regions demonstrate high consistency with ground truth annotations.
arXiv Detail & Related papers (2024-07-24T11:00:08Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - CIMIL-CRC: a clinically-informed multiple instance learning framework for patient-level colorectal cancer molecular subtypes classification from H\&E stained images [42.771819949806655]
We introduce CIMIL-CRC, a framework that solves the MSI/MSS MIL problem by efficiently combining a pre-trained feature extraction model with principal component analysis (PCA) to aggregate information from all patches.
We assessed our CIMIL-CRC method using the average area under the curve (AUC) from a 5-fold cross-validation experimental setup for model development on the TCGA-CRC-DX cohort.
arXiv Detail & Related papers (2024-01-29T12:56:11Z) - IA-GCN: Interpretable Attention based Graph Convolutional Network for Disease Prediction [47.999621481852266]
We propose an interpretable graph learning-based model which interprets the clinical relevance of the input features towards the task.
In a clinical scenario, such a model can assist the clinical experts in better decision-making for diagnosis and treatment planning.
Our proposed model shows superior performance with respect to compared methods with an increase in an average accuracy of 3.2% for Tadpole, 1.6% for UKBB Gender, and 2% for the UKBB Age prediction task.
arXiv Detail & Related papers (2021-03-29T13:04:02Z) - Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete Labels [66.57101219176275]
Disease diagnosis on chest X-ray images is a challenging multi-label classification task.
We propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases.
Our method is the first to build a graph over the feature maps with a dynamic adjacency matrix for correlation learning.
arXiv Detail & Related papers (2020-02-26T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.