Beyond Diagnostic Performance: Revealing and Quantifying Ethical Risks in Pathology Foundation Models
- URL: http://arxiv.org/abs/2502.16889v2
- Date: Tue, 01 Jul 2025 15:08:41 GMT
- Title: Beyond Diagnostic Performance: Revealing and Quantifying Ethical Risks in Pathology Foundation Models
- Authors: Weiping Lin, Shen Liu, Runchen Zhu, Yixuan Lin, Baoshun Wang, Liansheng Wang
- Abstract summary: Pathology foundation models (PFMs) are large-scale pre-trained models tailored for computational pathology. We pioneer the quantitative analysis of ethical risks in PFMs, including privacy leakage, clinical reliability, and group fairness. This work provides the first quantitative and systematic evaluation of ethical risks in PFMs.
- Score: 9.324455712108175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pathology foundation models (PFMs), as large-scale pre-trained models tailored for computational pathology, have significantly advanced a wide range of applications. Their ability to leverage prior knowledge from massive datasets has streamlined the development of intelligent pathology models. However, we identify several critical and interrelated ethical risks that remain underexplored, yet must be addressed to enable the safe translation of PFMs from lab to clinic. These include the potential leakage of patient-sensitive attributes, disparities in model performance across demographic and institutional subgroups, and the reliance on diagnosis-irrelevant features that undermine clinical reliability. In this study, we pioneer the quantitative analysis of ethical risks in PFMs, including privacy leakage, clinical reliability, and group fairness. Specifically, we propose an evaluation framework that systematically measures key dimensions of ethical concern: the degree to which patient-sensitive attributes can be inferred from model representations, the extent of performance disparities across demographic and institutional subgroups, and the influence of diagnostically irrelevant features on model decisions. We further investigate the underlying causes of these ethical risks in PFMs and empirically validate our findings. We then offer insights into potential directions for mitigating such risks, aiming to inform the development of more ethically robust PFMs. This work provides the first quantitative and systematic evaluation of ethical risks in PFMs. Our findings highlight the urgent need for ethical safeguards in PFMs and offer actionable insights for building more trustworthy and clinically robust PFMs. To facilitate future research and deployment, we will release the assessment framework as an online toolkit to support the development, auditing, and deployment of ethically robust PFMs.
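To make two of the three audit dimensions concrete, here is a minimal sketch of how attribute leakage from representations and subgroup performance disparity might be measured on frozen PFM embeddings. The function names and sklearn-based setup are illustrative assumptions, not the authors' released toolkit.

```python
# Illustrative audit sketch (not the authors' released toolkit):
# (1) privacy leakage: can a linear probe recover a sensitive attribute
#     from frozen PFM embeddings?
# (2) group fairness: how large is the performance gap across subgroups?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

def attribute_leakage_score(embeddings, sensitive_attr, seed=0):
    """Balanced accuracy of a linear probe predicting a sensitive
    attribute (e.g., sex or source institution) from embeddings;
    values far above chance suggest leakage."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        embeddings, sensitive_attr, test_size=0.3,
        random_state=seed, stratify=sensitive_attr)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return balanced_accuracy_score(y_te, probe.predict(X_te))

def subgroup_performance_gap(y_true, y_pred, groups):
    """Max-minus-min per-subgroup balanced accuracy: a simple
    disparity measure across demographic or institutional groups."""
    scores = [balanced_accuracy_score(y_true[groups == g],
                                      y_pred[groups == g])
              for g in np.unique(groups)]
    return max(scores) - min(scores)
```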
Related papers
- Machine Learning Solutions Integrated in an IoT Healthcare Platform for Heart Failure Risk Stratification [0.16863755729554883]
The management of chronic Heart Failure (HF) presents significant challenges in modern healthcare. We present a predictive model based on Machine Learning (ML) techniques to identify patients at HF risk.
arXiv Detail & Related papers (2025-04-07T14:07:05Z) - A Survey of Pathology Foundation Model: Progress and Future Directions [3.009351592961681]
Recent Pathology Foundation Models (PFMs), pretrained on large-scale histopathology data, have significantly enhanced capabilities of extractors and aggregators.
This survey presents a hierarchical taxonomy organizing PFMs through a top-down philosophy that can be utilized to analyze FMs in any domain.
arXiv Detail & Related papers (2025-04-05T03:44:09Z) - Conformal uncertainty quantification to evaluate predictive fairness of foundation AI model for skin lesion classes across patient demographics [8.692647930497936]
We use conformal analysis to quantify the predictive uncertainty of a vision transformer based foundation model. We show how this can be used as a fairness metric to evaluate the robustness of the feature embeddings of the foundation model.
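For readers unfamiliar with conformal analysis, the following is a generic split-conformal sketch of per-group coverage, one plausible way to turn prediction-set coverage into a fairness signal. It is an illustration under stated assumptions, not the paper's exact procedure.

```python
# Generic split-conformal sketch (not the paper's exact procedure):
# calibrate a score threshold, build prediction sets, then compare
# empirical coverage across demographic groups.
import numpy as np

def conformal_group_coverage(cal_probs, cal_labels,
                             test_probs, test_labels,
                             test_groups, alpha=0.1):
    # Nonconformity score: 1 - probability assigned to the true class.
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile, clipped to a valid level.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, level)
    # Prediction set: all classes whose score falls within the threshold.
    sets = (1.0 - test_probs) <= q  # boolean, shape (n_test, n_classes)
    covered = sets[np.arange(len(test_labels)), test_labels]
    # Coverage should be ~1 - alpha for every group; gaps flag unfairness.
    return {g: covered[test_groups == g].mean()
            for g in np.unique(test_groups)}
```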
arXiv Detail & Related papers (2025-03-31T08:06:00Z) - Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents [64.43980129731587]
We propose a causally inspired inference-time debiasing method called Causal Diagnosis and Correction (CDC).
CDC first diagnoses the bias effect of perplexity and then separates this bias effect from the overall relevance score.
Experimental results across three domains demonstrate its superior debiasing effectiveness.
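The summary only states that CDC separates a perplexity bias effect from the relevance score; one crude way to illustrate that intuition (explicitly not the CDC algorithm) is to regress relevance on log-perplexity and keep the residual:

```python
# Crude illustration of "separating a perplexity bias from relevance":
# regress relevance scores on log-perplexity and keep the residual as a
# debiased score. This is NOT the CDC algorithm, only the intuition.
import numpy as np

def residualize_relevance(relevance, perplexity):
    # Fit relevance ~ a * log(perplexity) + b, then remove that component.
    x = np.log(np.asarray(perplexity, dtype=float))
    a, b = np.polyfit(x, relevance, deg=1)
    return relevance - (a * x + b)
```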
arXiv Detail & Related papers (2025-03-11T17:59:00Z) - Fragility-aware Classification for Understanding Risk and Improving Generalization [6.926253982569273]
We introduce the Fragility Index (FI), a novel metric that evaluates classification performance from a risk-averse perspective. We derive exact reformulations for cross-entropy loss, hinge-type loss, and Lipschitz loss, and extend the approach to deep learning models.
arXiv Detail & Related papers (2025-02-18T16:44:03Z) - Fair Diagnosis: Leveraging Causal Modeling to Mitigate Medical Bias [14.848344916632024]
In medical image analysis, model predictions can be affected by sensitive attributes, such as race and gender. We present a causal modeling framework that aims to reduce the impact of sensitive attributes on diagnostic predictions.
arXiv Detail & Related papers (2024-12-06T02:59:36Z) - FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification [4.148491257542209]
Few-shot learning presents a critical solution for cancer diagnosis in computational pathology.
A key challenge in this paradigm stems from the inherent disparity between the limited training set of whole slide images (WSIs) and the enormous number of contained patches.
We introduce the knowledge-enhanced adaptive visual compression framework, dubbed FOCUS, to enable a focused analysis of diagnostically relevant regions.
arXiv Detail & Related papers (2024-11-22T05:36:38Z) - Establishing Causal Relationship Between Whole Slide Image Predictions and Diagnostic Evidence Subregions in Deep Learning [3.5504159526793924]
Causal Inference Multiple Instance Learning (CI-MIL) uses out-of-distribution generalization to reduce recognition confusion among sub-images.
CI-MIL exhibits superior interpretability, as its selected regions demonstrate high consistency with ground truth annotations.
arXiv Detail & Related papers (2024-07-24T11:00:08Z) - Decoding Decision Reasoning: A Counterfactual-Powered Model for Knowledge Discovery [6.1521675665532545]
In medical imaging, discerning the rationale behind an AI model's predictions is crucial for evaluating its reliability.
We propose an explainable model that is equipped with both decision reasoning and feature identification capabilities.
By implementing our method, we can efficiently identify and visualise class-specific features leveraged by the data-driven model.
arXiv Detail & Related papers (2024-05-23T19:00:38Z) - Unified Uncertainty Estimation for Cognitive Diagnosis Models [70.46998436898205]
We propose a unified uncertainty estimation approach for a wide range of cognitive diagnosis models.
We decompose the uncertainty of diagnostic parameters into data aspect and model aspect.
Our method is effective and can provide useful insights into the uncertainty of cognitive diagnosis.
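Decomposing uncertainty into a data aspect and a model aspect is commonly done, for probabilistic ensembles, via the entropy decomposition sketched below; this is a generic construction and not necessarily the paper's formulation.

```python
# Generic ensemble-based uncertainty decomposition (not necessarily the
# paper's formulation): total predictive entropy splits into expected
# entropy (data/aleatoric) plus mutual information (model/epistemic).
import numpy as np

def decompose_uncertainty(member_probs, eps=1e-12):
    """member_probs: array (n_members, n_classes) of predicted class
    distributions for one input from an ensemble or MC-dropout."""
    mean_p = member_probs.mean(axis=0)
    total = -np.sum(mean_p * np.log(mean_p + eps))        # H[E[p]]
    aleatoric = -np.sum(member_probs * np.log(member_probs + eps),
                        axis=1).mean()                    # E[H[p]]
    epistemic = total - aleatoric                         # mutual information
    return total, aleatoric, epistemic
```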
arXiv Detail & Related papers (2024-03-09T13:48:20Z) - Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
The experimental evaluations were conducted on the PAD-UFES-20 dataset using various deep-learning architectures.
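As a rough picture of how an auxiliary super-resolution task can sit alongside classification, here is a hypothetical multi-task sketch in PyTorch; the architecture and loss weighting are illustrative assumptions, not the paper's design.

```python
# Hypothetical multi-task sketch (illustrative, not the paper's design):
# a shared image encoder feeds both a lesion classifier (fused with
# clinical/demographic metadata) and an auxiliary super-resolution head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskLesionNet(nn.Module):
    def __init__(self, n_classes=6, n_meta=10):
        super().__init__()
        self.encoder = nn.Sequential(  # shared encoder, downsamples 4x
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.cls_head = nn.Linear(64 + n_meta, n_classes)
        self.sr_head = nn.Sequential(  # upsamples 8x -> 2x super-resolution
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1))

    def forward(self, image, metadata):
        feats = self.encoder(image)
        pooled = feats.mean(dim=(2, 3))  # global average pooling
        logits = self.cls_head(torch.cat([pooled, metadata], dim=1))
        return logits, self.sr_head(feats)  # class logits + 2x image

def joint_loss(logits, labels, sr_out, hires_target, lam=0.1):
    # Classification objective plus a weighted auxiliary SR objective.
    return F.cross_entropy(logits, labels) + lam * F.mse_loss(sr_out, hires_target)
```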
arXiv Detail & Related papers (2024-02-16T05:16:20Z) - Progress and Opportunities of Foundation Models in Bioinformatics [77.74411726471439]
Foundation models (FMs) have ushered in a new era in computational biology, especially in the realm of deep learning.
Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs.
The review analyzes the challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases.
arXiv Detail & Related papers (2024-02-06T02:29:17Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - Explainable AI for clinical risk prediction: a survey of concepts, methods, and modalities [2.9404725327650767]
This survey reviews progress in developing explainable models for clinical risk prediction.
It emphasizes the need for external validation and the combination of diverse interpretability methods.
An end-to-end approach to explainability in clinical risk prediction is essential for success.
arXiv Detail & Related papers (2023-08-16T14:51:51Z) - Explainable AI for Malnutrition Risk Prediction from m-Health and Clinical Data [3.093890460224435]
This paper presents a novel AI framework for early and explainable malnutrition risk detection based on heterogeneous m-health data.
We performed an extensive model evaluation including both subject-independent and personalised predictions.
We also investigated several benchmark XAI methods to extract global model explanations.
arXiv Detail & Related papers (2023-05-31T08:07:35Z) - Informing clinical assessment by contextualizing post-hoc explanations of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patient's clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z) - Data-Centric Epidemic Forecasting: A Survey [56.99209141838794]
This survey delves into various data-driven methodological and practical advancements.
We enumerate the large number of epidemiological datasets and novel data streams that are relevant to epidemic forecasting.
We also discuss experiences and challenges that arise in real-world deployment of these forecasting systems.
arXiv Detail & Related papers (2022-07-19T16:15:11Z) - Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
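Permutation importance is one canonical post-hoc feature importance method of the kind the summary alludes to; a self-contained example (with synthetic data, not the benchmark's models) looks like this:

```python
# Permutation importance: shuffle each feature and measure the drop in
# model score; large drops mean the model relies on that feature.
# Synthetic setup for illustration only.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

X, y = make_regression(n_samples=500, n_features=8, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance {imp:.3f}")
```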
arXiv Detail & Related papers (2022-06-16T17:59:05Z) - Domain Invariant Model with Graph Convolutional Network for Mammogram Classification [49.691629817104925]
We propose a novel framework, namely the Domain Invariant Model with Graph Convolutional Network (DIM-GCN).
We first propose a Bayesian network that explicitly decomposes the latent variables into disease-related and disease-irrelevant parts, which are provably disentangled from each other.
To better capture the macroscopic features, we leverage the observed clinical attributes as a reconstruction target via a Graph Convolutional Network (GCN).
arXiv Detail & Related papers (2022-04-21T08:23:44Z) - What Do You See in this Patient? Behavioral Testing of Clinical NLP Models [69.09570726777817]
We introduce an extensible testing framework that evaluates the behavior of clinical outcome models under changes to the input.
We show that model behavior varies drastically even when fine-tuned on the same data and that allegedly best-performing models have not always learned the most medically plausible patterns.
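In the same spirit, a minimal behavioral test perturbs a clinically irrelevant attribute and checks whether the prediction shifts; `predict_risk` below is a hypothetical stand-in for any clinical outcome model, not the paper's framework.

```python
# Minimal behavioral (invariance) test: perturb a clinically irrelevant
# attribute in a note and check whether the predicted risk shifts.
# `predict_risk` is a hypothetical scoring function, not the paper's API.
def behavioral_invariance_test(predict_risk, note,
                               swap=("male", "female"), tolerance=0.05):
    perturbed = note.replace(swap[0], swap[1])
    delta = abs(predict_risk(note) - predict_risk(perturbed))
    return {"delta": delta, "passed": delta <= tolerance}

# Example with a trivial stand-in model that ignores gender entirely:
demo_model = lambda text: 0.3 + 0.1 * ("smoker" in text)
print(behavioral_invariance_test(
    demo_model, "55 yo male smoker with chest pain"))
```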
arXiv Detail & Related papers (2021-11-30T15:52:04Z) - Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming the best state-of-the-art baseline by up to 19%.
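For reference, the two reported metrics are standard and can be computed as follows (synthetic predictions shown, not the paper's data):

```python
# Computing F1 and PR-AUC with sklearn; labels and scores are synthetic
# placeholders, not the paper's data.
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_prob = np.array([0.2, 0.9, 0.7, 0.4, 0.6, 0.1, 0.3, 0.8])

f1 = f1_score(y_true, y_prob >= 0.5)               # F1 at a 0.5 threshold
pr_auc = average_precision_score(y_true, y_prob)   # PR-AUC estimator
print(f"F1={f1:.3f}, PR-AUC={pr_auc:.3f}")
```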
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.