An Explainable and Fair AI Tool for PCOS Risk Assessment: Calibration, Subgroup Equity, and Interactive Clinical Deployment
- URL: http://arxiv.org/abs/2511.11636v1
- Date: Sat, 08 Nov 2025 16:14:56 GMT
- Title: An Explainable and Fair AI Tool for PCOS Risk Assessment: Calibration, Subgroup Equity, and Interactive Clinical Deployment
- Authors: Asma Sadia Khan, Sadia Tabassum,
- Abstract summary: This paper presents a fairness-audited and interpretable machine learning framework for predicting polycystic ovary syndrome (PCOS)<n>The framework integrated SHAP-based feature attributions with demographic audits to connect predictive explanations with observed disparities for actionable insights.<n>A Streamlit-based web interface enables real-time PCOS risk assessment, Rotterdam criteria evaluation, and interactive 'what-if' analysis.
- Score: 0.10026496861838446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a fairness-audited and interpretable machine learning framework for predicting polycystic ovary syndrome (PCOS), designed to evaluate model performance and identify diagnostic disparities across patient subgroups. The framework integrated SHAP-based feature attributions with demographic audits to connect predictive explanations with observed disparities for actionable insights. Probabilistic calibration metrics (Brier Score and Expected Calibration Error) are incorporated to ensure reliable risk predictions across subgroups. Random Forest, SVM, and XGBoost models were trained with isotonic and Platt scaling for calibration and fairness comparison. A calibrated Random Forest achieved a high predictive accuracy of 90.8%. SHAP analysis identified follicle count, weight gain, and menstrual irregularity as the most influential features, which are consistent with the Rotterdam diagnostic criteria. Although the SVM with isotonic calibration achieved the lowest calibration error (ECE = 0.0541), the Random Forest model provided a better balance between calibration and interpretability (Brier = 0.0678, ECE = 0.0666). Therefore, it was selected for detailed fairness and SHAP analyses. Subgroup analysis revealed that the model performed best among women aged 25-35 (accuracy 90.9%) but underperformed in those under 25 (69.2%), highlighting age-related disparities. The model achieved perfect precision in obese women and maintained high recall in lean PCOS cases, demonstrating robustness across phenotypes. Finally, a Streamlit-based web interface enables real-time PCOS risk assessment, Rotterdam criteria evaluation, and interactive 'what-if' analysis, bridging the gap between AI research and clinical usability.
Related papers
- Multimodal Survival Modeling and Fairness-Aware Clinical Machine Learning for 5-Year Breast Cancer Risk Prediction [4.750682174151462]
We present a fully reproducible machine learning framework for 5-year overall survival prediction in breast cancer.<n>We integrate clinical variables with high-dimensional transcriptomic and copy-number alteration (CNA) features from the METABRIC cohort.<n>Performance was assessed using time-dependent area under the ROC curve (AUC), average precision (AP), calibration curves, Brier score, and bootstrapped 95 percent confidence intervals.
arXiv Detail & Related papers (2026-02-25T07:20:43Z) - CytoDINO: Risk-Aware and Biologically-Informed Adaptation of DINOv3 for Bone Marrow Cytomorphology [0.0]
We introduce CytoDINO, a framework that achieves state-of-the-art performance on the Munich Leukemia Laboratory dataset.<n>Our primary contribution is a novel Hierarchical Focal Loss with Critical Penalties, which encodes biological relationships between cell lineages and explicitly penalizes clinically dangerous misclassifications.<n>CytoDINO achieves an 88.2% weighted F1 score and 76.5% macro F1 on a held-out test set of 21 cell classes.
arXiv Detail & Related papers (2025-12-09T23:09:22Z) - On the Role of Calibration in Benchmarking Algorithmic Fairness for Skin Cancer Detection [0.03066137405373616]
We assess the performance of the leading skin cancer detection algorithm of the ISIC 2020 Challenge dataset.<n>We compare it with the second and third place models, focusing on subgroups defined by sex, race, and age.<n>Our findings reveal that while existing models enhance discriminative accuracy, they often over-diagnose risk and exhibit calibration issues when applied to new datasets.
arXiv Detail & Related papers (2025-11-10T23:51:04Z) - Predicting Metabolic Dysfunction-Associated Steatotic Liver Disease using Machine Learning Methods [0.8642326601683298]
We developed a fair, rigorous, and reproducible MASLD prediction model.<n>MASLD affects 33% of U.S. adults and is the most common chronic liver disease.
arXiv Detail & Related papers (2025-10-25T13:36:18Z) - Assessing the Feasibility of Early Cancer Detection Using Routine Laboratory Data: An Evaluation of Machine Learning Approaches on an Imbalanced Dataset [0.02030567625639093]
The development of accessible screening tools for early cancer detection in dogs represents a significant challenge in veterinary medicine.<n>This study assesses the feasibility of cancer risk classification using the Golden Retriever Lifetime Study cohort under real-world constraints.<n>It is concluded that while a statistically detectable cancer signal exists in routine lab data, it is too weak and confounded for clinically reliable discrimination from normal aging or other inflammatory conditions.
arXiv Detail & Related papers (2025-10-23T04:52:42Z) - Automatic Cough Analysis for Non-Small Cell Lung Cancer Detection [33.37223681850477]
Early detection of non-small cell lung cancer (NSCLC) is critical for improving patient outcomes.<n>We explore the use of automatic cough analysis as a pre-screening tool for distinguishing between NSCLC patients and healthy controls.<n>Recordings were analyzed using machine learning techniques, such as support vector machine (SVM) and XGBoost.
arXiv Detail & Related papers (2025-07-25T11:30:22Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
arXiv Detail & Related papers (2023-01-25T19:35:23Z) - Towards Reliable Medical Image Segmentation by Modeling Evidential Calibrated Uncertainty [57.023423137202485]
Concerns regarding the reliability of medical image segmentation persist among clinicians.<n>We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.<n>By leveraging subjective logic theory, we explicitly model probability and uncertainty for medical image segmentation.
arXiv Detail & Related papers (2023-01-01T05:02:46Z) - Increasing the efficiency of randomized trial estimates via linear
adjustment for a prognostic score [59.75318183140857]
Estimating causal effects from randomized experiments is central to clinical research.
Most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control.
arXiv Detail & Related papers (2020-12-17T21:10:10Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD)
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.