Comparative Evaluation of Deep Learning-Based and WHO-Informed Approaches for Sperm Morphology Assessment
- URL: http://arxiv.org/abs/2601.10070v1
- Date: Thu, 15 Jan 2026 04:51:36 GMT
- Title: Comparative Evaluation of Deep Learning-Based and WHO-Informed Approaches for Sperm Morphology Assessment
- Authors: Mohammad Abbadi,
- Abstract summary: Assessment of sperm morphological quality remains a critical yet subjective component of male fertility evaluation.<n>This study presents a comparative biomedical artificial intelligence framework evaluating an image-based deep learning model (HuSHeM)<n>HuSHeM model was trained on high-resolution sperm morphology images and evaluated using an independent clinical cohort.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Assessment of sperm morphological quality remains a critical yet subjective component of male fertility evaluation, often limited by inter-observer variability and resource constraints. This study presents a comparative biomedical artificial intelligence framework evaluating an image-based deep learning model (HuSHeM) alongside a clinically grounded baseline derived from World Health Organization criteria augmented with the Systemic Inflammation Response Index (WHO(+SIRI)). The HuSHeM model was trained on high-resolution sperm morphology images and evaluated using an independent clinical cohort. Model performance was assessed using discrimination, calibration, and clinical utility analyses. The HuSHeM model demonstrated higher discriminative performance, as reflected by an increased area under the receiver operating characteristic curve with relatively narrow confidence intervals compared to WHO(+SIRI). Precision-recall analysis further indicated improved performance under class imbalance, with higher precision-recall area values across evaluated thresholds. Calibration analysis indicated closer agreement between predicted probabilities and observed outcomes for HuSHeM, while decision curve analysis suggested greater net clinical benefit across clinically relevant threshold probabilities. These findings suggest that image-based deep learning may offer improved predictive reliability and clinical utility compared with traditional rule-based and inflammation-augmented criteria. The proposed framework supports objective and reproducible assessment of sperm morphology and may serve as a decision-support tool within fertility screening and referral workflows. The proposed models are intended as decision-support or referral tools and are not designed to replace clinical judgment or laboratory assessment.
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks.<n>We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models.<n>Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z) - AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning [73.50200033931148]
We introduce AgentsEval, a multi-agent stream reasoning framework that emulates the collaborative diagnostic workflow of radiologists.<n>By dividing the evaluation process into interpretable steps including criteria definition, evidence extraction, alignment, and consistency scoring, AgentsEval provides explicit reasoning traces and structured clinical feedback.<n> Experimental results demonstrate that AgentsEval delivers clinically aligned, semantically faithful, and interpretable evaluations that remain robust under paraphrastic, semantic, and stylistic perturbations.
arXiv Detail & Related papers (2026-01-23T11:59:13Z) - A systematic evaluation of uncertainty quantification techniques in deep learning: a case study in photoplethysmography signal analysis [1.6690512882610855]
Deep learning models can be used to continuously monitor physiological parameters outside of clinical settings.<n>There is risk of poor performance when deployed in practical measurement scenarios leading to negative patient outcomes.<n>Here we implement eight uncertainty (UQ) techniques to models trained on two clinically relevant prediction tasks.
arXiv Detail & Related papers (2025-10-31T22:54:13Z) - Secure Diagnostics: Adversarial Robustness Meets Clinical Interpretability [9.522045116604358]
Deep neural networks for medical image classification often fail to generalize consistently in clinical practice.<n>This paper examines interpretability in deep neural networks fine-tuned for fracture detection by evaluating model performance against adversarial attack.
arXiv Detail & Related papers (2025-04-07T20:26:02Z) - Methodological Explainability Evaluation of an Interpretable Deep Learning Model for Post-Hepatectomy Liver Failure Prediction Incorporating Counterfactual Explanations and Layerwise Relevance Propagation: A Prospective In Silico Trial [13.171582596404313]
We developed a variational autoencoder-multilayer perceptron (VAE-MLP) model for preoperative PHLF prediction.
This model integrated counterfactuals and layerwise relevance propagation (LRP) to provide insights into its decision-making mechanism.
Results from the three-track in silico clinical trial showed that clinicians' prediction accuracy and confidence increased when AI explanations were provided.
arXiv Detail & Related papers (2024-08-07T13:47:32Z) - Uncertainty Quantification on Clinical Trial Outcome Prediction [37.25114005044208]
We propose incorporating uncertainty quantification into clinical trial outcome predictions.<n>Our main goal is to enhance the model's ability to discern nuanced differences.<n>We have adopted a selective classification approach to fulfill our objective.
arXiv Detail & Related papers (2024-01-07T13:48:05Z) - Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles [4.249986624493547]
Once deployed, medical image analysis methods are often faced with unexpected image corruptions and noise perturbations.<n>LaDiNE is a novel ensemble learning method combining the robustness of Vision Transformers with diffusion-based generative models.<n>Experiments on tuberculosis chest X-rays and melanoma skin cancer datasets demonstrate that LaDiNE achieves superior performance compared to a wide range of state-of-the-art methods.
arXiv Detail & Related papers (2023-10-24T15:53:07Z) - TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic
Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z) - Towards Reliable Medical Image Segmentation by Modeling Evidential Calibrated Uncertainty [57.023423137202485]
Concerns regarding the reliability of medical image segmentation persist among clinicians.<n>We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.<n>By leveraging subjective logic theory, we explicitly model probability and uncertainty for medical image segmentation.
arXiv Detail & Related papers (2023-01-01T05:02:46Z) - Causal Inference via Nonlinear Variable Decorrelation for Healthcare
Applications [60.26261850082012]
We introduce a novel method with a variable decorrelation regularizer to handle both linear and nonlinear confounding.
We employ association rules as new representations using association rule mining based on the original features to increase model interpretability.
arXiv Detail & Related papers (2022-09-29T17:44:14Z) - Benchmarking Heterogeneous Treatment Effect Models through the Lens of
Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.