Explainable Melanoma Diagnosis with Contrastive Learning and LLM-based Report Generation
- URL: http://arxiv.org/abs/2512.06105v1
- Date: Fri, 05 Dec 2025 19:19:36 GMT
- Title: Explainable Melanoma Diagnosis with Contrastive Learning and LLM-based Report Generation
- Authors: Junwen Zheng, Xinran Xu, Li Rong Wang, Chang Cai, Lucinda Siyun Tan, Dingyuan Wang, Hong Liang Tey, Xiuyi Fan
- Abstract summary: The Cross-modal Explainable Framework for Melanoma (CEFM) uses contrastive learning as the core mechanism for achieving interpretability. Experiments on public datasets demonstrate 92.79% accuracy and an AUC of 0.961.
- Score: 6.12668702512286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has demonstrated expert-level performance in melanoma classification, positioning it as a powerful tool in clinical dermatology. However, model opacity and the lack of interpretability remain critical barriers to clinical adoption, as clinicians often struggle to trust the decision-making processes of black-box models. To address this gap, we present a Cross-modal Explainable Framework for Melanoma (CEFM) that leverages contrastive learning as the core mechanism for achieving interpretability. Specifically, CEFM maps clinical criteria for melanoma diagnosis, namely Asymmetry, Border, and Color (ABC), into the Vision Transformer embedding space using dual projection heads, thereby aligning clinical semantics with visual features. The aligned representations are subsequently translated into structured textual explanations via natural language generation, creating a transparent link between raw image data and clinical interpretation. Experiments on public datasets demonstrate 92.79% accuracy and an AUC of 0.961, along with significant improvements across multiple interpretability metrics. Qualitative analyses further show that the spatial arrangement of the learned embeddings aligns with clinicians' application of the ABC rule, effectively bridging the gap between high-performance classification and clinical trust.
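The abstract describes contrastive alignment of ABC-criterion semantics with ViT image features via dual projection heads. Below is a minimal sketch of that idea in PyTorch; the module name, dimensions, and the symmetric InfoNCE-style loss are illustrative assumptions, not the authors' exact CEFM implementation.

```python
# Minimal sketch: contrastive alignment of ViT image features with ABC-criterion
# text embeddings via dual projection heads (illustrative, not the CEFM code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualProjectionAlignment(nn.Module):
    def __init__(self, image_dim=768, text_dim=768, shared_dim=256, temperature=0.07):
        super().__init__()
        self.image_proj = nn.Linear(image_dim, shared_dim)  # projects ViT [CLS] features
        self.text_proj = nn.Linear(text_dim, shared_dim)    # projects criterion-text embeddings
        self.temperature = temperature

    def forward(self, image_feats, text_feats):
        img = F.normalize(self.image_proj(image_feats), dim=-1)  # (B, D)
        txt = F.normalize(self.text_proj(text_feats), dim=-1)    # (B, D)
        logits = img @ txt.t() / self.temperature                # pairwise similarities
        targets = torch.arange(img.size(0), device=img.device)   # matched pairs on the diagonal
        loss = 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
        return loss, logits

# Toy usage with random tensors standing in for encoder outputs
align = DualProjectionAlignment()
loss, sims = align(torch.randn(8, 768), torch.randn(8, 768))
loss.backward()
```

In this sketch, minimizing the loss pulls each lesion embedding toward its matched ABC-criterion description and pushes it away from the others, which is the alignment behaviour the abstract attributes to contrastive learning.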
Related papers
- Multi-View Stenosis Classification Leveraging Transformer-Based Multiple-Instance Learning Using Real-World Clinical Data [76.89269238957593]
Coronary artery stenosis is a leading cause of cardiovascular disease and is diagnosed by analyzing the coronary arteries from multiple angiography views. We propose SegmentMIL, a transformer-based multi-view multiple-instance learning framework for patient-level stenosis classification.
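The summary names a multi-view multiple-instance learning (MIL) design; below is a hypothetical sketch of attention-based MIL pooling over per-view embeddings. It is not the SegmentMIL architecture, and the dimensions and module names are assumptions.

```python
# Illustrative attention-based MIL pooling across angiography views
# (assumed design, not the SegmentMIL implementation).
import torch
import torch.nn as nn

class AttentionMILPool(nn.Module):
    def __init__(self, dim=512, hidden=128, num_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, view_embeddings):
        # view_embeddings: (num_views, dim) -- one embedding per view from a backbone
        weights = torch.softmax(self.attn(view_embeddings), dim=0)  # per-view attention
        patient_embedding = (weights * view_embeddings).sum(dim=0)  # weighted pooling
        return self.classifier(patient_embedding), weights

pool = AttentionMILPool()
logits, attn = pool(torch.randn(5, 512))  # patient-level logits from 5 views
```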
arXiv Detail & Related papers (2026-02-02T13:07:52Z) - MelanomaNet: Explainable Deep Learning for Skin Lesion Classification [0.0]
We present MelanomaNet, an explainable deep learning system for skin lesion classification. We evaluate our system on the ISIC 2019 dataset containing 25,331 dermoscopic images. Our results demonstrate that high classification performance can be achieved alongside comprehensive interpretability.
arXiv Detail & Related papers (2025-12-10T03:22:44Z) - Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation [61.350584471060756]
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images. We propose Self-Supervised Anatomical Consistency Learning (SS-ACL) to align generated reports with corresponding anatomical regions. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy.
arXiv Detail & Related papers (2025-09-30T08:59:06Z) - Latent Space Analysis for Melanoma Prevention [5.441932327359051]
Melanoma represents a critical health risk, underscoring the need for early, interpretable diagnostic tools. This work introduces a novel approach that extends beyond classification, enabling interpretable risk modelling. The proposed method learns a structured latent space that captures semantic relationships among lesions, allowing for a nuanced, continuous assessment of morphological differences.
arXiv Detail & Related papers (2025-06-23T08:49:57Z) - MedGrad E-CLIP: Enhancing Trust and Transparency in AI-Driven Skin Lesion Diagnosis [2.9540164442363976]
This study leverages the CLIP (Contrastive Language-Image Pretraining) model, trained on different skin lesion datasets, to capture meaningful relationships between visual features and diagnostic criteria terms. We propose a method called MedGrad E-CLIP, which builds on gradient-based E-CLIP by incorporating a weighted entropy mechanism designed for complex medical imaging like skin lesions. By visually explaining how different features in an image relate to diagnostic criteria, this approach demonstrates the potential of advanced vision-language models in medical image analysis.
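As a rough illustration of gradient-based attribution for an image-text similarity score (the generic idea behind E-CLIP-style explanations, not MedGrad E-CLIP's weighted-entropy mechanism), one can backpropagate the similarity to the input pixels; the encoder below is a placeholder.

```python
# Generic gradient saliency for a CLIP-style similarity score (placeholder encoder,
# not the paper's method).
import torch
import torch.nn as nn
import torch.nn.functional as F

image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))  # stand-in encoder
text_embedding = F.normalize(torch.randn(512), dim=0)  # e.g. a diagnostic-criterion phrase

image = torch.randn(1, 3, 224, 224, requires_grad=True)
image_embedding = F.normalize(image_encoder(image).squeeze(0), dim=0)
similarity = image_embedding @ text_embedding  # cosine similarity score
similarity.backward()

# Pixel-level saliency: gradient magnitude of the similarity per spatial location
saliency = image.grad.abs().sum(dim=1).squeeze(0)  # (224, 224) heat map
```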
arXiv Detail & Related papers (2025-01-12T17:50:47Z) - Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification [8.382606243533942]
We introduce a simple yet effective framework, Explicd, towards Explainable language-informed criteria-based diagnosis.
By leveraging a pretrained vision-language model, Explicd injects these criteria into the embedding space as knowledge anchors.
The final diagnostic outcome is determined based on the similarity scores between the encoded visual concepts and the textual criteria embeddings.
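A minimal sketch of the criteria-anchor idea, assuming frozen textual criterion embeddings and per-concept visual embeddings; the dimensions and the linear mapping from criterion similarities to class logits are illustrative, not the exact Explicd design.

```python
# Sketch: diagnosis from cosine similarity between visual concept embeddings and
# textual criterion anchors (assumed layout, not the Explicd implementation).
import torch
import torch.nn.functional as F

num_criteria, dim = 6, 512
criterion_anchors = F.normalize(torch.randn(num_criteria, dim), dim=-1)  # frozen text anchors
class_weights = torch.randn(2, num_criteria)  # learned mapping: criterion scores -> classes

visual_concepts = F.normalize(torch.randn(num_criteria, dim), dim=-1)  # one embedding per concept
criteria_scores = (visual_concepts * criterion_anchors).sum(dim=-1)    # per-criterion similarity
class_logits = class_weights @ criteria_scores                          # final diagnosis logits
```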
arXiv Detail & Related papers (2024-06-08T23:23:28Z) - ContrastDiagnosis: Enhancing Interpretability in Lung Nodule Diagnosis Using Contrastive Learning [23.541034347602935]
Clinicians' distrust of black box models has hindered the clinical deployment of AI products.
We propose ContrastDiagnosis, a straightforward yet effective interpretable diagnosis framework.
High diagnostic accuracy was achieved, with an AUC of 0.977, while maintaining high transparency and explainability.
arXiv Detail & Related papers (2024-03-08T13:00:52Z) - A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z) - Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
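A hypothetical sketch of the triple-injection idea: clinical relation triples are embedded and concatenated with visual features to form the memory that a transformer decoder attends to while generating the report. Vocabulary size, dimensions, and module choices are assumptions, not the CGT implementation.

```python
# Sketch: relation triples as extra memory tokens for a report decoder (assumed design).
import torch
import torch.nn as nn

dim = 256
triple_embed = nn.Embedding(1000, dim)  # ids for (subject, relation, object) tokens
decoder_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=4, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)

visual_feats = torch.randn(1, 49, dim)                      # e.g. 7x7 patch features
triples = triple_embed(torch.randint(0, 1000, (1, 12)))     # 4 triples x 3 tokens as prior knowledge
memory = torch.cat([visual_feats, triples], dim=1)          # visual features enriched with triples

report_tokens = torch.randn(1, 20, dim)                     # embedded report prefix (teacher forcing)
decoded = decoder(tgt=report_tokens, memory=memory)         # hidden states for report generation
```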
arXiv Detail & Related papers (2022-06-04T13:16:30Z) - BI-RADS-Net: An Explainable Multitask Learning Approach for Cancer
Diagnosis in Breast Ultrasound Images [69.41441138140895]
This paper introduces BI-RADS-Net, a novel explainable deep learning approach for cancer detection in breast ultrasound images.
The proposed approach incorporates tasks for explaining and classifying breast tumors, by learning feature representations relevant to clinical diagnosis.
Explanations of the predictions (benign or malignant) are provided in terms of morphological features that are used by clinicians for diagnosis and reporting in medical practice.
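A minimal sketch of the multitask idea, assuming a shared backbone feature vector feeding one diagnosis head and one head for clinician-facing morphological descriptors; descriptor names and sizes are illustrative, not the BI-RADS-Net architecture.

```python
# Sketch: multitask head predicting diagnosis plus morphological descriptors (assumed design).
import torch
import torch.nn as nn

class MultitaskHead(nn.Module):
    def __init__(self, feat_dim=512, num_descriptors=5):
        super().__init__()
        self.diagnosis = nn.Linear(feat_dim, 2)                   # benign vs. malignant
        self.descriptors = nn.Linear(feat_dim, num_descriptors)   # e.g. shape, margin, echo pattern

    def forward(self, features):
        return self.diagnosis(features), torch.sigmoid(self.descriptors(features))

head = MultitaskHead()
diag_logits, descriptor_probs = head(torch.randn(4, 512))  # predictions plus explanation outputs
```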
arXiv Detail & Related papers (2021-10-05T19:14:46Z) - VBridge: Connecting the Dots Between Features, Explanations, and Data for Healthcare Models [85.4333256782337]
VBridge is a visual analytics tool that seamlessly incorporates machine learning explanations into clinicians' decision-making workflow.
We identified three key challenges, including clinicians' unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence.
We demonstrated the effectiveness of VBridge through two case studies and expert interviews with four clinicians.
arXiv Detail & Related papers (2021-08-04T17:34:13Z) - Malignancy Prediction and Lesion Identification from Clinical Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images.
The method first identifies all lesions present in the image regardless of sub-type or likelihood of malignancy, then estimates the likelihood of malignancy for each lesion, and, through aggregation, generates an image-level likelihood of malignancy (an illustrative aggregation sketch follows this entry).
arXiv Detail & Related papers (2021-04-02T20:52:05Z)
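The summary above describes aggregating per-lesion malignancy estimates into an image-level likelihood but does not state the rule; the lines below illustrate two common choices (noisy-OR and max) purely as assumptions.

```python
# Illustrative aggregation of per-lesion probabilities into an image-level score
# (assumed rules; the paper's exact aggregation is not given in the summary).
import torch

lesion_probs = torch.tensor([0.05, 0.10, 0.62])              # per-lesion malignancy estimates
image_prob_noisy_or = 1.0 - torch.prod(1.0 - lesion_probs)   # at least one lesion is malignant
image_prob_max = lesion_probs.max()                          # simpler alternative: worst lesion
```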