A Multimodal Vision Foundation Model for Clinical Dermatology
- URL: http://arxiv.org/abs/2410.15038v3
- Date: Sun, 13 Apr 2025 05:58:18 GMT
- Title: A Multimodal Vision Foundation Model for Clinical Dermatology
- Authors: Siyuan Yan, Zhen Yu, Clare Primiero, Cristina Vico-Alonso, Zhonghua Wang, Litao Yang, Philipp Tschandl, Ming Hu, Lie Ju, Gin Tan, Vincent Tang, Aik Beng Ng, David Powell, Paul Bonnington, Simon See, Elisabetta Magnaterra, Peter Ferguson, Jennifer Nguyen, Pascale Guitera, Jose Banuls, Monika Janda, Victoria Mar, Harald Kittler, H. Peter Soyer, Zongyuan Ge
- Abstract summary: PanDerm is a multimodal dermatology foundation model pretrained through self-supervised learning on over 2 million real-world skin disease images. PanDerm achieved state-of-the-art performance across all evaluated tasks, often outperforming existing models when using only 10% of labeled data.
- Score: 14.481765406970657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diagnosing and treating skin diseases require advanced visual skills across domains and the ability to synthesize information from multiple imaging modalities. While current deep learning models excel at specific tasks like skin cancer diagnosis from dermoscopic images, they struggle to meet the complex, multimodal requirements of clinical practice. Here, we introduce PanDerm, a multimodal dermatology foundation model pretrained through self-supervised learning on over 2 million real-world skin disease images from 11 clinical institutions across 4 imaging modalities. We evaluated PanDerm on 28 diverse benchmarks, including skin cancer screening, risk stratification, differential diagnosis of common and rare skin conditions, lesion segmentation, longitudinal monitoring, and metastasis prediction and prognosis. PanDerm achieved state-of-the-art performance across all evaluated tasks, often outperforming existing models when using only 10% of labeled data. We conducted three reader studies to assess PanDerm's potential clinical utility. PanDerm outperformed clinicians by 10.2% in early-stage melanoma detection through longitudinal analysis, improved clinicians' skin cancer diagnostic accuracy by 11% on dermoscopy images, and enhanced non-dermatologist healthcare providers' differential diagnosis by 16.5% across 128 skin conditions on clinical photographs. These results demonstrate PanDerm's potential to improve patient care across diverse clinical scenarios and serve as a model for developing multimodal foundation models in other medical specialties, potentially accelerating the integration of AI support in healthcare. The code can be found at https://github.com/SiyuanYan1/PanDerm.
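The label-efficiency claim (matching or beating baselines with only 10% of labeled data) is typically evaluated by fine-tuning or linear-probing on a stratified subsample of the training labels. A minimal sketch of that subsampling step, assuming a per-class sampling protocol (function name and details are hypothetical, not from the paper):

```python
import random
from collections import defaultdict

def stratified_subsample(labels, fraction=0.10, seed=0):
    """Pick `fraction` of the indices per class, mimicking a
    10%-labeled-data evaluation protocol (at least one per class)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    chosen = []
    for y, idxs in by_class.items():
        k = max(1, round(len(idxs) * fraction))
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)

labels = ["melanoma"] * 30 + ["nevus"] * 70
subset = stratified_subsample(labels, fraction=0.10)
# keeps roughly 10% of each class: 3 melanoma + 7 nevus indices
```

Stratifying per class (rather than sampling 10% uniformly) keeps rare diagnoses represented in the reduced label set, which matters for long-tailed dermatology datasets.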
Related papers
- DermINO: Hybrid Pretraining for a Versatile Dermatology Foundation Model [92.66916452260553]
DermINO is a versatile foundation model for dermatology. It incorporates a novel hybrid pretraining framework that augments the self-supervised learning paradigm. It consistently outperforms state-of-the-art models across a wide range of tasks.
arXiv Detail & Related papers (2025-08-17T00:41:39Z) - EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis [62.00431604976949]
EndoBench is the first comprehensive benchmark specifically designed to assess MLLMs across the full spectrum of endoscopic practice. We benchmark 23 state-of-the-art models, including general-purpose, medical-specialized, and proprietary MLLMs. Our experiments reveal: proprietary MLLMs outperform open-source and medical-specialized models overall, but still trail human experts.
arXiv Detail & Related papers (2025-05-29T16:14:34Z) - Interactive Tumor Progression Modeling via Sketch-Based Image Editing [54.47725383502915]
We propose SkEditTumor, a sketch-based diffusion model for controllable tumor progression editing.
By leveraging sketches as structural priors, our method enables precise modifications of tumor regions while maintaining structural integrity and visual realism.
Our contributions include a novel integration of sketches with diffusion models for medical image editing, fine-grained control over tumor progression visualization, and extensive validation across multiple datasets, setting a new benchmark in the field.
arXiv Detail & Related papers (2025-03-10T00:04:19Z) - FairSkin: Fair Diffusion for Skin Disease Image Generation [54.29840149709033]
Diffusion Model (DM) has become a leading method in generating synthetic medical images, but it suffers from a critical twofold bias.
We propose FairSkin, a novel DM framework that mitigates these biases through a three-level resampling mechanism.
Our approach significantly improves the diversity and quality of generated images, contributing to more equitable skin disease detection in clinical settings.
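A resampling mechanism of this kind can be approximated with inverse-frequency sampling weights over (disease class, skin tone) groups, so underrepresented combinations are drawn more often during training. A sketch under that assumption (the grouping and weighting scheme here are illustrative, not FairSkin's exact mechanism):

```python
from collections import Counter

def resampling_weights(samples):
    """samples: list of (disease_class, skin_tone) tuples.
    Returns per-sample weights inversely proportional to group
    frequency, normalized into a sampling distribution, so rare
    (class, tone) combinations are oversampled."""
    group_counts = Counter(samples)
    weights = [1.0 / group_counts[s] for s in samples]
    total = sum(weights)
    return [w / total for w in weights]

data = [("eczema", "light")] * 8 + [("eczema", "dark")] * 2
w = resampling_weights(data)
# each dark-skin sample is weighted 4x a light-skin sample (2 vs 8 occurrences)
```

Such weights can be fed directly to a weighted random sampler so each (class, tone) group contributes equally per epoch in expectation.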
arXiv Detail & Related papers (2024-10-29T21:37:03Z) - Equitable Skin Disease Prediction Using Transfer Learning and Domain Adaptation [1.9505972437091028]
Existing artificial intelligence (AI) models in dermatology face challenges in accurately diagnosing diseases across diverse skin tones.
We employ a transfer-learning approach that capitalizes on the rich, transferable knowledge from various image domains.
Among all methods, Med-ViT emerged as the top performer due to its comprehensive feature representation learned from diverse image sources.
arXiv Detail & Related papers (2024-09-01T23:48:26Z) - Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study evaluated the performance of Gemini, GPT-4, and four other popular large models in an exhaustive evaluation across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z) - Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
The experimental evaluations have been conducted using the PAD-UFES20 dataset, applying various deep-learning architectures.
arXiv Detail & Related papers (2024-02-16T05:16:20Z) - Skin Cancer Segmentation and Classification Using Vision Transformer for Automatic Analysis in Dermatoscopy-based Non-invasive Digital System [0.0]
This study introduces a groundbreaking approach to skin cancer classification, employing the Vision Transformer.
The Vision Transformer is a state-of-the-art deep learning architecture renowned for its success in diverse image analysis tasks.
The Segment Anything Model aids in precise segmentation of cancerous areas, attaining high IOU and Dice Coefficient.
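The IoU and Dice coefficients cited as segmentation quality metrics are computed from the overlap of binary masks; both are standard definitions:

```python
def iou_and_dice(pred, target):
    """Compute Intersection-over-Union and Dice coefficient for two
    binary masks given as flat 0/1 sequences of equal length.
    Empty masks on both sides are scored as a perfect match."""
    inter = sum(p & t for p, t in zip(pred, target))
    p_sum, t_sum = sum(pred), sum(target)
    union = p_sum + t_sum - inter
    iou = inter / union if union else 1.0
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    return iou, dice

iou, dice = iou_and_dice([1, 1, 0, 0], [1, 0, 1, 0])
# intersection = 1, union = 3 -> IoU = 1/3; Dice = 2/4 = 0.5
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so a model ranked highly on one is ranked highly on the other.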
arXiv Detail & Related papers (2024-01-09T11:22:54Z) - Revamping AI Models in Dermatology: Overcoming Critical Challenges for Enhanced Skin Lesion Diagnosis [8.430482797862926]
We present an All-In-One Hierarchical-Out of Distribution-Clinical Triage model.
For a clinical image, our model generates three outputs: a hierarchical prediction, an alert for out-of-distribution images, and a recommendation for dermoscopy.
Our versatile model provides valuable decision support for lesion diagnosis and sets a promising precedent for medical AI applications.
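The three-output design (a prediction, an out-of-distribution alert, and a dermoscopy recommendation) can be sketched as post-processing over classifier probabilities. The thresholds, the max-probability OOD proxy, and the triage rule below are all hypothetical illustrations, not the paper's actual heads:

```python
def triage(probs, ood_threshold=0.5, derm_threshold=0.8):
    """probs: dict mapping class name -> softmax probability.
    Returns (top prediction, OOD alert flag, dermoscopy recommendation).
    A low maximum probability serves as a simple OOD proxy; an
    uncertain in-distribution prediction triggers a dermoscopy referral."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    is_ood = p < ood_threshold
    recommend_dermoscopy = (not is_ood) and p < derm_threshold
    return label, is_ood, recommend_dermoscopy

result = triage({"melanoma": 0.65, "nevus": 0.30, "keratosis": 0.05})
# -> ("melanoma", False, True): confident enough to predict,
#    but below 0.8, so a dermoscopy exam is recommended
```

In practice the OOD signal would come from a dedicated detector rather than raw softmax confidence, but the triage logic composes the same way.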
arXiv Detail & Related papers (2023-11-02T06:08:49Z) - Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis [59.35504779947686]
GPT-4V is OpenAI's newest model for multimodal medical diagnosis.
Our evaluation encompasses 17 human body systems.
GPT-4V demonstrates proficiency in distinguishing between medical image modalities and anatomy.
It faces significant challenges in disease diagnosis and generating comprehensive reports.
arXiv Detail & Related papers (2023-10-15T18:32:27Z) - A Novel Multi-Task Model Imitating Dermatologists for Accurate Differential Diagnosis of Skin Diseases in Clinical Images [27.546559936765863]
A novel multi-task model, namely DermImitFormer, is proposed to fill this gap by imitating dermatologists' diagnostic procedures and strategies.
The model simultaneously predicts body parts and lesion attributes in addition to the disease itself, enhancing diagnosis accuracy and improving diagnosis interpretability.
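A multi-task objective of this kind is typically trained with a weighted sum of per-task losses; a minimal sketch, with the task weights and probability inputs being hypothetical:

```python
import math

def cross_entropy(probs, target_idx):
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target_idx])

def multitask_loss(disease, body_part, attributes, targets,
                   weights=(1.0, 0.5, 0.5)):
    """Weighted sum of the disease, body-part, and lesion-attribute
    classification losses, as in a dermatologist-imitating
    multi-task setup. Each head's predicted probabilities are a list;
    `targets` maps task name to the ground-truth class index."""
    losses = (
        cross_entropy(disease, targets["disease"]),
        cross_entropy(body_part, targets["body_part"]),
        cross_entropy(attributes, targets["attribute"]),
    )
    return sum(w * l for w, l in zip(weights, losses))

loss = multitask_loss([0.7, 0.3], [0.6, 0.4], [0.9, 0.1],
                      {"disease": 0, "body_part": 0, "attribute": 0})
```

Weighting the disease head highest reflects that the auxiliary tasks exist to regularize and explain the primary diagnosis, not to compete with it.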
arXiv Detail & Related papers (2023-07-17T08:05:30Z) - A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z) - Early Melanoma Diagnosis with Sequential Dermoscopic Images [10.487636624052564]
Existing algorithms for early melanoma diagnosis are developed using single time-point images of lesions.
We propose a framework for automated early melanoma diagnosis using sequential dermoscopic images.
arXiv Detail & Related papers (2021-10-12T13:05:41Z) - Malignancy Prediction and Lesion Identification from Clinical Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images.
The model first identifies all lesions present in the image regardless of sub-type or likelihood of malignancy, then estimates each lesion's likelihood of malignancy, and through aggregation also generates an image-level likelihood of malignancy.
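One common way to aggregate per-lesion probabilities into an image-level score is a noisy-OR: the image is malignant if any single lesion is. This is an illustrative choice, not necessarily the aggregation the paper uses:

```python
def image_malignancy(lesion_probs):
    """Noisy-OR aggregation of independent per-lesion malignancy
    probabilities: P(image malignant) = 1 - prod(1 - p_i).
    Returns 0.0 for an image with no detected lesions."""
    p_all_benign = 1.0
    for p in lesion_probs:
        p_all_benign *= (1.0 - p)
    return 1.0 - p_all_benign

score = image_malignancy([0.1, 0.2, 0.5])
# 1 - 0.9 * 0.8 * 0.5 = 1 - 0.36 = 0.64
```

Compared with taking the max lesion probability, noisy-OR rewards images with several moderately suspicious lesions, which may or may not be desirable clinically.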
arXiv Detail & Related papers (2021-04-02T20:52:05Z) - A Patient-Centric Dataset of Images and Metadata for Identifying Melanomas Using Clinical Context [39.10946113351587]
The 2020 SIIM-ISIC Melanoma Classification challenge dataset was constructed to address the discrepancy between prior challenges and clinical practice.
The dataset represents 2,056 patients from three continents with an average of 16 lesions per patient.
arXiv Detail & Related papers (2020-08-07T20:22:23Z) - Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe).
We propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling.
CHASe can further improve pathological liver mask Dice-Sorensen coefficients by 4.2% to 9.4%.
arXiv Detail & Related papers (2020-05-27T06:58:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.