eSkinHealth: A Multimodal Dataset for Neglected Tropical Skin Diseases
- URL: http://arxiv.org/abs/2508.18608v1
- Date: Tue, 26 Aug 2025 02:24:49 GMT
- Title: eSkinHealth: A Multimodal Dataset for Neglected Tropical Skin Diseases
- Authors: Janet Wang, Xin Hu, Yunbei Zhang, Diabate Almamy, Vagamon Bamba, Konan Amos Sébastien Koffi, Yao Koffi Aubin, Zhengming Ding, Jihun Hamm, Rie R. Yotsu
- Abstract summary: eSkinHealth is a novel dataset collected on-site in Côte d'Ivoire and Ghana. It contains 5,623 images from 1,639 cases and encompasses 47 skin diseases. eSkinHealth also includes semantic lesion masks, instance-specific visual captions, and clinical concepts.
- Score: 29.76522627359553
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Skin Neglected Tropical Diseases (NTDs) impose severe health and socioeconomic burdens in impoverished tropical communities. Yet, advancements in AI-driven diagnostic support are hindered by data scarcity, particularly for underrepresented populations and rare manifestations of NTDs. Existing dermatological datasets often lack the demographic and disease spectrum crucial for developing reliable recognition models of NTDs. To address this, we introduce eSkinHealth, a novel dermatological dataset collected on-site in Côte d'Ivoire and Ghana. Specifically, eSkinHealth contains 5,623 images from 1,639 cases and encompasses 47 skin diseases, focusing uniquely on skin NTDs and rare conditions among West African populations. We further propose an AI-expert collaboration paradigm to implement foundation language and segmentation models for efficient generation of multimodal annotations, under dermatologists' guidance. In addition to patient metadata and diagnosis labels, eSkinHealth also includes semantic lesion masks, instance-specific visual captions, and clinical concepts. Overall, our work provides a valuable new resource and a scalable annotation framework, aiming to catalyze the development of more equitable, accurate, and interpretable AI tools for global dermatology.
Related papers
- DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs [54.8829900010621]
Multimodal Large Language Models (MLLMs) show promise for medical applications, yet progress in dermatology lags due to limited training data, narrow task coverage, and lack of clinically-grounded supervision. We present a comprehensive framework to address these gaps. First, we introduce DermoInstruct, a large-scale morphology-anchored instruction corpus comprising 211,243 images and 772,675 trajectories across five task formats. Second, we establish DermoBench, a rigorous benchmark evaluating 11 tasks across four clinical axes: Morphology, Diagnosis, Reasoning, and Fairness, including a challenging subset of 3,600
arXiv Detail & Related papers (2026-01-05T07:55:36Z)
- A Global Atlas of Digital Dermatology to Map Innovation and Disparities [38.74632415760177]
We present SkinMap, a multi-modal framework for the first comprehensive audit of the field's entire data basis. We unify the publicly available dermatology datasets into a single, queryable semantic atlas comprising more than 1.1 million images of skin conditions.
arXiv Detail & Related papers (2025-12-27T09:22:36Z)
- DermINO: Hybrid Pretraining for a Versatile Dermatology Foundation Model [69.20140430678092]
DermINO is a versatile foundation model for dermatology. It incorporates a novel hybrid pretraining framework that augments the self-supervised learning paradigm. It consistently outperforms state-of-the-art models across a wide range of tasks.
arXiv Detail & Related papers (2025-08-17T00:41:39Z)
- DermaCon-IN: A Multi-concept Annotated Dermatological Image Dataset of Indian Skin Disorders for Clinical AI Research [3.3114401663331137]
DermaCon-IN is a prospectively curated dataset of over 5,450 clinical images from approximately 3,000 patients in South India. Each image is annotated by board-certified dermatologists with over 240 distinct diagnoses, structured under a hierarchical, etiology-based taxonomy. The dataset captures a wide spectrum of dermatologic conditions and tonal variation commonly seen in Indian outpatient care.
arXiv Detail & Related papers (2025-06-06T13:59:08Z)
- Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank. It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
arXiv Detail & Related papers (2025-05-30T14:42:02Z)
- PASSION for Dermatology: Bridging the Diversity Gap with Pigmented Skin Images from Sub-Saharan Africa [29.405369900938393]
Africa faces a huge shortage of dermatologists, with fewer than one per million people.
This is in stark contrast to the high demand for dermatologic care, with 80% of the paediatric population suffering from largely untreated skin conditions.
The PASSION project aims to address this issue by collecting images of skin diseases in Sub-Saharan countries with the aim of open-sourcing this data.
arXiv Detail & Related papers (2024-11-07T10:11:37Z)
- Equitable Skin Disease Prediction Using Transfer Learning and Domain Adaptation [1.9505972437091028]
Existing artificial intelligence (AI) models in dermatology face challenges in accurately diagnosing diseases across diverse skin tones.
We employ a transfer-learning approach that capitalizes on the rich, transferable knowledge from various image domains.
Among all methods, Med-ViT emerged as the top performer due to its comprehensive feature representation learned from diverse image sources.
arXiv Detail & Related papers (2024-09-01T23:48:26Z)
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study evaluated Gemini, GPT-4, and four other popular large models across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
- SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions [17.803181915074706]
SkinCAP comprises 4,000 images sourced from the Fitzpatrick 17k skin disease dataset and the Diverse Dermatology Images dataset.
Notably, SkinCAP represents the world's first such dataset and is publicly available at https://huggingface.co/datasets/joshuachou/SkinCAP.
arXiv Detail & Related papers (2024-05-28T09:48:23Z)
- SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models [54.32264601568605]
SkinGEN is a diagnosis-to-generation framework that generates reference demonstrations from diagnosis results provided by a VLM. We conduct a user study with 32 participants evaluating both the system performance and explainability. Results demonstrate that SkinGEN significantly improves users' comprehension of VLM predictions and fosters increased trust in the diagnostic process.
arXiv Detail & Related papers (2024-04-23T05:36:33Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)