SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions
- URL: http://arxiv.org/abs/2405.18004v1
- Date: Tue, 28 May 2024 09:48:23 GMT
- Title: SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions
- Authors: Juexiao Zhou, Liyuan Sun, Yan Xu, Wenbin Liu, Shawn Afvari, Zhongyi Han, Jiaoyan Song, Yongzhi Ji, Xiaonan He, Xin Gao,
- Abstract summary: SkinCAP comprises 4,000 images sourced from the Fitzpatrick 17k skin disease dataset and the Diverse Dermatology Images dataset.
Notably, SkinCAP represents the world's first such dataset and is publicly available at https://huggingface.co/datasets/joshuachou/SkinCAP.
- Score: 17.803181915074706
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the widespread application of artificial intelligence (AI), particularly deep learning (DL) and vision-based large language models (VLLMs), in skin disease diagnosis, the need for interpretability becomes crucial. However, existing dermatology datasets are limited in their inclusion of concept-level meta-labels, and none offer rich medical descriptions in natural language. This deficiency impedes the advancement of LLM-based methods in dermatological diagnosis. To address this gap and provide a meticulously annotated dermatology dataset with comprehensive natural language descriptions, we introduce SkinCAP: a multi-modal dermatology dataset annotated with rich medical captions. SkinCAP comprises 4,000 images sourced from the Fitzpatrick 17k skin disease dataset and the Diverse Dermatology Images dataset, annotated by board-certified dermatologists to provide extensive medical descriptions and captions. Notably, SkinCAP represents the world's first such dataset and is publicly available at https://huggingface.co/datasets/joshuachou/SkinCAP.
Related papers
- A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data.
Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied.
We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics.
arXiv Detail & Related papers (2024-11-19T03:27:05Z) - PASSION for Dermatology: Bridging the Diversity Gap with Pigmented Skin Images from Sub-Saharan Africa [29.405369900938393]
Africa faces a severe shortage of dermatologists, with fewer than one per million people.
This is in stark contrast to the high demand for dermatologic care, with 80% of the paediatric population suffering from largely untreated skin conditions.
The PASSION project aims to address this issue by collecting images of skin diseases in Sub-Saharan countries with the aim of open-sourcing this data.
arXiv Detail & Related papers (2024-11-07T10:11:37Z) - Equitable Skin Disease Prediction Using Transfer Learning and Domain Adaptation [1.9505972437091028]
Existing artificial intelligence (AI) models in dermatology face challenges in accurately diagnosing diseases across diverse skin tones.
We employ a transfer-learning approach that capitalizes on the rich, transferable knowledge from various image domains.
Among all methods, Med-ViT emerged as the top performer due to its comprehensive feature representation learned from diverse image sources.
arXiv Detail & Related papers (2024-09-01T23:48:26Z) - DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses [0.48212500317840945]
This study presents a diverse dataset comprising 12,345 dermatoscopic images with 38 subclasses of skin lesions collected in Türkiye.
This dataset distinguishes itself through its hierarchical structure of 5 superclasses, 15 main classes, and 38 subclasses across 12,345 high-resolution dermatoscopic images.
arXiv Detail & Related papers (2024-06-11T16:27:32Z) - SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models [52.90397538472582]
SkinGEN is a diagnosis-to-generation framework that generates reference demonstrations from diagnosis results provided by a VLM.
We conduct a user study with 32 participants evaluating both the system performance and explainability.
Results demonstrate that SkinGEN significantly improves users' comprehension of VLM predictions and fosters increased trust in the diagnostic process.
arXiv Detail & Related papers (2024-04-23T05:36:33Z) - Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
The experimental evaluations have been conducted using the PAD-UFES-20 dataset, applying various deep-learning architectures.
arXiv Detail & Related papers (2024-02-16T05:16:20Z) - ERCPMP: An Endoscopic Image and Video Dataset for Colorectal Polyps Morphology and Pathology [0.0]
This dataset contains demographic, morphological and pathological data, endoscopic images and videos of 191 patients with colorectal polyps.
Pathological data includes the diagnosis of the polyps including Tubular, Villous, Tubulovillous, Hyperplastic, Serrated, Inflammatory and Adenocarcinoma with Dysplasia Grade & Differentiation.
arXiv Detail & Related papers (2023-07-28T09:52:20Z) - PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z) - Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z) - SkinCon: A skin disease dataset densely annotated by domain experts for fine-grained model debugging and analysis [9.251248318564617]
Concepts are meta-labels that are semantically meaningful to humans.
Densely annotated datasets in medicine have focused on meta-labels relevant to a single disease, such as melanoma.
SkinCon includes 3230 images from the Fitzpatrick 17k dataset densely annotated with 48 clinical concepts.
arXiv Detail & Related papers (2023-02-01T22:39:51Z) - G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.