Related papers: SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions

URL: http://arxiv.org/abs/2405.18004v1
Date: Tue, 28 May 2024 09:48:23 GMT
Title: SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions
Authors: Juexiao Zhou, Liyuan Sun, Yan Xu, Wenbin Liu, Shawn Afvari, Zhongyi Han, Jiaoyan Song, Yongzhi Ji, Xiaonan He, Xin Gao
Abstract summary: SkinCAP comprises 4,000 images sourced from the Fitzpatrick 17k skin disease dataset and the Diverse Dermatology Images dataset. Notably, SkinCAP represents the world's first such dataset and is publicly available at https://huggingface.co/datasets/joshuachou/SkinCAP.
Score: 17.803181915074706
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: With the widespread application of artificial intelligence (AI), particularly deep learning (DL) and vision-based large language models (VLLMs), in skin disease diagnosis, the need for interpretability becomes crucial. However, existing dermatology datasets are limited in their inclusion of concept-level meta-labels, and none offer rich medical descriptions in natural language. This deficiency impedes the advancement of LLM-based methods in dermatological diagnosis. To address this gap and provide a meticulously annotated dermatology dataset with comprehensive natural language descriptions, we introduce SkinCAP: a multi-modal dermatology dataset annotated with rich medical captions. SkinCAP comprises 4,000 images sourced from the Fitzpatrick 17k skin disease dataset and the Diverse Dermatology Images dataset, annotated by board-certified dermatologists to provide extensive medical descriptions and captions. Notably, SkinCAP represents the world's first such dataset and is publicly available at https://huggingface.co/datasets/joshuachou/SkinCAP.

Related papers

Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology [20.650401805716744]
We present Derm1M, the first large-scale vision-language dataset for dermatology, comprising 1,029,761 image-text pairs. To demonstrate Derm1M's potential in advancing both AI research and clinical application, we pretrained a series of CLIP-like models, collectively called DermLIP, on this dataset.
arXiv Detail & Related papers (2025-03-19T05:30:01Z)
DermaSynth: Rich Synthetic Image-Text Pairs Using Open Access Dermatology Datasets [0.9094611563359232]
DermaSynth is a dataset of 92,020 synthetic image-text pairs curated from 45,205 images. We leverage state-of-the-art vision large language models, using Gemini 2.0, to generate diverse and rich synthetic texts.
arXiv Detail & Related papers (2025-01-31T22:26:33Z)
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature [73.39593644054865]
BIOMEDICA is a scalable, open-source framework to extract, annotate, and serialize the entirety of the PubMed Central Open Access subset into an easy-to-use, publicly accessible dataset. Our framework produces a comprehensive archive with over 24 million unique image-text pairs from over 6 million articles. BMCA-CLIP is a suite of CLIP-style models continuously pretrained on the BIOMEDICA dataset via streaming, eliminating the need to download 27 TB of data locally.
arXiv Detail & Related papers (2025-01-13T09:58:03Z)
UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities [68.12889379702824]
Vision-Language Models (VLMs) trained via contrastive learning have achieved notable success in natural image tasks. UniMed is a large-scale, open-source multi-modal medical dataset comprising over 5.3 million image-text pairs. We trained UniMed-CLIP, a unified VLM for six modalities, achieving notable gains in zero-shot evaluations.
arXiv Detail & Related papers (2024-12-13T18:59:40Z)
A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data. Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied. We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics.
arXiv Detail & Related papers (2024-11-19T03:27:05Z)
PASSION for Dermatology: Bridging the Diversity Gap with Pigmented Skin Images from Sub-Saharan Africa [29.405369900938393]
Africa faces a huge shortage of dermatologists, with fewer than one per million people. This is in stark contrast to the high demand for dermatologic care, with 80% of the paediatric population suffering from largely untreated skin conditions. The PASSION project aims to address this issue by collecting images of skin diseases in Sub-Saharan countries with the aim of open-sourcing this data.
arXiv Detail & Related papers (2024-11-07T10:11:37Z)
Equitable Skin Disease Prediction Using Transfer Learning and Domain Adaptation [1.9505972437091028]
Existing artificial intelligence (AI) models in dermatology face challenges in accurately diagnosing diseases across diverse skin tones. We employ a transfer-learning approach that capitalizes on the rich, transferable knowledge from various image domains. Among all methods, Med-ViT emerged as the top performer due to its comprehensive feature representation learned from diverse image sources.
arXiv Detail & Related papers (2024-09-01T23:48:26Z)
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine [53.01393667775077]
This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine. It covers over 25 million images across 10 modalities with multigranular annotations for more than 65 diseases. Unlike the existing multimodal datasets, which are limited by the availability of image-text pairs, we have developed the first automated pipeline.
arXiv Detail & Related papers (2024-08-06T02:09:35Z)
DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses [0.48212500317840945]
This study presents a diverse dataset comprising 12,345 dermatoscopic images with 38 subclasses of skin lesions collected in Turkiye. This dataset distinguishes itself through a diverse structure with 5 super classes, 15 main classes, 38 subclasses and its 12,345 high-resolution dermatoscopic images.
arXiv Detail & Related papers (2024-06-11T16:27:32Z)
SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models [52.90397538472582]
SkinGEN is a diagnosis-to-generation framework that generates reference demonstrations from diagnosis results provided by a VLM. We conduct a user study with 32 participants evaluating both the system performance and explainability. Results demonstrate that SkinGEN significantly improves users' comprehension of VLM predictions and fosters increased trust in the diagnostic process.
arXiv Detail & Related papers (2024-04-23T05:36:33Z)
Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information. A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction. The experimental evaluations have been conducted using the PAD-UFES20 dataset, applying various deep-learning architectures.
arXiv Detail & Related papers (2024-02-16T05:16:20Z)
ERCPMP: An Endoscopic Image and Video Dataset for Colorectal Polyps Morphology and Pathology [0.0]
This dataset contains demographic, morphological and pathological data, endoscopic images and videos of 191 patients with colorectal polyps. Pathological data includes the diagnosis of the polyps including Tubular, Villous, Tubulovillous, Hyperplastic, Serrated, Inflammatory and Adenocarcinoma with Dysplasia Grade & Differentiation.
arXiv Detail & Related papers (2023-07-28T09:52:20Z)
PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding. LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge. We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance. Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas. We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
SkinCon: A skin disease dataset densely annotated by domain experts for fine-grained model debugging and analysis [9.251248318564617]
Concepts are meta-labels that are semantically meaningful to humans. Densely annotated datasets in medicine focused on meta-labels relevant to a single disease such as melanoma. SkinCon includes 3230 images from the Fitzpatrick 17k dataset densely annotated with 48 clinical concepts.
arXiv Detail & Related papers (2023-02-01T22:39:51Z)
G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers. We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.