Related papers: Large-Scale Dataset and Benchmark for Skin Tone Classification in the Wild

URL: http://arxiv.org/abs/2603.02475v1
Date: Mon, 02 Mar 2026 23:52:22 GMT
Title: Large-Scale Dataset and Benchmark for Skin Tone Classification in the Wild
Authors: Vitor Pereira Matias, Márcus Vinícius Lobo Costa, João Batista Neto, Tiago Novello de Brito
Abstract summary: We present a comprehensive framework for skin tone fairness. First, we introduce the STW, a large-scale, open-access dataset comprising 42,313 images from 3,564 individuals. Second, we benchmark both Classic Computer Vision (SkinToneCCV) and Deep Learning approaches. Third, we propose SkinToneNet, a fine-tuned ViT that achieves state-of-the-art generalization on out-of-domain data.
Score: 0.6416429054645991
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep learning models often inherit biases from their training data. While fairness across gender and ethnicity is well-studied, fine-grained skin tone analysis remains a challenge due to the lack of granular, annotated datasets. Existing methods often rely on the medical 6-tone Fitzpatrick scale, which lacks visual representativeness, use small, private datasets that prevent reproducibility, or depend on classic computer vision pipelines, with only a few using deep learning. They also overlook issues such as train-test leakage and dataset imbalance. In this work, we present a comprehensive framework for skin tone fairness. First, we introduce the STW, a large-scale, open-access dataset comprising 42,313 images from 3,564 individuals, labeled using the 10-tone MST scale. Second, we benchmark both Classic Computer Vision (SkinToneCCV) and Deep Learning approaches, demonstrating that classic models provide near-random results, while deep learning reaches near-annotator accuracy. Finally, we propose SkinToneNet, a fine-tuned ViT that achieves state-of-the-art generalization on out-of-domain data, which enables reliable fairness auditing of public datasets like CelebA and VGGFace2. This work provides state-of-the-art results in skin tone classification and fairness assessment. Code and data available soon.
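Illustration (not from the paper): the abstract names train-test leakage as an overlooked issue in a dataset with many images per individual. A minimal sketch of one common safeguard, splitting by individual rather than by image, is given below. The function name `split_by_individual`, the `(image_id, person_id)` sample layout, and the toy data are assumptions made for this example; the abstract does not describe how the authors build their splits.

```python
import random
from collections import defaultdict


def split_by_individual(samples, test_fraction=0.2, seed=0):
    """Split (image_id, person_id) pairs so no person appears in both sets.

    Hypothetical helper for illustration only; not the authors' procedure.
    """
    # Group image ids under the person they belong to.
    by_person = defaultdict(list)
    for image_id, person_id in samples:
        by_person[person_id].append(image_id)

    # Shuffle the people (not the images) with a fixed seed for repeatability.
    people = sorted(by_person)
    random.Random(seed).shuffle(people)

    # Reserve a fraction of the people, with all their images, for testing.
    n_test = max(1, round(len(people) * test_fraction))
    test_people = people[:n_test]
    train_people = people[n_test:]

    train = [(img, p) for p in train_people for img in by_person[p]]
    test = [(img, p) for p in test_people for img in by_person[p]]
    return train, test


# Toy data (assumed): 10 people with 3 images each.
samples = [(f"img_{p}_{k}.jpg", f"person_{p}") for p in range(10) for k in range(3)]
train, test = split_by_individual(samples)

train_ids = {p for _, p in train}
test_ids = {p for _, p in test}
print(len(train), len(test), train_ids.isdisjoint(test_ids))
```

Splitting at the image level instead could place photos of the same person in both sets, which tends to inflate test accuracy.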

Related papers

Long-Tailed Recognition via Information-Preservable Two-Stage Learning [6.2471093754692815]
The imbalance (or long-tail) is the nature of many real-world data distributions. We propose a novel two-stage learning approach to mitigate such a majority-biased tendency. Our approach achieves the state-of-the-art performance across various long-tailed benchmark datasets.
arXiv Detail & Related papers (2025-10-09T21:49:12Z)
Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios. Existing debiasing methods suffer from high costs in bias labeling or model re-training. We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
On Hate Scaling Laws For Data-Swamps [14.891493485229251]
We show that the presence of hateful content in datasets, when measured with a Hate Content Rate (HCR) metric, increased by nearly 12%. As scale increased, the tendency of the model to associate images of human faces with the 'human being' class over 7 other offensive classes reduced by half. For the Black female category, the tendency of the model to associate their faces with the 'criminal' class doubled, while quintupling for Black male faces.
arXiv Detail & Related papers (2023-06-22T18:00:17Z)
Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness. We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks. Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition [107.58227666024791]
Face recognition systems are widely deployed in safety-critical applications, including law enforcement. They exhibit bias across a range of socio-demographic dimensions, such as gender and race. Previous works on bias mitigation largely focused on pre-processing the training data.
arXiv Detail & Related papers (2022-10-18T15:46:05Z)
Assessing Demographic Bias Transfer from Dataset to Model: A Case Study in Facial Expression Recognition [1.5340540198612824]
Two metrics focus on the representational and stereotypical bias of the dataset, and the third one on the residual bias of the trained model. We demonstrate the usefulness of the metrics by applying them to a FER problem based on the popular Affectnet dataset.
arXiv Detail & Related papers (2022-05-20T09:40:42Z)
Meta Balanced Network for Fair Face Recognition [51.813457201437195]
We systematically and scientifically study bias from both data and algorithm aspects. We propose a novel meta-learning algorithm, called Meta Balanced Network (MBN), which learns adaptive margins in large margin loss. Extensive experiments show that MBN successfully mitigates bias and learns more balanced performance for people with different skin tones in face recognition.
arXiv Detail & Related papers (2022-05-13T10:25:44Z)
Color Invariant Skin Segmentation [17.501659517108884]
This paper addresses the problem of automatically detecting human skin in images without reliance on color information. A primary motivation of the work has been to achieve results that are consistent across the full range of skin tones. We present a new approach that performs well in the absence of such information.
arXiv Detail & Related papers (2022-04-21T05:07:21Z)
Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features. We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors. Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.