VoxCeleb Enrichment for Age and Gender Recognition
- URL: http://arxiv.org/abs/2109.13510v1
- Date: Tue, 28 Sep 2021 06:18:57 GMT
- Title: VoxCeleb Enrichment for Age and Gender Recognition
- Authors: Khaled Hechmi, Trung Ngo Trong, Ville Hautamaki, Tomi Kinnunen
- Abstract summary: We provide speaker age labels and (an alternative) annotation of speaker gender in VoxCeleb datasets.
We demonstrate the use of this metadata by constructing age and gender recognition models.
We also compare the original VoxCeleb gender labels with our labels to identify records that might be mislabeled in the original VoxCeleb data.
- Score: 12.520037579004883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: VoxCeleb datasets are widely used in speaker recognition studies. Our work
serves two purposes. First, we provide speaker age labels and (an alternative)
annotation of speaker gender. Second, we demonstrate the use of this metadata
by constructing age and gender recognition models with different features and
classifiers. We query different celebrity databases and apply consensus rules
to derive age and gender labels. We also compare the original VoxCeleb gender
labels with our labels to identify records that might be mislabeled in the
original VoxCeleb data. On the modeling side, we design a comprehensive study of
multiple features and models for recognizing gender and age. Our best system,
using i-vector features with logistic regression, achieved an F1-score of 0.9829
on the gender recognition task, and the lowest mean absolute error (MAE) in age
regression, 9.443 years, was obtained with ridge regression. This highlights the
challenge of age estimation from in-the-wild speech data.
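As a rough illustration of the modeling setup described above, the following is a minimal sketch: logistic regression for gender and ridge regression for age, scored with F1 and MAE respectively. Random vectors stand in for real i-vectors, and the 400-dimensional size, sample counts, and labels are all assumptions, not the paper's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import f1_score, mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, dim = 1000, 400  # dimensionality is an assumption, not taken from the paper

X = rng.normal(size=(n, dim))                 # stand-in "i-vectors"
gender = rng.integers(0, 2, size=n)           # toy labels: 0 = female, 1 = male
age = rng.uniform(20, 70, size=n)             # toy age labels in years

X_tr, X_te, g_tr, g_te, a_tr, a_te = train_test_split(
    X, gender, age, test_size=0.2, random_state=0)

# Gender recognition: binary logistic regression, evaluated with F1.
clf = LogisticRegression(max_iter=1000).fit(X_tr, g_tr)
print("gender F1:", f1_score(g_te, clf.predict(X_te)))

# Age estimation: ridge regression, evaluated with mean absolute error (years).
reg = Ridge(alpha=1.0).fit(X_tr, a_tr)
print("age MAE:", mean_absolute_error(a_te, reg.predict(X_te)))
```

On random features the scores are meaningless; the sketch only shows the classifier and metric choices reported in the abstract.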
Related papers
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models (LVLMs).
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants [10.227469020901232]
This paper introduces the Sonos Voice Control Bias Assessment dataset.
It covers 1,038 speakers, 166 hours of audio, 170k audio samples, and 9,040 unique labelled transcripts.
Results show statistically significant differences in performance across age, dialectal region and ethnicity.
arXiv Detail & Related papers (2024-05-14T12:53:32Z)
- How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation [21.125217707038356]
When translating from notional gender languages into grammatical gender languages, the generated translation requires explicit gender assignments for various words, including those referring to the speaker.
To avoid such biased and non-inclusive behavior, the gender assignment of speaker-related expressions should be guided by externally provided metadata about the speaker's gender.
This paper aims to achieve the same results by integrating the speaker's gender metadata into a single "multi-gender" neural ST model, which is easier to maintain.
arXiv Detail & Related papers (2023-10-23T17:21:32Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- Estimation of speaker age and height from speech signal using bi-encoder transformer mixture model [3.1447111126464997]
We propose a bi-encoder transformer mixture model for speaker age and height estimation.
Considering the wide differences in male and female voice characteristics, we propose the use of two separate transformer encoders.
We significantly outperform the current state-of-the-art results on age estimation.
arXiv Detail & Related papers (2022-03-22T14:39:56Z)
- Are Commercial Face Detection Models as Biased as Academic Models? [64.71318433419636]
We compare academic and commercial face detection systems, specifically examining robustness to noise.
We find that state-of-the-art academic face detection models exhibit demographic disparities in their noise robustness.
We conclude that the commercial models are always at least as biased as the academic models.
arXiv Detail & Related papers (2022-01-25T02:21:42Z)
- What's in a Name? -- Gender Classification of Names with Character Based Machine Learning Models [6.805167389805055]
We consider the problem of predicting the gender of registered users based on their declared name.
By analyzing the first names of 100M+ users, we found that gender can be classified very effectively from the character composition of the name strings.
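A minimal sketch in the spirit of the character-based approach described above, using character n-gram features with logistic regression. The tiny name list, its labels, and the n-gram range are illustrative assumptions, not the paper's 100M-user setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (assumed labels: 0 = female, 1 = male).
names = ["anna", "maria", "julia", "sofia", "laura",
         "james", "peter", "david", "mark", "henry"]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Character n-grams capture composition cues such as the final letter.
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(names, labels)
print(model.predict(["emilia", "robert"]))
```

With so few training names the predictions are not reliable; the sketch only shows how name strings can be turned into character-level features for a classifier.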
arXiv Detail & Related papers (2021-02-07T01:01:32Z)
- Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.