Voice Passing : a Non-Binary Voice Gender Prediction System for evaluating Transgender voice transition
- URL: http://arxiv.org/abs/2404.15176v1
- Date: Tue, 23 Apr 2024 16:15:39 GMT
- Title: Voice Passing : a Non-Binary Voice Gender Prediction System for evaluating Transgender voice transition
- Authors: David Doukhan, Simon Devauchelle, Lucile Girard-Monneron, Mía Chávez Ruz, V. Chaddouk, Isabelle Wagner, Albert Rilliard
- Abstract summary: This paper presents software that describes voices using a continuous Voice Femininity Percentage (VFP).
It is intended for transgender speakers during their voice transition and for the voice therapists supporting them in this process.
- Score: 0.7915536524413253
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents software that describes voices using a continuous Voice Femininity Percentage (VFP). The system is intended for transgender speakers during their voice transition and for the voice therapists supporting them in this process. A corpus of 41 French cis- and transgender speakers was recorded. In a perceptual evaluation, 57 participants estimated the VFP of each voice. Binary gender classification models were trained on external gender-balanced data and applied to overlapping windows to obtain average gender prediction estimates. These estimates were calibrated to predict VFP and achieved higher accuracy than models based on $F_0$ or vocal tract length. Training data speaking style and DNN architecture were shown to impact VFP estimation. The accuracy of the models was affected by speakers' age. This highlights the importance of style, age, and the conception of gender as binary or not, in building adequate statistical representations of cultural concepts.
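The windowed prediction and calibration steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which calibration method was used, so a simple least-squares line is assumed here, and `toy_classifier`, `overlapping_windows`, `mean_window_score`, `fit_linear_calibration`, and `predict_vfp` are hypothetical names introduced only for this example.

```python
from statistics import mean


def overlapping_windows(samples, size, hop):
    """Yield fixed-size windows; consecutive windows overlap when hop < size."""
    last_start = max(len(samples) - size, 0)
    for start in range(0, last_start + 1, hop):
        yield samples[start:start + size]


def toy_classifier(window):
    """Hypothetical stand-in for a trained binary gender classifier.

    Returns a pseudo-probability in [0, 1] for the 'female' class.
    A real system would run a trained model on acoustic features here.
    """
    return min(1.0, max(0.0, mean(window)))


def mean_window_score(samples, classify, size, hop):
    """Average the per-window binary predictions over a recording."""
    return mean(classify(w) for w in overlapping_windows(samples, size, hop))


def fit_linear_calibration(scores, perceived_vfp):
    """Fit a least-squares line mapping mean scores to perceived VFP (0-100).

    Assumption: linear calibration is chosen for simplicity only.
    """
    mx, my = mean(scores), mean(perceived_vfp)
    var = sum((x - mx) ** 2 for x in scores)
    if var == 0:
        raise ValueError("calibration needs at least two distinct scores")
    cov = sum((x - mx) * (y - my) for x, y in zip(scores, perceived_vfp))
    slope = cov / var
    return slope, my - slope * mx


def predict_vfp(score, slope, intercept):
    """Map a mean window score to a VFP estimate clipped to [0, 100]."""
    return min(100.0, max(0.0, slope * score + intercept))
```

In use, `mean_window_score` would be computed for each recorded speaker, `fit_linear_calibration` would be fitted against the listeners' average VFP ratings, and `predict_vfp` would then be applied to new recordings.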
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark, AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- Speech After Gender: A Trans-Feminine Perspective on Next Steps for Speech Science and Technology [1.7126708168238125]
Trans-feminine gender-affirming voice teachers have unique perspectives on voice that confound current understandings of speaker identity.
We present the Versatile Voice dataset (VVD), a collection of three speakers modifying their voices along gendered axes.
arXiv Detail & Related papers (2024-07-09T21:19:49Z)
- Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective [0.9449650062296824]
We present a diachronic acoustic analysis of the voice of 1023 speakers from French media archives.
Speakers are spread across 32 categories based on four periods (years 1955/56, 1975/76, 1995/96, 2015/16), four age groups (20-35, 36-50, 51-65, >65), and two genders.
arXiv Detail & Related papers (2024-04-24T18:00:06Z)
- How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation [21.125217707038356]
When translating from notional gender languages into grammatical gender languages, the generated translation requires explicit gender assignments for various words, including those referring to the speaker.
To avoid such biased and non-inclusive behaviors, the gender assignment of speaker-related expressions should be guided by externally provided metadata about the speaker's gender.
This paper aims to achieve the same results by integrating the speaker's gender metadata into a single "multi-gender" neural ST model, which is easier to maintain.
arXiv Detail & Related papers (2023-10-23T17:21:32Z)
- No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation [20.731375136671605]
We propose a data augmentation technique that manipulates the fundamental frequency (f0) and formants.
This technique reduces the data imbalance among genders by simulating voices of the under-represented female speakers.
Experiments on spontaneous English speech show that our technique yields a relative WER improvement of up to 9.87% for utterances by female speakers.
arXiv Detail & Related papers (2023-10-10T12:55:22Z)
- The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages [51.2321117760104]
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
arXiv Detail & Related papers (2023-08-31T17:20:50Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Generating Multilingual Gender-Ambiguous Text-to-Speech Voices [4.005334718121374]
This work addresses the task of generating novel gender-ambiguous TTS voices in a multi-speaker, multilingual setting.
To our knowledge, this is the first systematic and validated approach that can reliably generate a variety of gender-ambiguous voices.
arXiv Detail & Related papers (2022-11-01T10:40:24Z)
- On Prosody Modeling for ASR+TTS based Voice Conversion [82.65378387724641]
In voice conversion, an approach showing promising results in the latest voice conversion challenge (VCC) 2020 is to first use an automatic speech recognition (ASR) model to transcribe the source speech into the underlying linguistic contents.
Such a paradigm, referred to as ASR+TTS, overlooks the modeling of prosody, which plays an important role in speech naturalness and conversion similarity.
We propose to directly predict prosody from the linguistic representation in a target-speaker-dependent manner, referred to as target text prediction (TTP).
arXiv Detail & Related papers (2021-07-20T13:30:23Z)
- Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations [49.55361944105796]
We present a novel approach to any-to-one (A2O) voice conversion (VC) in a sequence-to-sequence framework.
A2O VC aims to convert any speaker, including those unseen during training, to a fixed target speaker.
arXiv Detail & Related papers (2020-10-23T08:34:52Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender-biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)