Different Speech Translation Models Encode and Translate Speaker Gender Differently
- URL: http://arxiv.org/abs/2506.02172v1
- Date: Mon, 02 Jun 2025 18:58:41 GMT
- Title: Different Speech Translation Models Encode and Translate Speaker Gender Differently
- Authors: Dennis Fucci, Marco Gaido, Matteo Negri, Luisa Bentivogli, Andre Martins, Giuseppe Attanasio,
- Abstract summary: We use probing methods to assess gender encoding across diverse speech translation models.<n>Results indicate that while traditional encoder-decoder models capture gender information, newer architectures integrate a speech encoder with a machine translation system via adapters.
- Score: 21.925623428254543
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent studies on interpreting the hidden states of speech models have shown their ability to capture speaker-specific features, including gender. Does this finding also hold for speech translation (ST) models? If so, what are the implications for the speaker's gender assignment in translation? We address these questions from an interpretability perspective, using probing methods to assess gender encoding across diverse ST models. Results on three language directions (English-French/Italian/Spanish) indicate that while traditional encoder-decoder models capture gender information, newer architectures -- integrating a speech encoder with a machine translation system via adapters -- do not. We also demonstrate that low gender encoding capabilities result in systems' tendency toward a masculine default, a translation bias that is more pronounced in newer architectures.
Related papers
- Addressing speaker gender bias in large scale speech translation systems [20.698663542717544]
This study addresses the issue of speaker gender bias in Speech Translation (ST) systems.<n>We employ Large Language Models (LLMs) to rectify translations based on the speaker's gender in a cost-effective manner.<n>We demonstrate a 70% improvement in translations for female speakers compared to our baseline and other large-scale ST systems.
arXiv Detail & Related papers (2025-01-10T14:20:46Z) - TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion.
We propose two separated encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process.
Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z) - Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps [25.95711246919163]
Current automatic speech recognition (ASR) models are designed to be used across many languages and tasks without substantial changes.
Our study systematically evaluates the performance of two widely used multilingual ASR models on three datasets.
Our findings reveal clear gender disparities, with the advantaged group varying across languages and models.
arXiv Detail & Related papers (2024-02-28T00:24:29Z) - Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.<n>We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.<n>Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z) - How To Build Competitive Multi-gender Speech Translation Models For
Controlling Speaker Gender Translation [21.125217707038356]
When translating from notional gender languages into grammatical gender languages, the generated translation requires explicit gender assignments for various words, including those referring to the speaker.
To avoid such biased and not inclusive behaviors, the gender assignment of speaker-related expressions should be guided by externally-provided metadata about the speaker's gender.
This paper aims to achieve the same results by integrating the speaker's gender metadata into a single "multi-gender" neural ST model, easier to maintain.
arXiv Detail & Related papers (2023-10-23T17:21:32Z) - Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer [53.72998363956454]
Direct speech-to-speech translation (S2ST) with discrete self-supervised representations has achieved remarkable accuracy.
The scarcity of high-quality speaker-parallel data poses a challenge for learning style transfer during translation.
We design an S2ST pipeline with style-transfer capability on the basis of discrete self-supervised speech representations and timbre units.
arXiv Detail & Related papers (2023-09-14T09:52:08Z) - VisoGender: A dataset for benchmarking gender bias in image-text pronoun
resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z) - A unified one-shot prosody and speaker conversion system with
self-supervised discrete speech units [94.64927912924087]
Existing systems ignore the correlation between prosody and language content, leading to degradation of naturalness in converted speech.
We devise a cascaded modular system leveraging self-supervised discrete speech units as language representation.
Experiments show that our system outperforms previous approaches in naturalness, intelligibility, speaker transferability, and prosody transferability.
arXiv Detail & Related papers (2022-11-12T00:54:09Z) - Breeding Gender-aware Direct Speech Translation Systems [14.955696163410254]
We show that gender-aware direct ST solutions can significantly outperform strong - but gender-unaware - direct ST models.
The translation of gender-marked words can increase up to 30 points in accuracy while preserving overall translation quality.
arXiv Detail & Related papers (2020-12-09T10:18:03Z) - Bridging the Modality Gap for Speech-to-Text Translation [57.47099674461832]
End-to-end speech translation aims to translate speech in one language into text in another language via an end-to-end way.
Most existing methods employ an encoder-decoder structure with a single encoder to learn acoustic representation and semantic information simultaneously.
We propose a Speech-to-Text Adaptation for Speech Translation model which aims to improve the end-to-end model performance by bridging the modality gap between speech and text.
arXiv Detail & Related papers (2020-10-28T12:33:04Z) - Gender in Danger? Evaluating Speech Translation Technology on the
MuST-SHE Corpus [20.766890957411132]
Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines.
Can audio provide additional information to reduce gender bias?
We present the first thorough investigation of gender bias in speech translation, contributing with the release of a benchmark useful for future studies.
arXiv Detail & Related papers (2020-06-10T09:55:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.