Causally Disentangled Contrastive Learning for Multilingual Speaker Embeddings
- URL: http://arxiv.org/abs/2602.01363v1
- Date: Sun, 01 Feb 2026 18:02:15 GMT
- Title: Causally Disentangled Contrastive Learning for Multilingual Speaker Embeddings
- Authors: Mariëtte Olijslager, Seyed Sahand Mohammadi Ziabari, Ali Mohammed Mansoor Alsahag
- Abstract summary: This paper investigates the extent to which demographic information, specifically gender, age, and accent, is present in SimCLR-trained speaker embeddings. We study two debiasing strategies: adversarial training through gradient reversal and a causal bottleneck architecture.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised speaker embeddings are widely used in speaker verification systems, but prior work has shown that they often encode sensitive demographic attributes, raising fairness and privacy concerns. This paper investigates the extent to which demographic information, specifically gender, age, and accent, is present in SimCLR-trained speaker embeddings and whether such leakage can be mitigated without severely degrading speaker verification performance. We study two debiasing strategies: adversarial training through gradient reversal and a causal bottleneck architecture that explicitly separates demographic and residual information. Demographic leakage is quantified using both linear and nonlinear probing classifiers, while speaker verification performance is evaluated using ROC-AUC and EER. Our results show that gender information is strongly and linearly encoded in baseline embeddings, whereas age and accent are weaker and primarily nonlinearly represented. Adversarial debiasing reduces gender leakage but has limited effect on age and accent and introduces a clear trade-off with verification accuracy. The causal bottleneck further suppresses demographic information, particularly in the residual representation, but incurs substantial performance degradation. These findings highlight fundamental limitations in mitigating demographic leakage in self-supervised speaker embeddings and clarify the trade-offs inherent in current debiasing approaches.
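The adversarial debiasing described in the abstract relies on gradient reversal: a demographic classifier is trained on top of the embedding, while a gradient reversal layer (GRL) flips the sign of the gradients flowing back into the encoder, so the encoder learns to discard demographic cues. Below is a minimal PyTorch sketch of this mechanism; the names (`GradReverse`, `encoder`, `demographic_head`) are illustrative assumptions, not the authors' code.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients in backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the sign of the gradient flowing back into the encoder,
        # scaled by lambda; lambd itself needs no gradient.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage (hypothetical modules): the demographic head minimizes its own loss,
# while the reversed gradient trains the encoder to remove demographic cues:
#   emb = encoder(waveform)
#   demo_logits = demographic_head(grad_reverse(emb, lambd=1.0))
```

Verification performance is reported as ROC-AUC and EER. As a sketch of how the equal error rate is commonly computed from verification trial scores (not necessarily the authors' exact protocol), using scikit-learn's ROC curve:

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """EER: the operating point where false-accept and false-reject rates meet.

    labels: 1 for same-speaker trials, 0 for different-speaker trials.
    scores: higher means more likely the same speaker.
    """
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2.0
```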
Related papers
- Generative Classifiers Avoid Shortcut Solutions [84.23247217037134]
Discriminative approaches to classification often learn shortcuts that hold in-distribution but fail under minor distribution shift. We show that generative classifiers can avoid this issue by modeling all features, both core and spurious, instead of mainly spurious ones. We find that diffusion-based and autoregressive generative classifiers achieve state-of-the-art performance on five standard image and text distribution shift benchmarks.
arXiv Detail & Related papers (2025-12-31T18:31:46Z) - Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics? [17.978167351646288]
We investigate how independent demographic bias mechanisms are from general demographic recognition in language models. We find that attribution-based ablations mitigate race- and gender-based profession stereotypes while preserving name-recognition accuracy, whereas correlation-based ablations are more effective for education bias.
arXiv Detail & Related papers (2025-12-23T21:44:20Z) - DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Supervised Speech Foundational Model [65.93900011975238]
DELULU is a speaker-aware self-supervised foundational model for verification, diarization, and profiling applications. It is trained using a dual objective that combines masked prediction and denoising, further enhancing robustness and generalization. Our findings demonstrate that DELULU is a strong universal encoder for speaker-aware speech processing, enabling superior performance even without task-specific fine-tuning.
arXiv Detail & Related papers (2025-10-20T15:35:55Z) - Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models [81.45743826739054]
A major barrier has been the lack of demographic annotations in web-scale datasets such as LAION-400M. We create person-centric annotations for the full dataset, including over 276 million bounding boxes, perceived gender and race/ethnicity labels, and automatically generated captions. Using them, we uncover demographic imbalances and harmful associations, such as the disproportionate linking of men and individuals perceived as Black or Middle Eastern with crime-related and negative content.
arXiv Detail & Related papers (2025-10-04T07:51:59Z) - Who's Asking? Investigating Bias Through the Lens of Disability Framed Queries in LLMs [2.722784054643991]
Large Language Models (LLMs) routinely infer users' demographic traits from phrasing alone. The role of disability cues in shaping these inferences remains largely uncharted. We present the first systematic audit of disability-conditioned demographic bias across eight state-of-the-art instruction-tuned LLMs.
arXiv Detail & Related papers (2025-08-18T21:03:09Z) - DAIQ: Auditing Demographic Attribute Inference from Question in LLMs [3.1677998308405786]
Large Language Models (LLMs) are known to reflect social biases when demographic attributes, such as gender or race, are explicitly present in the input. But even in their absence, these models still infer user identities based solely on question phrasing. We introduce Demographic Attribute Inference from Questions (DAIQ), a task and framework for auditing an overlooked failure mode in language models.
arXiv Detail & Related papers (2025-08-18T19:26:17Z) - SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis [1.2499537119440245]
The Speaker Characteristics DeepFake (SCDF) dataset contains over 237,000 utterances with a balanced representation of male and female speakers. We show that speaker characteristics significantly influence detection performance, revealing disparities across sex, language, age, and synthesizer type. These findings highlight the need for bias-aware development and provide a foundation for building non-discriminatory deepfake detection systems.
arXiv Detail & Related papers (2025-08-11T12:58:37Z) - Assessing the Reliability of LLMs Annotations in the Context of Demographic Bias and Model Explanation [5.907945985868999]
This study investigates the extent to which annotator demographic features influence labeling decisions compared to text content. Using a Generalized Linear Mixed Model, we quantify this influence, finding that demographic factors account for a minor fraction (~8%) of the observed variance. We then assess the reliability of Generative AI (GenAI) models as annotators, specifically evaluating whether guiding them with demographic personas improves alignment with human judgments.
arXiv Detail & Related papers (2025-07-17T14:00:13Z) - Fair Deepfake Detectors Can Generalize [51.21167546843708]
We show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art methods.
arXiv Detail & Related papers (2025-07-03T14:10:02Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a simple but highly effective method for countering bias using instance reweighting; a minimal sketch of the idea follows this list.
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and imposter sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
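As referenced in the "Balancing out Bias" entry above, instance reweighting counters author-demographic bias by weighting training examples so that demographic groups contribute equally to the loss. Below is a minimal sketch of one common variant, inverse-propensity weighting; it is illustrative only, not that paper's implementation.

```python
from collections import Counter

def inverse_propensity_weights(group_labels):
    """Weight each instance by the inverse frequency of its demographic group,
    normalized so the weights average to 1. Over-represented groups are
    down-weighted; under-represented groups are up-weighted."""
    counts = Counter(group_labels)
    n = len(group_labels)
    k = len(counts)
    # Each group's total weight becomes n / k, so groups contribute equally.
    return [n / (k * counts[g]) for g in group_labels]

# Example: inverse_propensity_weights(["f", "m", "m", "m"])
# -> [2.0, 0.667, 0.667, 0.667]; each group's weights sum to 2.0,
# so both groups carry equal weight in a reweighted training loss.
```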