Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition
- URL: http://arxiv.org/abs/2310.18877v1
- Date: Sun, 29 Oct 2023 02:27:56 GMT
- Title: Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition
- Authors: Isaac Slaughter, Craig Greenberg, Reva Schwartz, Aylin Caliskan
- Abstract summary: We present the Speech Embedding Association Test (SpEAT), a method for detecting bias in one type of model used for many speech tasks: pre-trained models.
Using the SpEAT, we test for six types of bias in 16 English speech models.
- Our work provides evidence that, like text- and image-based models, pre-trained speech-based models frequently learn human-like biases.
- Score: 4.4212441764241
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Previous work has established that a person's demographics and speech style
affect how well speech processing models perform for them. But where does this
bias come from? In this work, we present the Speech Embedding Association Test
(SpEAT), a method for detecting bias in one type of model used for many speech
tasks: pre-trained models. The SpEAT is inspired by word embedding association
tests in natural language processing, which quantify intrinsic bias in a
model's representations of different concepts, such as race or valence
(something's pleasantness or unpleasantness) and capture the extent to which a
model trained on large-scale socio-cultural data has learned human-like biases.
Using the SpEAT, we test for six types of bias in 16 English speech models
(including 4 models also trained on multilingual data), which come from the
wav2vec 2.0, HuBERT, WavLM, and Whisper model families. We find that 14 or more
models reveal positive valence (pleasantness) associations with abled people
over disabled people, with European-Americans over African-Americans, with
females over males, with U.S. accented speakers over non-U.S. accented
speakers, and with younger people over older people. Beyond establishing that
pre-trained speech models contain these biases, we also show that they can have
real world effects. We compare biases found in pre-trained models to biases in
downstream models adapted to the task of Speech Emotion Recognition (SER) and
find that in 66 of the 96 tests performed (69%), the group that is more
associated with positive valence as indicated by the SpEAT also tends to be
predicted as speaking with higher valence by the downstream model. Our work
provides evidence that, like text- and image-based models, pre-trained
speech-based models frequently learn human-like biases. Our work also shows that bias
found in pre-trained models can propagate to the downstream task of SER.
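The SpEAT statistic follows the WEAT effect size of Caliskan et al., applied to embeddings of speech stimuli. Below is a minimal sketch of that statistic, assuming each stimulus has already been reduced to a single pooled embedding vector; the function names and argument conventions are illustrative, not taken from the paper's code.

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two embedding vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """s(w, A, B): how much closer w is to attribute set A than to B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def speat_effect_size(X, Y, A, B):
    """WEAT-style effect size d over pooled speech embeddings.

    X, Y: lists of target embeddings (e.g., clips from European-American
          vs. African-American speakers).
    A, B: lists of attribute embeddings (e.g., positive- vs.
          negative-valence stimuli).
    d > 0 means X is more associated with A (positive valence) than Y is.
    """
    sx = np.array([association(x, A, B) for x in X])
    sy = np.array([association(y, A, B) for y in Y])
    # Normalize the mean difference by the sample std over all targets.
    return (sx.mean() - sy.mean()) / np.concatenate([sx, sy]).std(ddof=1)
```

Significance for embedding association tests is typically assessed with a permutation test over re-partitions of X ∪ Y; that step is omitted here for brevity.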
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained language models (PLMs) are known to contain harmful information, such as social biases.
We propose Social Bias Neurons, a method to accurately pinpoint the units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z)
- SpeechAlign: Aligning Speech Generation to Human Preferences [51.684183257809075]
We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences.
We show that SpeechAlign can bridge the distribution gap and facilitate continuous self-improvement of the speech language model.
arXiv Detail & Related papers (2024-04-08T15:21:17Z)
- Detecting Bias in Large Language Models: Fine-tuned KcBERT [0.0]
We define such harm as societal bias and assess ethnic, gender, and racial biases in a model fine-tuned with Korean comments.
Our contribution lies in demonstrating that societal bias exists in Korean language models due to language-dependent characteristics.
arXiv Detail & Related papers (2024-03-16T02:27:19Z)
- Evaluating Biased Attitude Associations of Language Models in an Intersectional Context [2.891314299138311]
Language models are trained on large-scale corpora that embed implicit biases documented in psychology.
We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight.
We find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language.
arXiv Detail & Related papers (2023-07-07T03:01:56Z)
- Exposing Bias in Online Communities through Large-Scale Language Models [3.04585143845864]
This work leverages the bias absorbed by language models to explore the biases of six different online communities.
The bias of the resulting models is evaluated by prompting the models with different demographics and comparing the sentiment and toxicity values of these generations.
This work not only affirms how easily bias is absorbed from training data but also presents a scalable method to identify and compare the bias of different datasets or communities; a minimal sketch of this probing loop follows this entry.
arXiv Detail & Related papers (2023-06-04T08:09:26Z)
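The probing recipe in the preceding entry (prompt with demographic cues, then score the generations) can be sketched in a few lines. The model choices, prompt templates, and signed-score convention below are illustrative assumptions, not the paper's actual setup.

```python
from transformers import pipeline

# Hypothetical choices: any generator and sentiment scorer could stand in.
generator = pipeline("text-generation", model="gpt2")
sentiment = pipeline("sentiment-analysis")

# Demographic-conditioned prompts (illustrative, not from the paper).
prompts = {
    "group_a": "The young woman said that",
    "group_b": "The old man said that",
}

for group, prompt in prompts.items():
    # Sample several continuations per prompt.
    outs = generator(prompt, max_new_tokens=30,
                     num_return_sequences=20, do_sample=True)
    scores = []
    for out in outs:
        res = sentiment(out["generated_text"])[0]
        # Signed score: positive sentiment counts up, negative counts down.
        signed = res["score"] if res["label"] == "POSITIVE" else -res["score"]
        scores.append(signed)
    print(group, sum(scores) / len(scores))
```

Comparing the mean signed score across groups gives a rough, scalable bias estimate; a toxicity classifier can be swapped in for the sentiment scorer in the same way.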
- Textually Pretrained Speech Language Models [107.10344535390956]
We propose TWIST, a method for training SpeechLMs using a warm start from a pretrained textual language model.
We show using both automatic and human evaluations that TWIST outperforms a cold-start SpeechLM across the board.
arXiv Detail & Related papers (2023-05-22T13:12:16Z)
- M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval [56.49878599920353]
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.
For non-English image-speech retrieval, we outperform the current state-of-the-art performance by a wide margin both when training separate models for each language, and with a single model which processes speech in all three languages.
arXiv Detail & Related papers (2022-11-02T14:54:45Z)
- Do self-supervised speech models develop human-like perception biases? [11.646802225841153]
We examine the representational spaces of three kinds of state-of-the-art self-supervised models: wav2vec 2.0, HuBERT, and contrastive predictive coding (CPC).
We show that the CPC model shows a small native language effect, but that wav2vec 2.0 and HuBERT seem to develop a universal speech perception space which is not language specific.
A comparison against the predictions of supervised phone recognisers suggests that all three self-supervised models capture relatively fine-grained perceptual phenomena, while supervised models are better at capturing coarser, phone-level effects of listeners' native language on perception.
arXiv Detail & Related papers (2022-05-31T14:21:40Z)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations; a minimal sketch of this objective follows the list below.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z)
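The masked contrastive objective shared by wav2vec 2.0 and XLSR amounts to picking the true (quantized) latent for a masked time step out of a set of distractors. A minimal numpy sketch for a single masked step, assuming the context vector and candidate latents are already computed, and omitting details such as quantization and the codebook diversity loss:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity, with a small epsilon for numerical safety."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

def contrastive_loss(context, target, distractors, temperature=0.1):
    """InfoNCE-style loss for one masked time step.

    context:     context-network output c_t at a masked step, shape (D,)
    target:      true (quantized) latent q_t for that step, shape (D,)
    distractors: K latents sampled from other masked steps, shape (K, D)
    """
    candidates = np.vstack([target, distractors])  # (K+1, D), true one first
    sims = np.array([cosine(context, q) for q in candidates]) / temperature
    # Numerically stable log-softmax over the candidate set.
    m = sims.max()
    log_probs = sims - m - np.log(np.exp(sims - m).sum())
    return -log_probs[0]  # negative log-probability of the true latent
```

Per the entry above, XLSR applies this same objective within a single model pretrained on raw waveforms from multiple languages, which is what makes the learned representations cross-lingual.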
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.