Related papers: The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese

URL: http://arxiv.org/abs/2402.07513v1
Date: Mon, 12 Feb 2024 09:35:13 GMT
Title: The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese
Authors: Ajinkya Kulkarni, Anna Tokareva, Rameez Qureshi, Miguel Couceiro
Abstract summary: This study is dedicated to a comprehensive exploration of the Whisper and MMS systems. Our investigation encompasses various categories, including gender, age, skin tone color, and geo-location. We empirically show that oversampling techniques alleviate such stereotypical biases.
Score: 5.308321515594125
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In the field of spoken language understanding, systems like Whisper and Multilingual Massive Speech (MMS) have shown state-of-the-art performances. This study is dedicated to a comprehensive exploration of the Whisper and MMS systems, with a focus on assessing biases in automatic speech recognition (ASR) inherent to casual conversation speech specific to the Portuguese language. Our investigation encompasses various categories, including gender, age, skin tone color, and geo-location. Alongside traditional ASR evaluation metrics such as Word Error Rate (WER), we have incorporated p-value statistical significance for gender bias analysis. Furthermore, we extensively examine the impact of data distribution and empirically show that oversampling techniques alleviate such stereotypical biases. This research represents a pioneering effort in quantifying biases in the Portuguese language context through the application of MMS and Whisper, contributing to a better understanding of ASR systems' performance in multilingual settings.

Related papers

I Think, Therefore I Am Under-Qualified? A Benchmark for Evaluating Linguistic Shibboleth Detection in LLM Hiring Evaluations [9.275967682881944]
This paper introduces a comprehensive benchmark for evaluating how Large Language Models respond to linguistic shibboleths. We demonstrate how LLMs systematically penalize certain linguistic patterns, particularly hedging language, despite equivalent content quality. We validate our approach along multiple linguistic dimensions, showing that hedged responses receive 25.6% lower ratings on average.
arXiv Detail & Related papers (2025-08-06T23:51:03Z)
Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling [50.62091603179394]
Whisper, one of the most advanced ASR models, handles 99 languages effectively. However, Whisper struggles with unseen languages, those not included in its pre-training. We propose methods that exploit these relationships to enhance ASR performance on unseen languages.
arXiv Detail & Related papers (2024-12-21T04:05:43Z)
Everyone deserves their voice to be heard: Analyzing Predictive Gender Bias in ASR Models Applied to Dutch Speech Data [13.91630413828167]
This study focuses on identifying the performance disparities of Whisper models on Dutch speech data. We analyzed the word error rate, character error rate and a BERT-based semantic similarity across gender groups.
arXiv Detail & Related papers (2024-11-14T13:29:09Z)
Advocating Character Error Rate for Multilingual ASR Evaluation [1.2597747768235845]
We document the limitations of the word error rate (WER) as an evaluation metric and advocate for the character error rate (CER) as the primary metric. We show that CER avoids many of the challenges WER faces and exhibits greater consistency across writing systems. Our findings suggest that CER should be prioritized, or at least supplemented, in multilingual ASR evaluations to account for the varying linguistic characteristics of different languages.
arXiv Detail & Related papers (2024-10-09T19:57:07Z)
The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification [57.06913662622832]
Gender-fair language fosters inclusion by addressing all genders or using neutral forms. Gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns. While we offer initial insights on the effect on German text classification, the findings likely apply to other languages.
arXiv Detail & Related papers (2024-09-26T15:08:17Z)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs). By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models [38.64792118903994]
We evaluate gender bias in SILLMs across four semantic-related tasks. Our analysis reveals that bias levels are language-dependent and vary with different evaluation methods.
arXiv Detail & Related papers (2024-07-09T15:35:43Z)
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system. We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z)
Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
Language Dependencies in Adversarial Attacks on Speech Recognition Systems [0.0]
We compare the attackability of a German and an English ASR system. We investigate if one of the language models is more susceptible to manipulations than the other.
arXiv Detail & Related papers (2022-02-01T13:27:40Z)
Quantifying Bias in Automatic Speech Recognition [28.301997555189462]
This paper quantifies the bias of a Dutch SotA ASR system against gender, age, regional accents and non-native accents. Based on our findings, we suggest bias mitigation strategies for ASR development.
arXiv Detail & Related papers (2021-03-28T12:52:03Z)
Gender Stereotype Reinforcement: Measuring the Gender Bias Conveyed by Ranking Algorithms [68.85295025020942]
We propose the Gender Stereotype Reinforcement (GSR) measure, which quantifies the tendency of a search engine to support gender stereotypes. GSR is the first specifically tailored measure for Information Retrieval, capable of quantifying representational harms.
arXiv Detail & Related papers (2020-09-02T20:45:04Z)
Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.