Language technology practitioners as language managers: arbitrating data
bias and predictive bias in ASR
- URL: http://arxiv.org/abs/2202.12603v1
- Date: Fri, 25 Feb 2022 10:37:52 GMT
- Authors: Nina Markl and Stephen Joseph McNulty
- Abstract summary: We use the lens of language policy to analyse how current practices in training and testing ASR systems in industry lead to the data bias giving rise to these systematic error differences.
We propose a re-framing of language resources as (public) infrastructure which should not solely be designed for markets, but for, and with meaningful cooperation of, speech communities.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the fact that variation is a fundamental characteristic of natural
language, automatic speech recognition systems perform systematically worse on
non-standardised and marginalised language varieties. In this paper we use the
lens of language policy to analyse how current practices in training and
testing ASR systems in industry lead to the data bias giving rise to these
systematic error differences. We believe that this is a useful perspective for
speech and language technology practitioners to understand the origins and
harms of algorithmic bias, and how they can mitigate it. We also propose a
re-framing of language resources as (public) infrastructure which should not
solely be designed for markets, but for, and with meaningful cooperation of,
speech communities.
Related papers
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for furthering the field of dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Uncertainty in Natural Language Generation: From Theory to Applications [42.55924708592451]
We argue that a principled treatment of uncertainty can assist in creating systems and evaluation protocols better aligned with these goals.
We first present the fundamental theory, frameworks and vocabulary required to represent uncertainty.
We then propose a two-dimensional taxonomy that is more informative and faithful than the popular aleatoric/epistemic dichotomy.
arXiv Detail & Related papers (2023-07-28T17:51:21Z)
- Towards Bridging the Digital Language Divide [4.234367850767171]
Multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden, representational preference towards certain languages.
We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented.
We present a new initiative that aims at reducing linguistic bias through both technological design and methodology.
arXiv Detail & Related papers (2023-07-25T10:53:20Z)
- A methodology to characterize bias and harmful stereotypes in natural language processing in Latin America [2.05094736006609]
We show how social scientists, domain experts, and machine learning experts can collaboratively explore biases and harmful stereotypes in word embeddings and large language models.
Our methodology is based on the following principles.
arXiv Detail & Related papers (2022-07-14T01:07:55Z)
- Color Overmodification Emerges from Data-Driven Learning and Pragmatic Reasoning [53.088796874029974]
We show that speakers' referential expressions depart from communicative ideals in ways that help illuminate the nature of pragmatic language use.
By adopting neural networks as learning agents, we show that overmodification is more likely with environmental features that are infrequent or salient.
arXiv Detail & Related papers (2022-05-18T18:42:43Z)
- Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- Quantifying Bias in Automatic Speech Recognition [28.301997555189462]
This paper quantifies the bias of a Dutch SotA ASR system against gender, age, regional accents and non-native accents.
Based on our findings, we suggest bias mitigation strategies for ASR development.
arXiv Detail & Related papers (2021-03-28T12:52:03Z)
- Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition [12.354292498112347]
We present further improvements over our previous work by using domain adversarial learning to train task models.
Our proposed technique leads to reductions in Word Error Rates (WER) in monolingual and code-switched test sets across three language pairs.
arXiv Detail & Related papers (2020-06-09T13:45:30Z)
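Several of the entries above, and the main paper itself, measure ASR performance and its systematic error differences via Word Error Rate (WER). A minimal Python sketch of the standard word-level Levenshtein formulation of WER (the function name and example transcripts are illustrative; real evaluations typically use established tooling such as the jiwer package):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions)
    divided by the number of reference words, computed via
    word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

# Comparing error rates per speaker group on invented transcripts
# makes "systematic error differences" concrete:
print(wer("the cat sat on the mat", "the cat sat on the mat"))  # 0.0
print(wer("the cat sat on the mat", "a cat sat on a mat"))      # 2 substitutions / 6 words
```

Predictive bias shows up when this number is consistently higher for some language varieties than for others under the same system.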
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.