Classist Tools: Social Class Correlates with Performance in NLP
- URL: http://arxiv.org/abs/2403.04445v1
- Date: Thu, 7 Mar 2024 12:27:08 GMT
- Title: Classist Tools: Social Class Correlates with Performance in NLP
- Authors: Amanda Cercas Curry, Giuseppe Attanasio, Zeerak Talat and Dirk Hovy
- Abstract summary: Sociodemographic characteristics are infrequently used in Natural Language Processing (NLP).
We show that NLP disadvantages less-privileged socioeconomic groups.
We argue for the inclusion of socioeconomic class in future language technologies.
- Score: 27.683676116781758
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Since the foundational work of William Labov on the social stratification of
language (Labov, 1964), linguistics has made concentrated efforts to explore
the links between sociodemographic characteristics and language production and
perception. But while there is strong evidence for socio-demographic
characteristics in language, they are infrequently used in Natural Language
Processing (NLP). Age and gender are somewhat well represented, but Labov's
original target, socioeconomic status, is noticeably absent. And yet it
matters. We show empirically that NLP disadvantages less-privileged
socioeconomic groups. We annotate a corpus of 95K utterances from movies with
social class, ethnicity and geographical language variety and measure the
performance of NLP systems on three tasks: language modelling, automatic speech
recognition, and grammar error correction. We find significant performance
disparities that can be attributed to socioeconomic status as well as ethnicity
and geographical differences. With NLP technologies becoming ever more
ubiquitous and quotidian, they must accommodate all language varieties to avoid
disadvantaging already marginalised groups. We argue for the inclusion of
socioeconomic class in future language technologies.
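The evaluation recipe the abstract describes (annotate utterances with speakers' social class, run off-the-shelf systems, compare per-group error) can be reproduced in miniature. Below is a minimal sketch, not the authors' code: it uses GPT-2 perplexity as the language-modelling metric, and the utterances and class labels are invented placeholders.

```python
# A minimal sketch of the paper's evaluation idea (not the authors' code):
# score annotated utterances with a pretrained LM and compare mean
# perplexity across social-class groups.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of one utterance under the language model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

# Hypothetical (utterance, social-class) annotations.
corpus = [
    ("I ain't never seen nothing like it", "working class"),
    ("I have never seen anything like it", "upper class"),
]

by_class: dict[str, list[float]] = {}
for utterance, label in corpus:
    by_class.setdefault(label, []).append(perplexity(utterance))

# A higher mean perplexity suggests the model fits that group's
# language variety less well.
for label, scores in by_class.items():
    print(label, sum(scores) / len(scores))
```

The same comparison pattern extends to the paper's other two tasks by swapping the metric: word error rate for automatic speech recognition, edit-based scores for grammar error correction.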
Related papers
- The AI Gap: How Socioeconomic Status Affects Language Technology Interactions [23.481043448238516]
Socioeconomic status (SES) fundamentally influences how people interact with each other and digital technologies like Large Language Models (LLMs).
We survey 1,000 individuals from diverse socioeconomic backgrounds about their use of language technologies and generative AI.
We find systematic differences across SES groups in language technology usage (i.e., frequency, performed tasks), interaction styles, and topics.
arXiv Detail & Related papers (2025-05-17T22:35:40Z)
- The Call for Socially Aware Language Technologies [94.6762219597438]
We argue that many of these issues share a common core: a lack of awareness of the factors, context, and implications of the social environment in which NLP operates.
We argue that substantial challenges remain for NLP to develop social awareness and that we are just at the beginning of a new era for the field.
arXiv Detail & Related papers (2024-05-03T18:12:39Z)
- Impoverished Language Technology: The Lack of (Social) Class in NLP [24.138711060814963]
Since Labov's (1964) foundational work on the social stratification of language, linguistics has dedicated concerted efforts towards understanding the relationships between socio-demographic factors and language production and perception.
Despite the large body of evidence identifying significant relationships between socio-demographic factors and language production, relatively few of these factors have been investigated in the context of NLP technology.
While age and gender are well covered, Labov's initial target, socio-economic class, is largely absent.
arXiv Detail & Related papers (2024-03-06T17:35:27Z)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
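A minimal sketch of the idea, assuming a simple persona-prefix template (illustrative only; the paper studies how such prompts affect predictions across models and phrasings):

```python
# A minimal sketch of sociodemographic prompting: prefix a task prompt
# with a persona description so the model answers as a person with that
# profile might. The template and profile fields are invented.
def sociodemographic_prompt(profile: dict[str, str], task_prompt: str) -> str:
    """Prefix a task prompt with a sociodemographic persona description."""
    persona = ", ".join(f"{k}: {v}" for k, v in profile.items())
    return f"Imagine you are a person with this profile: {persona}.\n{task_prompt}"

prompt = sociodemographic_prompt(
    {"age": "45", "gender": "woman", "occupation": "nurse"},
    "Is the following tweet offensive? Answer yes or no: <tweet text>",
)
print(prompt)
```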
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- When Dialects Collide: How Socioeconomic Mixing Affects Language Use [0.0]
We find that the more different socioeconomic classes mix, the less interdependent the frequency of their departures from standard grammar and their income become.
We propose an agent-based model of linguistic variety adoption that sheds light on the mechanisms that produce the observations seen in the data.
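A toy version of such an agent-based model might look like the following; the update rule, mixing parameter, and initial rates are invented for illustration and are not the paper's model:

```python
# A toy agent-based model of linguistic variety adoption: agents from two
# income classes drift toward their interlocutors' usage.
import random

random.seed(0)

class Agent:
    def __init__(self, social_class: int):
        self.social_class = social_class  # 0 = lower income, 1 = higher income
        # Initial probability of departing from standard grammar.
        self.nonstandard = 0.6 if social_class == 0 else 0.2

agents = [Agent(i % 2) for i in range(200)]
MIXING = 0.5  # probability that an interaction crosses class lines

for _ in range(5000):
    a = random.choice(agents)
    cross = random.random() < MIXING
    pool = [b for b in agents
            if (b.social_class != a.social_class) == cross and b is not a]
    b = random.choice(pool)
    # Speakers drift slightly toward their interlocutor's usage.
    a.nonstandard += 0.05 * (b.nonstandard - a.nonstandard)

for c in (0, 1):
    rates = [x.nonstandard for x in agents if x.social_class == c]
    print(f"class {c}: mean non-standard rate {sum(rates) / len(rates):.3f}")
```

In this toy model, raising MIXING pulls the two class means together, so grammar becomes less predictive of class, in line with the paper's qualitative finding.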
arXiv Detail & Related papers (2023-07-19T14:55:50Z)
- On the Limitations of Sociodemographic Adaptation with Transformers [34.768337465321395]
Sociodemographic factors (e.g., gender or age) shape our language.
Previous work showed that incorporating specific sociodemographic factors can consistently improve performance for various NLP tasks.
We use three common specialization methods proven effective for incorporating external knowledge into pretrained Transformers.
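One of the simplest strategies in this line of work is to mark each input with a sociodemographic control token before fine-tuning. The sketch below shows only the input-side plumbing with Hugging Face tokenizers; the token names are invented, and the paper's three specialization methods are more involved:

```python
# A minimal sketch of control-token specialization: add sociodemographic
# special tokens and prepend them to each input before fine-tuning.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["[GENDER=F]", "[GENDER=M]", "[AGE<35]", "[AGE>=35]"]}
)
# A model fine-tuned on such inputs would also need
# model.resize_token_embeddings(len(tokenizer)).

text = "honestly this film was brilliant"
enc = tokenizer("[GENDER=F] [AGE<35] " + text)
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
```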
arXiv Detail & Related papers (2022-08-01T17:58:02Z)
- Towards a Deep Multi-layered Dialectal Language Analysis: A Case Study of African-American English [0.20305676256390934]
Part-of-speech taggers trained on Mainstream American English (MAE) produce non-interpretable results when applied to African American English (AAE).
In this work, we incorporate a human-in-the-loop paradigm to gain a better understanding of AAE speakers' behavior and their language use.
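The gap is easy to see with an off-the-shelf tagger. A minimal sketch using NLTK's default English tagger (not the systems studied in the paper); the AAE example uses habitual "be", which mainstream-trained taggers often mishandle:

```python
# Tag an MAE sentence and an AAE-style sentence with NLTK's default
# English tagger and compare the outputs.
import nltk

# Resource names differ across NLTK versions; fetch both variants quietly.
for resource in ("punkt", "punkt_tab",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(resource, quiet=True)

mae = "He is usually working when I call."
aae = "He be working when I call."  # habitual 'be'

for sentence in (mae, aae):
    print(nltk.pos_tag(nltk.word_tokenize(sentence)))
```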
arXiv Detail & Related papers (2022-06-03T01:05:58Z)
- Mapping the Multilingual Margins: Intersectional Biases of Sentiment Analysis Systems in English, Spanish, and Arabic [3.3458760961317635]
We introduce four multilingual Equity Evaluation Corpora, supplementary test sets designed to measure social biases, and a novel statistical framework for studying unisectional and intersectional social biases in natural language processing.
We use these tools to measure gender, racial, ethnic, and intersectional social biases across five models trained on emotion regression tasks in English, Spanish, and Arabic.
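A minimal sketch of template-based bias measurement in this spirit; the model, template, and identity terms below are placeholders, and the actual corpora cover many templates, languages, and intersectional identities:

```python
# Fill one template with identity terms and compare a sentiment model's
# signed scores across groups.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # default English model

template = "{person} feels devastated."
groups = {
    "female terms": ["she", "my sister"],
    "male terms": ["he", "my brother"],
}

for group, terms in groups.items():
    signed = []
    for term in terms:
        r = sentiment(template.format(person=term))[0]
        # Positive label counts up, negative counts down.
        signed.append(r["score"] if r["label"] == "POSITIVE" else -r["score"])
    print(group, sum(signed) / len(signed))
```

A systematic score gap between the two groups on otherwise identical sentences is the kind of unisectional bias such corpora are designed to surface.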
arXiv Detail & Related papers (2022-04-07T16:33:15Z)
- Systematic Inequalities in Language Technology Performance across the World's Languages [94.65681336393425]
We introduce a framework for estimating the global utility of language technologies.
Our analyses involve the field at large, but also more in-depth studies on both user-facing technologies and more linguistic NLP tasks.
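One way to read "global utility" is as demand-weighted performance. A minimal sketch under that assumption, with invented numbers rather than the paper's estimates:

```python
# Demand-weighted utility: average per-language task quality weighted by
# each language's share of speakers. All figures are placeholders.
performance = {"english": 0.95, "spanish": 0.90, "swahili": 0.55}  # quality in [0, 1]
speakers = {"english": 1452, "spanish": 548, "swahili": 87}        # millions (illustrative)

total = sum(speakers.values())
utility = sum(performance[lang] * speakers[lang] / total for lang in performance)
print(f"demand-weighted global utility: {utility:.3f}")
```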
arXiv Detail & Related papers (2021-10-13T14:03:07Z)