Impoverished Language Technology: The Lack of (Social) Class in NLP
- URL: http://arxiv.org/abs/2403.03874v1
- Date: Wed, 6 Mar 2024 17:35:27 GMT
- Title: Impoverished Language Technology: The Lack of (Social) Class in NLP
- Authors: Amanda Cercas Curry, Zeerak Talat, Dirk Hovy
- Abstract summary: Since Labov's (1964) foundational work on the social stratification of language, linguistics has dedicated concerted efforts towards understanding the relationships between socio-demographic factors and language production and perception.
Despite the large body of evidence identifying significant relationships between socio-demographic factors and language production, relatively few of these factors have been investigated in the context of NLP technology.
While age and gender are well covered, Labov's initial target, socio-economic class, is largely absent.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Since Labov's (1964) foundational work on the social stratification of
language, linguistics has dedicated concerted efforts towards understanding the
relationships between socio-demographic factors and language production and
perception. Despite the large body of evidence identifying significant
relationships between socio-demographic factors and language production,
relatively few of these factors have been investigated in the context of NLP
technology. While age and gender are well covered, Labov's initial target,
socio-economic class, is largely absent. We survey the existing Natural
Language Processing (NLP) literature and find that only 20 papers even mention
socio-economic status. However, the majority of those papers do not engage with
class beyond collecting information of annotator-demographics. Given this
research lacuna, we provide a definition of class that can be operationalised
by NLP researchers, and argue for including socio-economic class in future
language technologies.
Related papers
- Bridging Gaps in Natural Language Processing for Yorùbá: A Systematic Review of a Decade of Progress and Prospects [0.6554326244334868]
This review highlights the scarcity of annotated corpora, limited availability of pre-trained language models, and linguistic challenges like tonal complexity and diacritic dependency as significant obstacles.
The findings reveal a growing body of multilingual and monolingual resources, even though the field is constrained by socio-cultural factors such as code-switching and the abandonment of the language in digital usage.
arXiv Detail & Related papers (2025-02-24T17:41:48Z)
- The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers.
Our findings reveal a rising involvement of machine learning in NLP since the early nineties.
In post-2020, there has been a resurgence of focus on language and people.
arXiv Detail & Related papers (2024-09-29T01:29:28Z) - The Call for Socially Aware Language Technologies [94.6762219597438]
We argue that many of these issues share a common core: a lack of awareness of the factors, context, and implications of the social environment in which NLP operates.
We argue that substantial challenges remain for NLP to develop social awareness and that we are just at the beginning of a new era for the field.
arXiv Detail & Related papers (2024-05-03T18:12:39Z) - PhonologyBench: Evaluating Phonological Skills of Large Language Models [57.80997670335227]
Phonology, the study of speech's structure and pronunciation rules, is a critical yet often overlooked component in Large Language Model (LLM) research.
We present PhonologyBench, a novel benchmark consisting of three diagnostic tasks designed to explicitly test the phonological skills of LLMs.
We observe significant gaps of 17% and 45% on Rhyme Word Generation and Syllable Counting, respectively, when compared to humans.
arXiv Detail & Related papers (2024-04-03T04:53:14Z) - Classist Tools: Social Class Correlates with Performance in NLP [27.683676116781758]
Sociodemographic characteristics are infrequently used in Natural Language Processing.
We show that NLP disadvantages less-privileged socioeconomic groups.
We argue for the inclusion of socioeconomic class in future language technologies.
arXiv Detail & Related papers (2024-03-07T12:27:08Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for furthering the field of dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z) - Sensitivity, Performance, Robustness: Deconstructing the Effect of
Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
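To make the idea concrete, here is a minimal sketch of what a persona-prefixed prompt might look like. The template wording, the `sociodemographic_prompt` helper, and the profile fields are illustrative assumptions, not taken from the paper.

```python
def sociodemographic_prompt(profile: dict, task_instruction: str, text: str) -> str:
    """Prefix a task prompt with a hypothetical sociodemographic persona.

    The model is asked to answer as a person matching the profile would,
    steering predictions on subjective tasks (e.g. offensiveness ratings).
    """
    # Flatten the profile dict into a readable "key: value" persona string.
    persona = ", ".join(f"{k}: {v}" for k, v in profile.items())
    return (
        f"Imagine you are a person with the following profile ({persona}). "
        f"{task_instruction}\nText: {text}\nAnswer:"
    )

# Illustrative usage: the resulting string would be sent to a prompt-based model.
prompt = sociodemographic_prompt(
    {"age": "45", "gender": "woman", "occupation": "teacher"},
    "Rate how offensive the following text is, from 1 (not at all) to 5 (very).",
    "That referee needs glasses!",
)
print(prompt)
```

Varying the `profile` argument while holding the task and text fixed is the basic manipulation such studies rely on to measure how sociodemographic cues shift model predictions.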
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- On the Limitations of Sociodemographic Adaptation with Transformers [34.768337465321395]
Sociodemographic factors (e.g., gender or age) shape our language.
Previous work showed that incorporating specific sociodemographic factors can consistently improve performance for various NLP tasks.
We use three common specialization methods proven effective for incorporating external knowledge into pretrained Transformers.
arXiv Detail & Related papers (2022-08-01T17:58:02Z)
- Systematic Inequalities in Language Technology Performance across the World's Languages [94.65681336393425]
We introduce a framework for estimating the global utility of language technologies.
Our analyses involve the field at large, but also more in-depth studies on both user-facing technologies and more linguistic NLP tasks.
arXiv Detail & Related papers (2021-10-13T14:03:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.