Measuring Geographic Performance Disparities of Offensive Language Classifiers
- URL: http://arxiv.org/abs/2209.07353v1
- Date: Thu, 15 Sep 2022 15:08:18 GMT
- Title: Measuring Geographic Performance Disparities of Offensive Language Classifiers
- Authors: Brandon Lwowski, Paul Rad, Anthony Rios
- Abstract summary: We ask two questions: "Does language, dialect, and topical content vary across geographical regions?" and "If there are differences across the regions, do they impact model performance?"
We find that current models do not generalize across locations. Likewise, we show that while offensive language models produce false positives on African American English, model performance is not correlated with each city's minority population proportions.
- Score: 12.545108947857802
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text classifiers are applied at scale in the form of one-size-fits-all
solutions. Nevertheless, many studies show that classifiers are biased with
respect to different languages and dialects. When measuring and discovering
these biases, two gaps present themselves and should be addressed: first,
"Does language, dialect, and topical content vary across geographical
regions?" and second, "If there are differences across the regions, do they
impact model performance?" We introduce a novel dataset called GeoOLID with
more than 14 thousand examples across 15 geographically and demographically
diverse cities to address these questions. We perform a comprehensive analysis
of geographical-related content and their impact on performance disparities of
offensive language detection models. Overall, we find that current models do
not generalize across locations. Likewise, we show that while offensive
language models produce false positives on African American English, model
performance is not correlated with each city's minority population proportions.
Warning: This paper contains offensive language.
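As a concrete illustration of the analysis the abstract describes, here is a minimal sketch: score an offensive-language classifier within each city, then test whether scores track demographics. All file and column names (geoolid_predictions.csv, city_demographics.csv, label, pred, minority_prop) are hypothetical stand-ins, not artifacts released with the paper.

```python
# Hedged sketch of a per-city disparity analysis; the data layout is assumed.
import pandas as pd
from scipy.stats import pearsonr
from sklearn.metrics import f1_score

# Assumed layout: one row per example, with gold label, model prediction, city.
df = pd.read_csv("geoolid_predictions.csv")

# Macro-F1 within each city: do models generalize across locations?
per_city = (
    df.groupby("city")
      .apply(lambda g: f1_score(g["label"], g["pred"], average="macro"))
      .rename("macro_f1")
      .reset_index()
)

# Correlate per-city scores with each city's minority population proportion.
demo = pd.read_csv("city_demographics.csv")  # assumed columns: city, minority_prop
merged = per_city.merge(demo, on="city")
r, p = pearsonr(merged["macro_f1"], merged["minority_prop"])
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```

On this paper's finding, such a correlation comes out non-significant: performance disparities exist across cities, but they do not track minority population proportions.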
Related papers
- Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum [25.732397636695882]
We measure speech-to-text performance on Italian dialects and empirically observe a geographical performance disparity.
This disparity correlates substantially (-0.5) with linguistic similarity to the highest-performing dialect variety.
We additionally leverage geostatistical methods to predict zero-shot performance at unseen sites, and find that incorporating geographical information substantially improves prediction (see the geostatistical sketch after this list).
arXiv Detail & Related papers (2024-10-18T16:39:42Z)
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations [2.825324306665133]
This study focuses on biases related to geographical knowledge.
We explore the connection between geography and language models by highlighting their tendency to misrepresent spatial information.
arXiv Detail & Related papers (2024-04-26T13:22:28Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- On the Scaling Laws of Geographical Representation in Language Models [0.11510009152620666]
We show that geographical knowledge is observable even for tiny models, and that it scales consistently as we increase the model size.
Notably, we observe that larger language models cannot mitigate the geographical bias that is inherent to the training data.
arXiv Detail & Related papers (2024-02-29T18:04:11Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- Global Voices, Local Biases: Socio-Cultural Prejudices across Languages [22.92083941222383]
Human biases are ubiquitous but not uniform; disparities exist across linguistic, cultural, and societal borders.
In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies (see the WEAT sketch after this list).
To encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more.
arXiv Detail & Related papers (2023-10-26T17:07:50Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for advancing dialectal NLP by documenting evident performance disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity [64.18762301574954]
Previous work has shown that the representations output by contextual language models are more anisotropic than static type embeddings.
This seems to be true for both monolingual and multilingual models, although much less work has been done on the multilingual context.
We investigate outlier dimensions and their relationship to anisotropy in multiple pre-trained multilingual language models.
arXiv Detail & Related papers (2023-06-01T09:01:48Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
- Geolocation differences of language use in urban areas [0.0]
We explore the use of Twitter data with precise geolocation information to resolve spatial variations in language use on an urban scale down to single city blocks.
Our work shows that analysis of small-scale variations can provide unique information on correlations between language use and social context.
arXiv Detail & Related papers (2021-08-01T19:55:45Z)
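Two of the entries above reference concrete techniques worth sketching. First, the geostatistical prediction in "Dialetto, ma Quanto Dialetto?": interpolating a performance metric at unseen sites from sites already measured. The coordinates and error rates below are invented, and a Gaussian process stands in for whatever geostatistical model the paper actually uses.

```python
# Hedged sketch: predict speech-to-text error at an unseen site from
# nearby measured sites. All numbers are made up for illustration.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

sites = np.array([[45.07, 7.69], [45.44, 12.33], [40.85, 14.27], [38.12, 13.36]])  # lat, lon
wer = np.array([0.18, 0.22, 0.35, 0.41])  # word error rate measured per site

gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0) + WhiteKernel(noise_level=1e-3))
gp.fit(sites, wer)

mean, std = gp.predict(np.array([[41.90, 12.50]]), return_std=True)  # unseen site
print(f"predicted WER: {mean[0]:.2f} ± {std[0]:.2f}")
```

Second, the Word Embedding Association Test scaled to 24 languages in "Global Voices, Local Biases". The effect size below follows the standard WEAT formulation (Caliskan et al., 2017); the random vectors are placeholders for real word embeddings.

```python
# Hedged sketch of the WEAT effect size; embeddings here are random stand-ins.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B):
    # s(w, A, B): mean similarity to attribute set A minus attribute set B.
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Cohen's-d-style effect size over the two target word sets X and Y.
    sx = [assoc(x, A, B) for x in X]
    sy = [assoc(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

rng = np.random.default_rng(0)
X, Y, A, B = (rng.normal(size=(8, 300)) for _ in range(4))
print(f"WEAT effect size: {weat_effect_size(X, Y, A, B):.3f}")
```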
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.