Machines Do See Color: A Guideline to Classify Different Forms of Racist
Discourse in Large Corpora
- URL: http://arxiv.org/abs/2401.09333v2
- Date: Sat, 20 Jan 2024 15:01:01 GMT
- Title: Machines Do See Color: A Guideline to Classify Different Forms of Racist
Discourse in Large Corpora
- Authors: Diana Davila Gordillo, Joan Timoneda, Sebastian Vallejo Vera
- Abstract summary: Current methods to identify and classify racist language in text rely on small-n qualitative approaches or large-n approaches focusing exclusively on overt forms of racist discourse.
This article provides a step-by-step generalizable guideline to identify and classify different forms of racist discourse in large corpora.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current methods to identify and classify racist language in text rely on
small-n qualitative approaches or large-n approaches focusing exclusively on
overt forms of racist discourse. This article provides a step-by-step
generalizable guideline to identify and classify different forms of racist
discourse in large corpora. In our approach, we start by conceptualizing racism
and its different manifestations. We then contextualize these racist
manifestations to the time and place of interest, which allows researchers to
identify their discursive form. Finally, we apply XLM-RoBERTa (XLM-R), a
cross-lingual model for supervised text classification with a cutting-edge
contextual understanding of text. We show that XLM-R and XLM-R-Racismo, our
pretrained model, outperform other state-of-the-art approaches in classifying
racism in large corpora. We illustrate our approach using a corpus of tweets
relating to the Ecuadorian indígena community between 2018 and 2021.
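The abstract describes fine-tuning XLM-RoBERTa for supervised classification of racist discourse in a tweet corpus. Below is a minimal sketch of that kind of pipeline using the Hugging Face transformers and datasets libraries; the label set, file names, and hyperparameters are illustrative assumptions, not the authors' released configuration or their pretrained XLM-R-Racismo model.

```python
# Sketch: fine-tuning XLM-RoBERTa for supervised classification of racist
# discourse, roughly in the spirit of the pipeline described in the abstract.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "xlm-roberta-base"  # public checkpoint; XLM-R-Racismo weights are not assumed here
LABELS = ["not_racist", "overt_racism", "covert_racism"]  # hypothetical label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

# Hypothetical CSV files with a "text" column (tweet) and an integer "label" column.
data = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

def tokenize(batch):
    # Truncate tweets to a fixed length; padding is applied dynamically per batch.
    return tokenizer(batch["text"], truncation=True, max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="xlmr-racism-classifier",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
    ),
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```

A classifier fine-tuned this way can then label each document in a large corpus according to the contextualized categories of racist discourse defined in the earlier conceptualization steps.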
Related papers
- Large Language Models Reflect the Ideology of their Creators [73.25935570218375]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
We uncover notable diversity in the ideological stance exhibited across different LLMs and languages.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency.
We introduce the novel Language Agency Bias Evaluation benchmark.
We unveil language agency social biases in content generated by three recent Large Language Models (LLMs).
arXiv Detail & Related papers (2024-04-16T12:27:54Z)
- Dialect prejudice predicts AI decisions about people's character, employability, and criminality [36.448157493217344]
We show that language models embody covert racism in the form of dialect prejudice.
Our findings have far-reaching implications for the fair and safe employment of language technology.
arXiv Detail & Related papers (2024-03-01T18:43:09Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods such as typos and word order shuffling, which resonate with human cognitive patterns and allow perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models [33.157279170602784]
We present Marked Personas, a prompt-based method to measure stereotypes in large language models (LLMs).
We find that portrayals generated by GPT-3.5 and GPT-4 contain higher rates of racial stereotypes than human-written portrayals using the same prompts.
An intersectional lens reveals tropes that dominate portrayals of marginalized groups, such as tropicalism and the hypersexualization of minoritized women.
arXiv Detail & Related papers (2023-05-29T16:29:22Z)
- Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale [61.555788332182395]
We investigate the potential for machine learning models to amplify dangerous and complex stereotypes.
We find that a broad range of ordinary prompts produces stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects.
arXiv Detail & Related papers (2022-11-07T18:31:07Z)
- Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework [9.84413545378636]
Recent research has demonstrated how racial biases against users who write African American English exist in popular toxic language datasets.
We propose additional descriptive fairness metrics to better understand the source of these biases.
We show that our proposed framework substantially reduces the racial biases that the model learns from these datasets.
arXiv Detail & Related papers (2021-09-27T15:54:05Z)
- Whose Opinions Matter? Perspective-aware Models to Identify Opinions of Hate Speech Victims in Abusive Language Detection [6.167830237917662]
We present an in-depth study to model polarized opinions coming from different communities.
We believe that by relying on this information, we can divide the annotators into groups sharing similar perspectives.
We propose a novel resource: a multi-perspective English-language dataset annotated according to different sub-categories relevant for characterising online abuse.
arXiv Detail & Related papers (2021-06-30T08:35:49Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- Examining Racial Bias in an Online Abuse Corpus with Structural Topic Modeling [0.30458514384586405]
We use structural topic modeling to examine racial bias in social media posts.
We augment the abusive language dataset by adding a feature indicating the predicted probability of a tweet being written in African-American English.
arXiv Detail & Related papers (2020-05-26T21:02:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.