Examining Racial Bias in an Online Abuse Corpus with Structural Topic Modeling
- URL: http://arxiv.org/abs/2005.13041v1
- Date: Tue, 26 May 2020 21:02:43 GMT
- Title: Examining Racial Bias in an Online Abuse Corpus with Structural Topic Modeling
- Authors: Thomas Davidson and Debasmita Bhattacharya
- Abstract summary: We use structural topic modeling to examine racial bias in social media posts.
We augment the abusive language dataset by adding an additional feature indicating the predicted probability of the tweet being written in African-American English.
- Score: 0.30458514384586405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We use structural topic modeling to examine racial bias in data collected to
train models to detect hate speech and abusive language in social media posts.
We augment the abusive language dataset by adding an additional feature
indicating the predicted probability of the tweet being written in
African-American English. We then use structural topic modeling to examine the
content of the tweets and how the prevalence of different topics is related to
both abusiveness annotation and dialect prediction. We find that certain topics
are disproportionately racialized and considered abusive. We discuss how topic
modeling may be a useful approach for identifying bias in annotated data.
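The abstract describes a two-step pipeline: attach to each tweet a covariate giving the predicted probability that it is written in African-American English, then fit a topic model and relate topic prevalence to the abusiveness annotation and the dialect prediction. The following is a rough sketch of that kind of analysis, not the authors' code: structural topic models are typically fit with the R stm package, so here ordinary LDA (gensim) plus per-topic regressions (statsmodels) stand in for STM's prevalence covariates, and the file and column names (labeled_tweets.csv, text, abusive, p_aae) are assumptions.

```python
# Approximate sketch of the analysis described in the abstract (assumed inputs).
import pandas as pd
import statsmodels.api as sm
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Hypothetical input: one row per tweet with its text, the abusiveness
# annotation, and the predicted probability of African-American English.
df = pd.read_csv("labeled_tweets.csv")
docs = [t.lower().split() for t in df["text"]]  # minimal tokenization, for illustration only

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, id2word=dictionary, num_topics=20, passes=5, random_state=0)

# Per-document topic proportions: one row per tweet, one column per topic.
theta_rows = []
for bow in corpus:
    probs = dict(lda.get_document_topics(bow, minimum_probability=0.0))
    theta_rows.append([probs.get(k, 0.0) for k in range(lda.num_topics)])
theta = pd.DataFrame(theta_rows)

# Relate each topic's prevalence to the abusiveness label and the dialect
# probability, a crude stand-in for STM's prevalence covariates.
X = sm.add_constant(df[["abusive", "p_aae"]].astype(float))
for k in range(lda.num_topics):
    fit = sm.OLS(theta[k], X).fit()
    print(f"topic {k}: abusive={fit.params['abusive']:.3f}, p_aae={fit.params['p_aae']:.3f}")
```

In STM itself the covariates would instead be supplied as a prevalence formula (e.g., ~ abusive + p_aae) so that they inform topic estimation directly, rather than being regressed on after the fact.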
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models [113.58052868898173]
We identify and characterize semantic leakage, a previously undiscussed phenomenon in which models leak irrelevant information from the prompt into the generation in unexpected ways.
We propose an evaluation setting to detect semantic leakage both by humans and automatically, curate a diverse test suite for diagnosing this behavior, and measure significant semantic leakage in 13 flagship models.
arXiv Detail & Related papers (2024-08-12T22:30:55Z)
- Towards Better Inclusivity: A Diverse Tweet Corpus of English Varieties [0.0]
We aim to address the issue of bias at its root - the data itself.
We curate a dataset of tweets from countries with high proportions of underserved English variety speakers.
Following best annotation practices, our growing corpus features 170,800 tweets taken from 7 countries.
arXiv Detail & Related papers (2024-01-21T13:18:20Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models. (A generic sketch of this kind of projection appears after this list.)
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color [73.70233477125781]
We show that reporting bias negatively impacts and inherently limits text-only training.
We then demonstrate that multimodal models can leverage their visual training to mitigate these effects.
arXiv Detail & Related papers (2021-10-15T16:28:17Z)
- Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework [9.84413545378636]
Recent research has demonstrated that racial biases against users who write in African American English exist in popular toxic language datasets.
We propose additional descriptive fairness metrics to better understand the source of these biases.
We show that our proposed framework substantially reduces the racial biases that the model learns from these datasets.
arXiv Detail & Related papers (2021-09-27T15:54:05Z)
- Whose Opinions Matter? Perspective-aware Models to Identify Opinions of Hate Speech Victims in Abusive Language Detection [6.167830237917662]
We present an in-depth study to model polarized opinions coming from different communities.
We believe that by relying on this information, we can divide the annotators into groups sharing similar perspectives.
We propose a novel resource, a multi-perspective English language dataset annotated according to different sub-categories relevant for characterising online abuse.
arXiv Detail & Related papers (2021-06-30T08:35:49Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Hate Speech Detection and Racial Bias Mitigation in Social Media based on BERT model [1.9336815376402716]
We introduce a transfer learning approach for hate speech detection based on an existing pre-trained language model called BERT.
We evaluate the proposed model on two publicly available datasets annotated for racism, sexism, hate or offensive content on Twitter.
arXiv Detail & Related papers (2020-08-14T16:47:25Z)
- Trawling for Trolling: A Dataset [56.1778095945542]
We present a dataset that models trolling as a subcategory of offensive content.
The dataset has 12,490 samples, split across 5 classes: Normal, Profanity, Trolling, Derogatory, and Hate Speech.
arXiv Detail & Related papers (2020-08-02T17:23:55Z)
- "To Target or Not to Target": Identification and Analysis of Abusive Text Using Ensemble of Classifiers [18.053219155702465]
We present an ensemble learning method to identify and analyze abusive and hateful content on social media platforms.
Our stacked ensemble comprises three machine learning models that capture different aspects of language and provide diverse and coherent insights about inappropriate language. (A toy example of such a stack appears after this list.)
arXiv Detail & Related papers (2020-06-05T06:59:22Z)
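The "Debiasing Vision-Language Models via Biased Prompts" entry above describes projecting out biased directions in the text embedding. The sketch below only illustrates the general idea of removing a single bias direction by orthogonal projection, using synthetic embeddings and a made-up bias direction; it is not that paper's calibrated projection matrix.

```python
# Generic illustration of removing one bias direction from text embeddings
# by orthogonal projection; data and the bias direction are synthetic.
import numpy as np

rng = np.random.default_rng(0)
d = 512
embeddings = rng.normal(size=(100, d))  # stand-in for text embeddings

# Estimate a bias direction, e.g. the difference between embeddings of two
# prompts that differ only in the protected attribute (synthetic here).
e_a = rng.normal(size=d)
e_b = rng.normal(size=d)
v = e_a - e_b
v /= np.linalg.norm(v)

# Projection matrix that removes the component along the bias direction.
P = np.eye(d) - np.outer(v, v)
debiased = embeddings @ P.T

# The debiased embeddings carry (approximately) no component along v.
print(np.abs(debiased @ v).max())
```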
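The "To Target or Not to Target" entry above describes a stacked ensemble of three machine learning models for abusive and hateful content. The toy example below shows what such a stack can look like with scikit-learn; the base learners, features, and data are hypothetical and not taken from that paper.

```python
# Toy stacked ensemble for abusive-language classification (illustrative only).
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["you are awful", "have a nice day", "I hate you", "great work"]  # toy data
labels = [1, 0, 1, 0]                                                     # 1 = abusive

# Three base learners over TF-IDF features, combined by a logistic meta-learner.
base = [
    ("nb", MultinomialNB()),
    ("svm", LinearSVC()),
    ("lr", LogisticRegression(max_iter=1000)),
]
clf = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(estimators=base, final_estimator=LogisticRegression(), cv=2),
)
clf.fit(texts, labels)
print(clf.predict(["you are great", "you are terrible"]))
```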
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.