Annotators with Attitudes: How Annotator Beliefs And Identities Bias
Toxic Language Detection
- URL: http://arxiv.org/abs/2111.07997v1
- Date: Mon, 15 Nov 2021 18:58:20 GMT
- Title: Annotators with Attitudes: How Annotator Beliefs And Identities Bias
Toxic Language Detection
- Authors: Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi,
Noah A. Smith
- Abstract summary: We investigate the effect of annotator identities (who) and beliefs (why) on toxic language annotations.
We consider posts with three characteristics: anti-Black language, African American English dialect, and vulgarity.
Our results show strong associations between annotator identity and beliefs and their ratings of toxicity.
- Score: 75.54119209776894
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The perceived toxicity of language can vary based on someone's identity and
beliefs, but this variation is often ignored when collecting toxic language
datasets, resulting in dataset and model biases. We seek to understand the who,
why, and what behind biases in toxicity annotations. In two online studies with
demographically and politically diverse participants, we investigate the effect
of annotator identities (who) and beliefs (why), drawing from social psychology
research about hate speech, free speech, racist beliefs, political leaning, and
more. We disentangle what is annotated as toxic by considering posts with three
characteristics: anti-Black language, African American English (AAE) dialect,
and vulgarity. Our results show strong associations between annotator identity
and beliefs and their ratings of toxicity. Notably, more conservative
annotators and those who scored highly on our scale for racist beliefs were
less likely to rate anti-Black language as toxic, but more likely to rate AAE
as toxic. We additionally present a case study illustrating how a popular
toxicity detection system's ratings inherently reflect only specific beliefs
and perspectives. Our findings call for contextualizing toxicity labels in
social variables, which raises immense implications for toxic language
annotation and detection.
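To make the case study concrete: a widely used system of the kind analyzed here is Google Jigsaw's Perspective API (also the subject of several related papers below). The sketch below is a minimal, hedged illustration of how one might score posts with it; it assumes a valid API key and the publicly documented v1alpha1 REST endpoint, and the example posts are invented placeholders rather than stimuli from the studies.
```python
# Minimal sketch: score a few posts with the Perspective API, a popular toxicity
# detection system. Assumes a valid key in the PERSPECTIVE_API_KEY environment
# variable and the publicly documented v1alpha1 endpoint; error handling is minimal.
import os
import requests

API_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
API_KEY = os.environ["PERSPECTIVE_API_KEY"]  # placeholder; obtain via Google Cloud

def toxicity_score(text: str) -> float:
    """Return the TOXICITY summary score (0-1) for a single post."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(API_URL, params={"key": API_KEY}, json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

if __name__ == "__main__":
    # Hypothetical posts; a real replication would use the paper's annotated stimuli.
    for post in ["yall stay safe out there", "I hate everything about you"]:
        print(f"{toxicity_score(post):.3f}\t{post}")
```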
Related papers
- A Comprehensive View of the Biases of Toxicity and Sentiment Analysis
Methods Towards Utterances with African American English Expressions [5.472714002128254]
We study bias on two Web-based (YouTube and Twitter) datasets and two spoken English datasets.
We isolate the impact of AAE expression usage via linguistic control features from the Linguistic Inquiry and Word Count software.
We present consistent results on how a heavy usage of AAE expressions may cause the speaker to be considered substantially more toxic, even when speaking about nearly the same subject.
arXiv Detail & Related papers (2024-01-23T12:41:03Z)
- Twits, Toxic Tweets, and Tribal Tendencies: Trends in Politically Polarized Posts on Twitter [5.161088104035108]
We explore the role that partisanship and affective polarization play in contributing to toxicity on an individual level and a topic level on Twitter/X.
After collecting 89.6 million tweets from 43,151 Twitter/X users, we determine how several account-level characteristics, including partisanship, predict how often users post toxic content.
arXiv Detail & Related papers (2023-07-19T17:24:47Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple
Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
- Classification of social media Toxic comments using Machine learning
models [0.0]
Toxic comments on social media platforms, where individuals use disrespectful, abusive, and unreasonable language, are a widespread problem.
This anti-social behavior occurs during online debates, comment threads, and fights.
Comments containing explicit language can be classified into categories such as toxic, severe toxic, obscene, threat, insult, and identity hate (a minimal sketch of this multi-label setup appears after the related-papers list).
To protect users from offensive language, companies have started flagging comments and blocking users.
arXiv Detail & Related papers (2023-04-14T05:40:11Z)
- Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable
Topics for the Russian Language [76.58220021791955]
We present two text collections labelled according to a binary notion of inappropriateness and a multinomial notion of sensitive topics.
To objectivise the notion of inappropriateness, we define it in a data-driven way through crowdsourcing.
arXiv Detail & Related papers (2022-03-04T15:59:06Z)
- Mitigating Biases in Toxic Language Detection through Invariant
Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate on both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
- Designing Toxic Content Classification for a Diversity of Perspectives [15.466547856660803]
We survey 17,280 participants to understand how user expectations for what constitutes toxic content differ across demographics, beliefs, and personal experiences.
We find that groups historically at-risk of harassment are more likely to flag a random comment drawn from Reddit, Twitter, or 4chan as toxic.
We show how current one-size-fits-all toxicity classification algorithms, like the Perspective API from Jigsaw, can improve in accuracy by 86% on average through personalized model tuning.
arXiv Detail & Related papers (2021-06-04T16:45:15Z)
- Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
- Reading Between the Demographic Lines: Resolving Sources of Bias in
Toxicity Classifiers [0.0]
Perspective API is perhaps the most widely used toxicity classifier in industry.
Google's model tends to unfairly assign higher toxicity scores to comments containing words referring to the identities of commonly targeted groups.
We have constructed several toxicity classifiers with the intention of reducing unintended bias while maintaining strong classification performance.
arXiv Detail & Related papers (2020-06-29T21:40:55Z)
- Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media
during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities.
We study the evolution and spread of anti-Asian hate speech through the lens of Twitter.
We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
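As referenced in the toxic-comment classification entry above, the following is a minimal, hypothetical sketch of that multi-label setup (toxic, severe toxic, obscene, threat, insult, identity hate), built with scikit-learn rather than any specific system from the papers listed; the comments, labels, and model choice are invented placeholders for illustration only.
```python
# Minimal sketch of a multi-label toxic-comment classifier: one binary classifier
# per category over TF-IDF features. The toy comments and labels below are
# invented placeholders, not real data from any of the listed papers.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

comments = [
    "have a great day everyone",
    "you are a complete idiot",
    "i will find you and hurt you",
    "thanks for the helpful answer",
]
# One row per comment, one column per label (multi-label, not multi-class);
# each column has both classes present so every per-label classifier can fit.
y = np.array([
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
])

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(comments, y)

for text in ["what a lovely post", "shut up you idiot"]:
    pred = model.predict([text])[0]
    flagged = [label for label, on in zip(LABELS, pred) if on]
    print(text, "->", flagged or ["non-toxic"])
```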