Designing Toxic Content Classification for a Diversity of Perspectives
- URL: http://arxiv.org/abs/2106.04511v1
- Date: Fri, 4 Jun 2021 16:45:15 GMT
- Title: Designing Toxic Content Classification for a Diversity of Perspectives
- Authors: Deepak Kumar, Patrick Gage Kelley, Sunny Consolvo, Joshua Mason, Elie
Bursztein, Zakir Durumeric, Kurt Thomas, Michael Bailey
- Abstract summary: We survey 17,280 participants to understand how user expectations for what constitutes toxic content differ across demographics, beliefs, and personal experiences.
We find that groups historically at risk of harassment are more likely to flag a random comment drawn from Reddit, Twitter, or 4chan as toxic.
We show how current one-size-fits-all toxicity classification algorithms, like the Perspective API from Jigsaw, can improve in accuracy by 86% on average through personalized model tuning.
- Score: 15.466547856660803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we demonstrate how existing classifiers for identifying toxic
comments online fail to generalize to the diverse concerns of Internet users.
We survey 17,280 participants to understand how user expectations for what
constitutes toxic content differ across demographics, beliefs, and personal
experiences. We find that groups historically at risk of harassment - such as
people who identify as LGBTQ+ or young adults - are more likely to flag a
random comment drawn from Reddit, Twitter, or 4chan as toxic, as are people who
have personally experienced harassment in the past. Based on our findings, we
show how current one-size-fits-all toxicity classification algorithms, like the
Perspective API from Jigsaw, can improve in accuracy by 86% on average through
personalized model tuning. Ultimately, we highlight current pitfalls and new
design directions that can improve the equity and efficacy of toxic content
classifiers for all users.
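To make "personalized model tuning" concrete, below is a minimal sketch of one simple instance of the idea: tuning a per-user decision threshold on top of a generic toxicity score in [0, 1], such as the score returned by a model like the Perspective API. This is an illustrative assumption rather than the paper's actual tuning procedure; the function name and data are hypothetical.

```python
# Illustrative sketch, not the paper's method: pick, for each user, the
# decision threshold over a generic toxicity score in [0, 1] that best
# matches that user's own toxic/non-toxic judgments. Data is hypothetical.

from typing import List, Tuple


def tune_user_threshold(
    scored_comments: List[Tuple[float, bool]],  # (model_score, user_says_toxic)
    grid_steps: int = 100,
) -> float:
    """Return the threshold that best reproduces this user's own labels."""
    best_threshold, best_accuracy = 0.5, -1.0
    for i in range(grid_steps + 1):
        threshold = i / grid_steps
        correct = sum(
            (score >= threshold) == is_toxic
            for score, is_toxic in scored_comments
        )
        accuracy = correct / len(scored_comments)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold


# A user who tolerates mild rudeness: only high-scoring comments count as
# toxic to them, so the tuned threshold lands above a global default of 0.5.
user_labels = [(0.95, True), (0.88, True), (0.62, False), (0.41, False)]
print(tune_user_threshold(user_labels))  # prints 0.63
```

A grid search over one scalar is of course the crudest form of personalization; the paper's reported 86% average accuracy improvement refers to its own tuning of models, not to this toy procedure.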
Related papers
- Tracking Patterns in Toxicity and Antisocial Behavior Over User Lifetimes on Large Social Media Platforms [0.2630859234884723]
We analyze toxicity over a 14-year time span on nearly 500 million comments from Reddit and Wikipedia.
We find that the most toxic behavior on Reddit is exhibited in aggregate by the most active users, while the most toxic behavior on Wikipedia is exhibited in aggregate by the least active users.
arXiv Detail & Related papers (2024-07-12T15:45:02Z)
- Comprehensive Assessment of Toxicity in ChatGPT [49.71090497696024]
We evaluate the toxicity in ChatGPT by utilizing instruction-tuning datasets.
Prompts in creative writing tasks can be 2x more likely to elicit toxic responses.
Certain deliberately toxic prompts, designed in earlier studies, no longer yield harmful responses.
arXiv Detail & Related papers (2023-11-03T14:37:53Z)
- Twits, Toxic Tweets, and Tribal Tendencies: Trends in Politically Polarized Posts on Twitter [5.161088104035108]
We explore the role that partisanship and affective polarization play in contributing to toxicity on an individual level and a topic level on Twitter/X.
After collecting 89.6 million tweets from 43,151 Twitter/X users, we determine how several account-level characteristics, including partisanship, predict how often users post toxic content.
arXiv Detail & Related papers (2023-07-19T17:24:47Z)
- Analyzing Norm Violations in Live-Stream Chat [49.120561596550395]
We present the first NLP study dedicated to detecting norm violations in conversations on live-streaming platforms.
We define norm violation categories in live-stream chats and annotate 4,583 moderated comments from Twitch.
Our results show that appropriate contextual information can boost moderation performance by 35%.
arXiv Detail & Related papers (2023-05-18T05:58:27Z)
- Classification of social media Toxic comments using Machine learning models [0.0]
This paper addresses the problem of toxic comments on social media platforms, where individuals use disrespectful, abusive, and unreasonable language.
Such anti-social behavior arises in online debates, comments, and fights.
Comments containing explicit language can be classified into categories such as toxic, severe toxic, obscene, threat, insult, and identity hate (a minimal multi-label sketch of this setup appears after this list).
To protect users from offensive language, companies have started flagging comments and blocking users.
arXiv Detail & Related papers (2023-04-14T05:40:11Z)
- Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable Topics for the Russian Language [76.58220021791955]
We present two text collections labelled according to a binary notion of inappropriateness and a multinomial notion of sensitive topic.
To objectivise the notion of inappropriateness, we define it in a data-driven way through crowdsourcing.
arXiv Detail & Related papers (2022-03-04T15:59:06Z)
- Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection [75.54119209776894]
We investigate the effect of annotator identities (who) and beliefs (why) on toxic language annotations.
We consider posts with three characteristics: anti-Black language, African American English dialect, and vulgarity.
Our results show strong associations between annotator identity and beliefs and their ratings of toxicity.
arXiv Detail & Related papers (2021-11-15T18:58:20Z)
- News consumption and social media regulations policy [70.31753171707005]
We analyze two social media that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab results in a tendency for users to engage with both types of content, with a slight preference for questionable content, which may account for dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z)
- Reading Between the Demographic Lines: Resolving Sources of Bias in Toxicity Classifiers [0.0]
Perspective API is perhaps the most widely used toxicity classifier in industry.
Google's model tends to unfairly assign higher toxicity scores to comments containing words referring to the identities of commonly targeted groups.
We have constructed several toxicity classifiers with the intention of reducing unintended bias while maintaining strong classification performance.
arXiv Detail & Related papers (2020-06-29T21:40:55Z)
- Information Consumption and Social Response in a Segregated Environment: the Case of Gab [74.5095691235917]
This work provides a characterization of the interaction patterns within Gab around the COVID-19 topic.
We find that there are no strong statistical differences in the social response to questionable and reliable content.
Our results provide insights toward the understanding of coordinated inauthentic behavior and the early warning of information operations.
arXiv Detail & Related papers (2020-06-03T11:34:25Z)
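As referenced in the "Classification of social media Toxic comments using Machine learning models" entry above, the six categories it lists suggest a multi-label classification setup. Below is a minimal sketch under that assumption, using a common TF-IDF plus per-category logistic regression baseline on a made-up toy corpus; it is not the method of that paper or of any other entry in this list.

```python
# Minimal multi-label sketch (a common baseline, not the method of any
# paper listed here): TF-IDF features with one logistic regression per
# category. The toy corpus and its labels below are entirely made up.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

comments = [
    "have a great day",
    "thanks for sharing this",
    "you are a complete idiot",
    "i will find you and hurt you",
    "this is f***ing garbage",
    "you people do not belong here",
    "you are the absolute worst, i despise you",
]
# One binary indicator per category, in LABELS order.
y = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 0],
]

model = make_pipeline(
    TfidfVectorizer(),
    MultiOutputClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(comments, y)

# Predict a binary indicator per category for a new comment.
prediction = model.predict(["you are an idiot"])[0].tolist()
print(dict(zip(LABELS, prediction)))
```

In practice one would train on a large labeled corpus and evaluate per-category precision and recall; seven toy examples are only enough to show the data layout.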
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.