BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of
Implied Social Biases
- URL: http://arxiv.org/abs/2305.13589v1
- Date: Tue, 23 May 2023 01:45:18 GMT
- Title: BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of
Implied Social Biases
- Authors: Yiming Zhang, Sravani Nanduri, Liwei Jiang, Tongshuang Wu, Maarten Sap
- Abstract summary: BiasX is a framework that enhances content moderation setups with free-text explanations of statements' implied social biases.
We show that participants substantially benefit from explanations for correctly identifying subtly (non-)toxic content.
Our results showcase the promise of using free-text explanations to encourage more thoughtful toxicity moderation.
- Score: 28.519851740902258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Toxicity annotators and content moderators often default to mental shortcuts
when making decisions. This can lead to subtle toxicity being missed, and
seemingly toxic but harmless content being over-detected. We introduce BiasX, a
framework that enhances content moderation setups with free-text explanations
of statements' implied social biases, and explore its effectiveness through a
large-scale crowdsourced user study. We show that indeed, participants
substantially benefit from explanations for correctly identifying subtly
(non-)toxic content. The quality of explanations is critical: imperfect
machine-generated explanations (+2.4% on hard toxic examples) help less
compared to expert-written human explanations (+7.2%). Our results showcase the
promise of using free-text explanations to encourage more thoughtful toxicity
moderation.
Related papers
- Tracking Patterns in Toxicity and Antisocial Behavior Over User Lifetimes on Large Social Media Platforms [0.2630859234884723]
We analyze toxicity over a 14-year time span on nearly 500 million comments from Reddit and Wikipedia.
We find that the most toxic behavior on Reddit is exhibited in aggregate by the most active users, while the most toxic behavior on Wikipedia is exhibited in aggregate by the least active users.
arXiv Detail & Related papers (2024-07-12T15:45:02Z)
- Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster [72.84926097773578]
We investigate the effect of explanations on the speed of real-world moderators.
Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision making time by 7.4%.
arXiv Detail & Related papers (2024-06-06T14:23:10Z)
- Analyzing Toxicity in Deep Conversations: A Reddit Case Study [0.0]
This work employs a tree-based approach to understand how users behave concerning toxicity in public conversation settings.
We collect both the posts and the comment sections of the top 100 posts from 8 Reddit communities that allow profanity, totaling over 1 million responses.
We find that toxic comments increase the likelihood of subsequent toxic comments being produced in online conversations.
arXiv Detail & Related papers (2024-04-11T16:10:44Z)
- Comprehensive Assessment of Toxicity in ChatGPT [49.71090497696024]
We evaluate the toxicity in ChatGPT by utilizing instruction-tuning datasets.
Prompts in creative writing tasks can be 2x more likely to elicit toxic responses.
Certain deliberately toxic prompts, designed in earlier studies, no longer yield harmful responses.
arXiv Detail & Related papers (2023-11-03T14:37:53Z)
- Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection [75.54119209776894]
We investigate the effect of annotator identities (who) and beliefs (why) on toxic language annotations.
We consider posts with three characteristics: anti-Black language, African American English dialect, and vulgarity.
Our results show strong associations between annotator identity and beliefs and their ratings of toxicity.
arXiv Detail & Related papers (2021-11-15T18:58:20Z)
- Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate on both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
- News consumption and social media regulations policy [70.31753171707005]
We analyze two social media platforms that enforce opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab leads users to engage with both types of content, with a slight preference for questionable content, which may reflect dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z)
- Designing Toxic Content Classification for a Diversity of Perspectives [15.466547856660803]
We survey 17,280 participants to understand how user expectations for what constitutes toxic content differ across demographics, beliefs, and personal experiences.
We find that groups historically at-risk of harassment are more likely to flag a random comment drawn from Reddit, Twitter, or 4chan as toxic.
We show how current one-size-fits-all toxicity classification algorithms, like the Perspective API from Jigsaw, can improve in accuracy by 86% on average through personalized model tuning.
arXiv Detail & Related papers (2021-06-04T16:45:15Z)
- Toxicity Detection: Does Context Really Matter? [22.083682201142242]
We find that context can amplify or mitigate the perceived toxicity of posts.
Surprisingly, we also find no evidence that context actually improves the performance of toxicity classifiers.
This points to the need for larger datasets of comments annotated in context.
arXiv Detail & Related papers (2020-06-01T15:03:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.