DeMod: A Holistic Tool with Explainable Detection and Personalized Modification for Toxicity Censorship
- URL: http://arxiv.org/abs/2411.01844v1
- Date: Mon, 04 Nov 2024 06:38:43 GMT
- Title: DeMod: A Holistic Tool with Explainable Detection and Personalized Modification for Toxicity Censorship
- Authors: Yaqiong Li, Peng Zhang, Hansu Gu, Tun Lu, Siyuan Qiao, Yubo Shu, Yiyang Shao, Ning Gu
- Abstract summary: This study investigates people's diverse needs in toxicity censorship and builds a ChatGPT-based censorship tool named DeMod accordingly.
DeMod is equipped with the features of explainable Detection and personalized Modification, providing fine-grained detection results, detailed explanations, and personalized modification suggestions.
The results suggest several strengths of DeMod, such as its rich functionality, accurate censorship, and ease of use.
- Score: 16.55929590079875
- Abstract: Although there are automated approaches and tools supporting toxicity censorship for social posts, most of them focus on detection. Toxicity censorship is a complex process in which detection is just an initial task, and a user can have further needs such as understanding the rationale and modifying the content. To address this problem, we conduct a needfinding study to investigate people's diverse needs in toxicity censorship and then build a ChatGPT-based censorship tool named DeMod accordingly. DeMod is equipped with the features of explainable Detection and personalized Modification, providing fine-grained detection results, detailed explanations, and personalized modification suggestions. We implemented the tool and recruited 35 Weibo users for an evaluation. The results suggest several strengths of DeMod, such as its rich functionality, accurate censorship, and ease of use. Based on these findings, we further propose several insights into the design of content censorship systems.
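As a rough illustration of how such a detect-explain-modify pipeline could be wired together (the paper does not publish its implementation; the OpenAI Python SDK, the model name, the prompt wording, and the JSON schema below are all illustrative assumptions), here is a minimal sketch:

```python
# Minimal sketch of a detect-explain-modify loop in the style of DeMod.
# Assumptions (not from the paper): the OpenAI Python SDK, the gpt-4o-mini
# model, and the prompt/JSON schema are illustrative choices.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """You are a toxicity censorship assistant for social media posts.
Given a draft post, return JSON with:
  "toxic": true or false,
  "spans": a list of the specific toxic phrases (fine-grained detection),
  "explanation": a short rationale for each flagged span,
  "rewrite": a polite rewrite that preserves the author's intent.
Post: {post}"""

def censor(post: str) -> dict:
    """Run one detect-explain-modify pass over a draft post."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT.format(post=post)}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

result = censor("Example draft post goes here.")
print(result["toxic"], result.get("rewrite"))
```

The `rewrite` field stands in for the personalized Modification step; a faithful reproduction would presumably also condition the rewrite on the user's own posting history, which this sketch omits.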
Related papers
- Understanding Routing-Induced Censorship Changes Globally [5.79183660559872]
We investigate the extent to which Equal-cost Multi-path (ECMP) routing is the cause of inconsistencies in censorship results.
We find ECMP routing significantly changes observed censorship across protocols, censor mechanisms, and in 17 countries.
Our work points to methods for improving future studies, reducing inconsistencies and increasing repeatability.
arXiv Detail & Related papers (2024-06-27T16:21:31Z)
- Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster [72.84926097773578]
We investigate the effect of explanations on the speed of real-world moderators.
Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision-making time by 7.4%.
arXiv Detail & Related papers (2024-06-06T14:23:10Z)
- User Attitudes to Content Moderation in Web Search [49.1574468325115]
We examine the levels of support for different moderation practices applied to potentially misleading and/or potentially offensive content in web search.
We find that the most supported practice is informing users about potentially misleading or offensive content, and the least supported one is the complete removal of search results.
More conservative users and users with lower levels of trust in web search results are more likely to be against content moderation in web search.
arXiv Detail & Related papers (2023-10-05T10:57:15Z)
- Depression detection in social media posts using affective and social norm features [84.12658971655253]
We propose a deep architecture for depression detection from social media posts.
We incorporate profanity and morality features of posts and words in our architecture using a late fusion scheme.
The inclusion of the proposed features yields state-of-the-art results in both settings.
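As a generic sketch of what a late fusion scheme looks like (the layer sizes, dimensions, and two-branch layout below are illustrative assumptions, not the paper's architecture): separate branches encode the text and the auxiliary profanity/morality features, and their representations are fused only at the classification head.

```python
# Generic late-fusion sketch (illustrative only; not the paper's architecture).
# Text embeddings and auxiliary profanity/morality features pass through
# separate branches whose outputs are concatenated just before the classifier.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, text_dim=768, aux_dim=16, hidden=64):
        super().__init__()
        self.text_branch = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.aux_branch = nn.Sequential(nn.Linear(aux_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 2)  # fuse late: concat, then classify

    def forward(self, text_emb, aux_feats):
        fused = torch.cat(
            [self.text_branch(text_emb), self.aux_branch(aux_feats)], dim=-1
        )
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 16))  # batch of 4 posts
```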
arXiv Detail & Related papers (2023-03-24T21:26:27Z)
- Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases insignificant changes in input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Multilingual Content Moderation: A Case Study on Reddit [23.949429463013796]
We propose to study the challenges of content moderation by introducing a multilingual dataset of 1.8 million Reddit comments.
We perform extensive experimental analysis to highlight the underlying challenges and suggest related research problems.
Our dataset and analysis can help better prepare for the challenges and opportunities of auto moderation.
arXiv Detail & Related papers (2023-02-19T16:36:33Z)
- Augmenting Rule-based DNS Censorship Detection at Scale with Machine Learning [38.00013408742201]
Censorship of the domain name system (DNS) is a key mechanism used across different countries.
In this paper, we explore how machine learning (ML) models can help streamline the detection process.
We find that unsupervised models, trained solely on uncensored instances, can identify new instances and variations of censorship missed by existing probes.
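A minimal sketch of this kind of one-class setup (the feature set, the model choice, and all numbers below are illustrative assumptions, not the paper's method): an anomaly detector is fit only on known-uncensored responses, so anything it flags as an outlier becomes a candidate censorship event.

```python
# Illustrative one-class setup: train only on known-uncensored DNS responses,
# then flag anomalous responses as candidate censorship. Feature names and
# model choice are assumptions, not taken from the paper.
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy features per DNS response: [response_time_ms, answer_count, ttl].
uncensored = np.random.default_rng(0).normal([50, 2, 300], [10, 1, 60], (500, 3))
model = IsolationForest(random_state=0).fit(uncensored)

new_responses = np.array([[48, 2, 310],   # looks like normal resolution
                          [5, 1, 1]])     # fast, short-TTL injected answer
print(model.predict(new_responses))       # 1 = inlier, -1 = candidate censorship
```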
arXiv Detail & Related papers (2023-02-03T23:36:30Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Keyword twisting and camouflage are among the most widely used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
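A toy sketch of what simulating word camouflage can look like (the substitution table and function below are illustrative assumptions, not the article's tools): characters are probabilistically replaced with visually similar symbols, producing evasion variants that a detector can then be trained or tested against.

```python
# Toy simulation of word camouflage via character substitution (leetspeak).
# Illustrative only: the article's tools and substitution tables are not shown here.
import random

LEET = {"a": ["4", "@"], "e": ["3"], "i": ["1", "!"], "o": ["0"], "s": ["5", "$"]}

def camouflage(word: str, p: float = 0.7, seed: int = 0) -> str:
    """Replace each substitutable character with probability p."""
    rng = random.Random(seed)
    return "".join(
        rng.choice(LEET[c.lower()]) if c.lower() in LEET and rng.random() < p else c
        for c in word
    )

print(camouflage("censorship"))  # e.g. "c3n50r5h1p"
```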
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- How We Express Ourselves Freely: Censorship, Self-censorship, and Anti-censorship on a Chinese Social Media [4.408128846525362]
We identify the metrics of censorship and self-censorship, find the influence factors, and construct a mediation model to measure their relationship.
Based on these findings, we discuss implications for democratic social media design and future censorship research.
arXiv Detail & Related papers (2022-11-24T18:28:16Z)
- On the Social and Technical Challenges of Web Search Autosuggestion Moderation [118.47867428272878]
Autosuggestions are typically generated by machine learning (ML) systems trained on a corpus of search logs and document representations.
While current search engines have become increasingly proficient at suppressing such problematic suggestions, persistent issues remain.
We discuss several dimensions of problematic suggestions, difficult issues along the pipeline, and why our discussion applies to the increasing number of applications beyond web search.
arXiv Detail & Related papers (2020-07-09T19:22:00Z)