An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software
- URL: http://arxiv.org/abs/2308.09810v1
- Date: Fri, 18 Aug 2023 20:33:06 GMT
- Title: An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software
- Authors: Wenxuan Wang, Jingyuan Huang, Jen-tse Huang, Chang Chen, Jiazhen Gu, Pinjia He, Michael R. Lyu
- Abstract summary: Social media platforms are being increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography.
Despite tremendous efforts in developing and deploying content moderation methods, malicious users can evade moderation by embedding texts into images.
We propose a metamorphic testing framework for content moderation software.
- Score: 64.367830425115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The exponential growth of social media platforms has brought about a
revolution in communication and content dissemination in human society.
Nevertheless, these platforms are being increasingly misused to spread toxic
content, including hate speech, malicious advertising, and pornography, leading
to severe negative consequences such as harm to teenagers' mental health.
Despite tremendous efforts in developing and deploying textual and image
content moderation methods, malicious users can evade moderation by embedding
texts into images, such as screenshots of the text, usually with some
interference. We find that modern content moderation software's performance
against such malicious inputs remains underexplored. In this work, we propose
OASIS, a metamorphic testing framework for content moderation software. OASIS
employs 21 transform rules summarized from our pilot study on 5,000 real-world
toxic content items collected from 4 popular social media applications, including
Twitter, Instagram, Sina Weibo, and Baidu Tieba. Given toxic textual content,
OASIS can generate image test cases that preserve the toxicity yet are likely
to bypass moderation. In the evaluation, we employ OASIS to test five
commercial textual content moderation products from well-known companies (i.e.,
Google Cloud, Microsoft Azure, Baidu Cloud, Alibaba Cloud and Tencent Cloud),
as well as a state-of-the-art moderation research model. The results show that
OASIS achieves up to 100% error finding rates. Moreover, retraining the
moderation model with the test cases generated by OASIS improves its robustness
without performance degradation.
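The metamorphic relation behind this kind of testing can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the OASIS implementation: the names `error_finding_rate`, `keyword_moderator`, and `spaced_out` are hypothetical, the toy transforms operate on strings rather than rendering text into images as OASIS's 21 rules do, and the error finding rate is taken here to be the share of toxicity-preserving test cases that the software under test fails to flag.

```python
from typing import Callable, Iterable, List

# Hypothetical blocklist used by the toy moderator below.
BLOCKLIST = ("idiot",)


def keyword_moderator(content: str) -> bool:
    """Toy stand-in for a moderation API: flags content containing a blocklisted word."""
    return any(word in content.lower() for word in BLOCKLIST)


def spaced_out(text: str) -> str:
    """Toy stand-in for an 'interference' rule: inserts spaces between characters."""
    return " ".join(text)


def error_finding_rate(
    seeds: Iterable[str],
    transforms: List[Callable[[str], str]],
    is_flagged: Callable[[str], bool],
) -> float:
    """Fraction of generated test cases that the moderator fails to flag.

    Assumed metamorphic relation: every seed is toxic and every transform
    preserves toxicity, so every follow-up test case should still be flagged.
    A test case that is not flagged counts as an error.
    """
    total = 0
    errors = 0
    for seed in seeds:
        for transform in transforms:
            test_case = transform(seed)
            total += 1
            if not is_flagged(test_case):
                errors += 1
    # Avoid division by zero when no test cases were generated.
    return errors / total if total else 0.0


if __name__ == "__main__":
    seeds = ["you idiot", "what an idiot"]
    rate = error_finding_rate(seeds, [str.upper, spaced_out], keyword_moderator)
    print(f"EFR = {rate:.0%}")  # prints "EFR = 50%"
```

In this toy run, upper-casing never evades the keyword filter, while spacing out the characters always does, so half of the four test cases are errors. Passing the moderator and transforms in as callables keeps the harness independent of any particular moderation service.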
Related papers
- Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos [0.1399948157377307]
Governments, educators, and parents are often at odds with media platforms about how to regulate, control, and limit the spread of such content.
Techniques from natural language processing and computer vision have been used widely to automatically identify and filter out sensitive content.
More sophisticated algorithms for understanding the context of both text and image may open room for improvement in content censorship.
(2024-11-26T05:29:18Z)
- A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models [52.49582606341111]
Copyright law confers on creators the exclusive rights to reproduce, distribute, and monetize their creative works.
Recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement.
We introduce a novel pipeline that harmonizes CLIP, ChatGPT, and diffusion models to curate a dataset.
(2024-01-04T11:14:01Z)
- Content Moderation on Social Media in the EU: Insights From the DSA Transparency Database [0.0]
The Digital Services Act (DSA) requires large social media platforms in the EU to provide clear and specific information whenever they restrict access to certain content.
Statements of Reasons (SoRs) are collected in the DSA Transparency Database to ensure transparency and scrutiny of content moderation decisions.
We empirically analyze 156 million SoRs within an observation period of two months to provide an early look at content moderation decisions of social media platforms in the EU.
(2023-12-07T16:56:19Z)
- Understanding writing style in social media with a supervised contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus of 4.5 × 10^6 authored texts derived from public sources.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1,616 authors with at least 80% accuracy.
(2023-10-17T09:01:17Z)
- DeepfakeArt Challenge: A Benchmark Dataset for Generative AI Art Forgery and Data Poisoning Detection [57.51313366337142]
There has been growing concern over the use of generative AI for malicious purposes.
In the realm of visual content synthesis using generative AI, key areas of significant concern have been image forgery and data poisoning.
We introduce the DeepfakeArt Challenge, a large-scale challenge benchmark dataset designed specifically to aid in the building of machine learning algorithms for generative AI art forgery and data poisoning detection.
(2023-06-02T05:11:27Z)
- Validating Multimedia Content Moderation Software via Semantic Fusion [16.322773343799575]
We introduce Semantic Fusion, a general, effective methodology for validating multimedia content moderation software.
We employ DUO, a testing tool built on Semantic Fusion, to test five commercial content moderation products and two state-of-the-art models against three kinds of toxic content.
The results show that DUO achieves up to 100% error finding rate (EFR) when testing moderation software.
(2023-05-23T02:44:15Z)
- Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when the training data is insufficient.
(2023-04-19T02:53:59Z)
- MTTM: Metamorphic Testing for Textual Content Moderation Software [11.759353169546646]
Social media platforms have been increasingly exploited to propagate toxic content.
However, malicious users can evade moderation by changing only a few words in the toxic content.
We propose MTTM, a Metamorphic Testing framework for Textual content Moderation software.
(2023-02-11T14:44:39Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
(2022-12-27T16:08:49Z)
- WLV-RIT at SemEval-2021 Task 5: A Neural Transformer Framework for Detecting Toxic Spans [2.4737119633827174]
In recent years, the widespread use of social media has led to an increase in the generation of toxic and offensive content on online platforms.
Social media platforms have worked on developing automatic detection methods and employing human moderators to cope with this deluge of offensive content.
(2021-04-09T22:52:26Z)