StopHC: A Harmful Content Detection and Mitigation Architecture for Social Media Platforms
- URL: http://arxiv.org/abs/2411.06138v1
- Date: Sat, 09 Nov 2024 10:23:22 GMT
- Title: StopHC: A Harmful Content Detection and Mitigation Architecture for Social Media Platforms
- Authors: Ciprian-Octavian Truică, Ana-Teodora Constantinescu, Elena-Simona Apostol,
- Abstract summary: StopHC is a harmful content detection and mitigation architecture for social media platforms.
Our solution contains two modules: one that employs a deep neural network architecture for harmful content detection, and one that uses a network immunization algorithm to block toxic nodes and stop the spread of harmful content.
- Score: 0.46289929100614996
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The mental health of social media users is increasingly put at risk by harmful, hateful, and offensive content. In this paper, we propose \textsc{StopHC}, a harmful content detection and mitigation architecture for social media platforms. Our aim with \textsc{StopHC} is to create more secure online environments. Our solution contains two modules: one that employs a deep neural network architecture for harmful content detection, and one that uses a network immunization algorithm to block toxic nodes and stop the spread of harmful content. The efficacy of our solution is demonstrated by experiments conducted on two real-world datasets.
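The two-module design described in the abstract can be pictured with a minimal sketch: a text classifier flags harmful posts, and an immunization step picks which posting accounts to block in the diffusion graph. The word-list detector and greedy degree-based blocking below are illustrative stand-ins, not the paper's actual neural model or immunization algorithm.

```python
# Minimal sketch of a StopHC-style flow (hypothetical names; the word-list
# detector and greedy blocking are stand-ins for the paper's modules).
from collections import defaultdict

TOXIC_WORDS = {"hate", "kill", "worthless"}

def detect_harmful(text: str) -> bool:
    """Stand-in for the deep neural network detection module."""
    return any(w in text.lower().split() for w in TOXIC_WORDS)

def immunize(edges, toxic_users, budget=1):
    """Stand-in for the network immunization module: block the highest-degree
    accounts among those that posted harmful content."""
    degree = defaultdict(int)
    for u, v in edges:                      # follower / retweet edges
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(toxic_users, key=lambda u: degree[u], reverse=True)
    return set(ranked[:budget])             # accounts to block

posts = {"a": "I hate you", "b": "nice day out", "c": "you are worthless"}
edges = [("a", "b"), ("c", "a"), ("c", "b")]
toxic = {user for user, text in posts.items() if detect_harmful(text)}
print("flagged:", toxic, "blocked:", immunize(edges, toxic))
```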
Related papers
- ToxicTAGS: Decoding Toxic Memes with Rich Tag Annotations [3.708799808977489]
We introduce a first-of-its-kind dataset of 6,300 real-world meme-based posts annotated in two stages: (i) binary classification into toxic and normal, and (ii) fine-grained labelling of toxic memes as hateful, dangerous, or offensive. A key feature of this dataset is that it is enriched with auxiliary metadata of socially relevant tags, enhancing the context of each meme.
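A hypothetical record layout for the two-stage annotation described above; the field names are invented for illustration and are not the released dataset's schema.

```python
# Hypothetical record layout; field names are invented, not the released schema.
meme_post = {
    "id": 101,
    "stage1_label": "toxic",            # stage (i): binary toxic vs. normal
    "stage2_label": "offensive",        # stage (ii): hateful / dangerous / offensive (toxic memes only)
    "tags": ["politics", "election"],   # auxiliary socially relevant tag metadata
}
print(meme_post["stage1_label"], meme_post["stage2_label"])
```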
arXiv Detail & Related papers (2025-08-06T07:46:14Z)
- GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models [65.91565607573786]
Text-to-image (T2I) models can be misused to generate harmful content, including nudity or violence. Recent research on red-teaming and adversarial attacks against T2I models has notable limitations. We propose GenBreak, a framework that fine-tunes a red-team large language model (LLM) to systematically explore underlying vulnerabilities.
arXiv Detail & Related papers (2025-06-11T09:09:12Z)
- Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks.
We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance.
Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
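A toy illustration of the second threat (pasting query terms into an unrelated passage), assuming a naive lexical-overlap scorer in place of the neural retrievers, rerankers, and LLM judges actually studied:

```python
# Toy illustration of threat (2): pasting query terms into an unrelated passage
# inflates a naive lexical-overlap "relevance" score (hypothetical scorer, not
# the neural models studied in the paper).
def overlap_score(query: str, passage: str) -> float:
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q)

query = "symptoms of vitamin d deficiency"
passage = "our store sells discount supplements online"
injected = passage + " symptoms of vitamin d deficiency"   # the injection attack

print(overlap_score(query, passage))    # 0.0 -> looks irrelevant
print(overlap_score(query, injected))   # 1.0 -> now looks highly relevant
```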
arXiv Detail & Related papers (2025-01-30T18:02:15Z)
- Sentiment Analysis of Cyberbullying Data in Social Media [0.0]
Our work focuses on leveraging deep learning and natural language understanding techniques to detect traces of bullying in social media posts.
One approach utilizes BERT embeddings, while the other replaces the embeddings layer with the recently released embeddings API from OpenAI.
We conducted a performance comparison between these two approaches to evaluate their effectiveness in sentiment analysis of Formspring Cyberbullying data.
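A minimal sketch of that comparison setup: two interchangeable embedding backends feed the same downstream classifier, so only the representation changes. The stub embedders below stand in for a BERT encoder and for OpenAI's embeddings API; the nearest-centroid classifier and data are illustrative only.

```python
# Sketch of the comparison setup (stub embedders stand in for a BERT encoder and
# for OpenAI's embeddings API; the classifier and data are illustrative only).
from typing import Callable, List

def embed_bert_stub(text: str) -> List[float]:
    # In practice: a pooled BERT sentence embedding.
    return [float(len(text)), float(text.count("!"))]

def embed_openai_stub(text: str) -> List[float]:
    # In practice: a vector returned by the OpenAI embeddings endpoint.
    return [float(len(set(text.split()))), float(text.count("!"))]

def nearest_centroid(train: List[str], labels: List[str], query: str,
                     embed: Callable[[str], List[float]]) -> str:
    """The same trivial classifier for both backends, so only embeddings differ."""
    centroids = {}
    for lbl in set(labels):
        vecs = [embed(t) for t, l in zip(train, labels) if l == lbl]
        centroids[lbl] = [sum(dim) / len(vecs) for dim in zip(*vecs)]
    q = embed(query)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lbl: dist(centroids[lbl], q))

train = ["you are worthless!!", "great game yesterday", "nobody likes you!!"]
labels = ["bullying", "neutral", "bullying"]
for embed in (embed_bert_stub, embed_openai_stub):
    print(embed.__name__, nearest_centroid(train, labels, "you are terrible!!", embed))
```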
arXiv Detail & Related papers (2024-11-08T20:41:04Z)
- SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation [65.30207993362595]
Unlearning/editing-based methods for safe generation remove harmful concepts from models but face several challenges. We propose SAFREE, a training-free approach for safe T2I and T2V. We detect a subspace corresponding to a set of toxic concepts in the text embedding space and steer prompt embeddings away from this subspace.
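The subspace-steering idea can be sketched in a few lines of linear algebra: estimate an orthonormal basis for the toxic-concept subspace, then subtract a prompt embedding's projection onto it. Random vectors stand in for real text-encoder embeddings; this is not the SAFREE implementation.

```python
# Linear-algebra sketch only: random vectors stand in for real text-encoder
# embeddings, and the steering is a plain orthogonal projection.
import numpy as np

rng = np.random.default_rng(0)
toxic_concepts = rng.normal(size=(4, 16))     # embeddings of a few toxic concept phrases
prompt = rng.normal(size=16)                  # embedding of the user prompt

# Orthonormal basis of the subspace spanned by the toxic concept embeddings.
basis, _ = np.linalg.qr(toxic_concepts.T)     # shape (16, 4); columns span the subspace

# Steer the prompt away: remove its component inside the toxic subspace.
steered = prompt - basis @ (basis.T @ prompt)

print(np.linalg.norm(basis.T @ prompt))       # noticeable toxic-subspace component
print(np.linalg.norm(basis.T @ steered))      # ~0 after steering
```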
arXiv Detail & Related papers (2024-10-16T17:32:23Z)
- Health Misinformation Detection in Web Content via Web2Vec: A Structural-, Content-based, and Context-aware Approach based on Web2Vec [3.299010876315217]
We focus on Web page content, where there is still room for research to study structural-, content- and context-based features to assess the credibility of Web pages.
This work aims to study the effectiveness of such features in association with a deep learning model, starting from an embedded representation of Web pages that has been recently proposed in the context of phishing Web page detection, i.e., Web2Vec.
arXiv Detail & Related papers (2024-07-05T10:33:15Z)
- Harmful Suicide Content Detection [35.0823928978658]
We introduce a harmful suicide content detection task for classifying online suicide content into five harmfulness levels.
Our contributions include proposing a novel detection task, a multi-modal Korean benchmark with expert annotations, and suggesting strategies using LLMs to detect illegal and harmful content.
arXiv Detail & Related papers (2024-06-03T03:43:44Z)
- Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents based on Not Safe For Work (NSFW) values calculated based on images alone does not exclude all the harmful content in alt-text.
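The alt-text gap can be illustrated with a toy filter over dummy records: an image-only NSFW threshold keeps a sample whose caption is hateful, while a combined image-and-text check removes it. The records and word list below are invented for illustration.

```python
# Dummy records and a toy word list, purely to illustrate the filtering gap.
records = [
    {"alt_text": "a cute puppy on the grass", "nsfw_score": 0.02},
    {"alt_text": "<slur> people should disappear", "nsfw_score": 0.03},  # benign image, hateful caption
    {"alt_text": "explicit content", "nsfw_score": 0.97},
]
HATE_TERMS = {"<slur>"}                        # stand-in for a proper text classifier

image_only = [r for r in records if r["nsfw_score"] < 0.5]
image_and_text = [r for r in image_only
                  if not (HATE_TERMS & set(r["alt_text"].split()))]

print(len(image_only))       # 2 -> the hateful caption slips through
print(len(image_and_text))   # 1 -> caught only when alt-text is also checked
```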
arXiv Detail & Related papers (2023-11-06T19:00:05Z)
- HOD: A Benchmark Dataset for Harmful Object Detection [3.755082744150185]
We present a new benchmark dataset for harmful object detection.
Our proposed dataset contains more than 10,000 images across 6 categories that might be harmful.
We have conducted extensive experiments to evaluate the effectiveness of our proposed dataset.
arXiv Detail & Related papers (2023-10-08T15:00:38Z)
- An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software [64.367830425115]
Social media platforms are being increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography.
Despite tremendous efforts in developing and deploying content moderation methods, malicious users can evade moderation by embedding texts into images.
We propose a metamorphic testing framework for content moderation software.
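A tiny sketch of the metamorphic relation this implies: a post's moderation verdict should not change when the same toxic text is rendered into an image. The renderer and the two moderators below are stand-ins, not the paper's framework.

```python
# Stand-in moderators and renderer, purely to show the metamorphic relation.
TOXIC_WORDS = {"hate"}

def moderate_text(text: str) -> bool:
    return any(w in text.lower().split() for w in TOXIC_WORDS)

def render_to_image(text: str) -> dict:
    return {"pixels": object(), "burned_in_text": text}   # pretend image rendering

def moderate_image(image: dict) -> bool:
    return False            # image-only moderator: never reads the burned-in text

def metamorphic_test(text: str) -> bool:
    """Passes only if the verdict is invariant under the text-to-image transform."""
    return moderate_text(text) == moderate_image(render_to_image(text))

print(metamorphic_test("I hate you"))   # False -> evasion found: the image variant slips through
```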
arXiv Detail & Related papers (2023-08-18T20:33:06Z)
- MCWDST: a Minimum-Cost Weighted Directed Spanning Tree Algorithm for Real-Time Fake News Mitigation in Social Media [10.088200477738749]
We present an end-to-end solution that accurately detects fake news and immunizes network nodes that spread them in real-time.
To mitigate the spread of fake news, we propose a real-time network-aware strategy.
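A rough, hedged sketch of the mitigation idea: grow a low-cost directed tree from the spreading node, then immunize the tree nodes that forward to the most others. A plain shortest-path tree is used here as a simple stand-in; it is not the paper's MCWDST algorithm.

```python
# Hedged sketch only: a shortest-path tree stands in for the minimum-cost
# weighted directed spanning tree; node choice is by fan-out within that tree.
import heapq
from collections import defaultdict

def spanning_tree(graph, source):
    """Dijkstra-style tree of cheapest reach from the fake-news spreader."""
    dist, parent = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return parent                               # maps child -> parent in the tree

graph = {"src": [("a", 1.0), ("b", 4.0)],       # weighted "who reshares whom" edges
         "a": [("b", 1.0), ("c", 1.0), ("d", 1.0)]}
tree = spanning_tree(graph, "src")
fanout = defaultdict(int)
for child, par in tree.items():
    fanout[par] += 1
print(max(fanout, key=fanout.get))              # "a": the first node to immunize
```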
arXiv Detail & Related papers (2023-02-23T17:31:40Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evading content moderation.
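A toy simulation of keyword camouflage and a matching normalization step; the leetspeak mapping and blocked-word list are illustrative only, and the article's multilingual tools are far richer.

```python
# Toy camouflage simulator and normalizer (illustrative mapping and word list).
LEET = {"a": "4", "e": "3", "i": "1", "o": "0"}

def camouflage(word: str) -> str:
    """Simulate evasion: leetspeak substitution plus separator characters."""
    return ".".join(LEET.get(ch, ch) for ch in word)

def normalize(text: str) -> str:
    """Undo the simulated camouflage before keyword moderation runs."""
    reverse = {v: k for k, v in LEET.items()}
    cleaned = "".join(ch for ch in text if ch.isalnum())
    return "".join(reverse.get(ch, ch) for ch in cleaned)

blocked = {"scam"}
evasive = camouflage("scam")           # 's.c.4.m'
print(evasive in blocked)              # False -> naive keyword filter is evaded
print(normalize(evasive) in blocked)   # True  -> caught after normalization
```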
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- Mitigating Covertly Unsafe Text within Natural Language Systems [55.26364166702625]
Uncontrolled systems may generate recommendations that lead to injury or life-threatening consequences.
In this paper, we distinguish types of text that can lead to physical harm and establish one particularly underexplored category: covertly unsafe text.
arXiv Detail & Related papers (2022-10-17T17:59:49Z)
- Cross-Network Social User Embedding with Hybrid Differential Privacy Guarantees [81.6471440778355]
We propose a Cross-network Social User Embedding framework, namely DP-CroSUE, to learn the comprehensive representations of users in a privacy-preserving way.
In particular, for each heterogeneous social network, we first introduce a hybrid differential privacy notion to capture the variation of privacy expectations for heterogeneous data types.
To further enhance user embeddings, a novel cross-network GCN embedding model is designed to transfer knowledge across networks through those aligned users.
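The privacy side can be illustrated with the standard Laplace mechanism, perturbing a numeric user attribute before anything derived from it is shared across networks; the paper's hybrid notion adapts the guarantee per data type, which is not reproduced in this sketch.

```python
# Standard Laplace mechanism as an illustration; the hybrid per-data-type
# guarantee from the paper is not reproduced here.
import numpy as np

def laplace_mechanism(value: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator) -> float:
    """Release the value with Laplace(sensitivity / epsilon) noise added."""
    return value + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(42)
posts_per_week = 37.0                           # a user attribute to protect
for eps in (0.1, 1.0, 10.0):                    # smaller epsilon -> stronger privacy, more noise
    print(eps, round(laplace_mechanism(posts_per_week, 1.0, eps, rng), 2))
```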
arXiv Detail & Related papers (2022-09-04T06:22:37Z)
- Detecting Harmful Content On Online Platforms: What Platforms Need Vs. Where Research Efforts Go [44.774035806004214]
Harmful content on online platforms comes in many different forms, including hate speech, offensive language, bullying and harassment, misinformation, spam, violence, graphic content, sexual abuse, self-harm, and many others.
Online platforms seek to moderate such content to limit societal harm, to comply with legislation, and to create a more inclusive environment for their users.
There is currently a dichotomy between what types of harmful content online platforms seek to curb, and what research efforts there are to automatically detect such content.
arXiv Detail & Related papers (2021-02-27T08:01:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.