StopHC: A Harmful Content Detection and Mitigation Architecture for Social Media Platforms
- URL: http://arxiv.org/abs/2411.06138v1
- Date: Sat, 09 Nov 2024 10:23:22 GMT
- Title: StopHC: A Harmful Content Detection and Mitigation Architecture for Social Media Platforms
- Authors: Ciprian-Octavian Truică, Ana-Teodora Constantinescu, Elena-Simona Apostol,
- Abstract summary: StopHC is a harmful content detection and mitigation architecture for social media platforms.
Our solution contains two modules: one that employs a deep neural network architecture for harmful content detection, and one that uses a network immunization algorithm to block toxic nodes and stop the spread of harmful content.
- Score: 0.46289929100614996
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The mental health of social media users is increasingly put at risk by harmful, hateful, and offensive content. In this paper, we propose \textsc{StopHC}, a harmful content detection and mitigation architecture for social media platforms. Our aim with \textsc{StopHC} is to create more secure online environments. Our solution contains two modules: one that employs a deep neural network architecture for harmful content detection, and one that uses a network immunization algorithm to block toxic nodes and stop the spread of harmful content. The efficacy of our solution is demonstrated by experiments conducted on two real-world datasets.
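The two-module design described in the abstract can be pictured with a minimal sketch: a text classifier flags harmful posts, and an immunization step picks which posting accounts to block in the diffusion graph. The word-list detector and greedy degree-based blocking below are illustrative stand-ins, not the paper's actual neural model or immunization algorithm.

```python
# Minimal sketch of a StopHC-style flow (hypothetical names; the word-list
# detector and greedy blocking are stand-ins for the paper's modules).
from collections import defaultdict

TOXIC_WORDS = {"hate", "kill", "worthless"}

def detect_harmful(text: str) -> bool:
    """Stand-in for the deep neural network detection module."""
    return any(w in text.lower().split() for w in TOXIC_WORDS)

def immunize(edges, toxic_users, budget=1):
    """Stand-in for the network immunization module: block the highest-degree
    accounts among those that posted harmful content."""
    degree = defaultdict(int)
    for u, v in edges:                      # follower / retweet edges
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(toxic_users, key=lambda u: degree[u], reverse=True)
    return set(ranked[:budget])             # accounts to block

posts = {"a": "I hate you", "b": "nice day out", "c": "you are worthless"}
edges = [("a", "b"), ("c", "a"), ("c", "b")]
toxic = {user for user, text in posts.items() if detect_harmful(text)}
print("flagged:", toxic, "blocked:", immunize(edges, toxic))
```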
Related papers
- ToxicTAGS: Decoding Toxic Memes with Rich Tag Annotations [3.708799808977489]
We introduce a first-of-its-kind dataset of 6,300 real-world meme-based posts annotated in two stages: (i) binary classification into toxic and normal, and (ii) fine-grained labelling of toxic memes as hateful, dangerous, or offensive. A key feature of this dataset is that it is enriched with auxiliary metadata of socially relevant tags, enhancing the context of each meme.
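A hypothetical record layout for the two-stage annotation described above; the field names are invented for illustration and are not the released dataset's schema.

```python
# Hypothetical record layout; field names are invented, not the released schema.
meme_post = {
    "id": 101,
    "stage1_label": "toxic",            # stage (i): binary toxic vs. normal
    "stage2_label": "offensive",        # stage (ii): hateful / dangerous / offensive (toxic memes only)
    "tags": ["politics", "election"],   # auxiliary socially relevant tag metadata
}
print(meme_post["stage1_label"], meme_post["stage2_label"])
```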
arXiv Detail & Related papers (2025-08-06T07:46:14Z)
- GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models [65.91565607573786]
Text-to-image (T2I) models can be misused to generate harmful content, including nudity or violence. Recent research on red-teaming and adversarial attacks against T2I models has notable limitations. We propose GenBreak, a framework that fine-tunes a red-team large language model (LLM) to systematically explore underlying vulnerabilities.
arXiv Detail & Related papers (2025-06-11T09:09:12Z)
- Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks.
We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance.
Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
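A toy illustration of the second threat (pasting query terms into an unrelated passage), assuming a naive lexical-overlap scorer in place of the neural retrievers, rerankers, and LLM judges actually studied:

```python
# Toy illustration of threat (2): pasting query terms into an unrelated passage
# inflates a naive lexical-overlap "relevance" score (hypothetical scorer, not
# the neural models studied in the paper).
def overlap_score(query: str, passage: str) -> float:
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q)

query = "symptoms of vitamin d deficiency"
passage = "our store sells discount supplements online"
injected = passage + " symptoms of vitamin d deficiency"   # the injection attack

print(overlap_score(query, passage))    # 0.0 -> looks irrelevant
print(overlap_score(query, injected))   # 1.0 -> now looks highly relevant
```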
arXiv Detail & Related papers (2025-01-30T18:02:15Z)
- Sentiment Analysis of Cyberbullying Data in Social Media [0.0]
Our work focuses on leveraging deep learning and natural language understanding techniques to detect traces of bullying in social media posts.
One approach utilizes BERT embeddings, while the other replaces the embeddings layer with the recently released embeddings API from OpenAI.
We conducted a performance comparison between these two approaches to evaluate their effectiveness in sentiment analysis of Formspring Cyberbullying data.
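A minimal sketch of that comparison setup: two interchangeable embedding backends feed the same downstream classifier, so only the representation changes. The stub embedders below stand in for a BERT encoder and for OpenAI's embeddings API; the nearest-centroid classifier and data are illustrative only.

```python
# Sketch of the comparison setup (stub embedders stand in for a BERT encoder and
# for OpenAI's embeddings API; the classifier and data are illustrative only).
from typing import Callable, List

def embed_bert_stub(text: str) -> List[float]:
    # In practice: a pooled BERT sentence embedding.
    return [float(len(text)), float(text.count("!"))]

def embed_openai_stub(text: str) -> List[float]:
    # In practice: a vector returned by the OpenAI embeddings endpoint.
    return [float(len(set(text.split()))), float(text.count("!"))]

def nearest_centroid(train: List[str], labels: List[str], query: str,
                     embed: Callable[[str], List[float]]) -> str:
    """The same trivial classifier for both backends, so only embeddings differ."""
    centroids = {}
    for lbl in set(labels):
        vecs = [embed(t) for t, l in zip(train, labels) if l == lbl]
        centroids[lbl] = [sum(dim) / len(vecs) for dim in zip(*vecs)]
    q = embed(query)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lbl: dist(centroids[lbl], q))

train = ["you are worthless!!", "great game yesterday", "nobody likes you!!"]
labels = ["bullying", "neutral", "bullying"]
for embed in (embed_bert_stub, embed_openai_stub):
    print(embed.__name__, nearest_centroid(train, labels, "you are terrible!!", embed))
```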
arXiv Detail & Related papers (2024-11-08T20:41:04Z)
- SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation [65.30207993362595]
Unlearning/editing-based methods for safe generation remove harmful concepts from models but face several challenges. We propose SAFREE, a training-free approach for safe T2I and T2V. We detect a subspace corresponding to a set of toxic concepts in the text embedding space and steer prompt embeddings away from this subspace.
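The subspace-steering idea can be sketched in a few lines of linear algebra: estimate an orthonormal basis for the toxic-concept subspace, then subtract a prompt embedding's projection onto it. Random vectors stand in for real text-encoder embeddings; this is not the SAFREE implementation.

```python
# Linear-algebra sketch only: random vectors stand in for real text-encoder
# embeddings, and the steering is a plain orthogonal projection.
import numpy as np

rng = np.random.default_rng(0)
toxic_concepts = rng.normal(size=(4, 16))     # embeddings of a few toxic concept phrases
prompt = rng.normal(size=16)                  # embedding of the user prompt

# Orthonormal basis of the subspace spanned by the toxic concept embeddings.
basis, _ = np.linalg.qr(toxic_concepts.T)     # shape (16, 4); columns span the subspace

# Steer the prompt away: remove its component inside the toxic subspace.
steered = prompt - basis @ (basis.T @ prompt)

print(np.linalg.norm(basis.T @ prompt))       # noticeable toxic-subspace component
print(np.linalg.norm(basis.T @ steered))      # ~0 after steering
```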
arXiv Detail & Related papers (2024-10-16T17:32:23Z)
- Health Misinformation Detection in Web Content via Web2Vec: A Structural-, Content-based, and Context-aware Approach based on Web2Vec [3.299010876315217]
We focus on Web page content, where there is still room for research to study structural-, content- and context-based features to assess the credibility of Web pages.
This work aims to study the effectiveness of such features in association with a deep learning model, starting from an embedded representation of Web pages that has been recently proposed in the context of phishing Web page detection, i.e., Web2Vec.
arXiv Detail & Related papers (2024-07-05T10:33:15Z)
- Harmful Suicide Content Detection [35.0823928978658]
We introduce a harmful suicide content detection task for classifying online suicide content into five harmfulness levels.
Our contributions include proposing a novel detection task, a multi-modal Korean benchmark with expert annotations, and suggesting strategies using LLMs to detect illegal and harmful content.
arXiv Detail & Related papers (2024-06-03T03:43:44Z)
- Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents based on Not Safe For Work (NSFW) values calculated based on images alone does not exclude all the harmful content in alt-text.
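The alt-text gap can be illustrated with a toy filter over dummy records: an image-only NSFW threshold keeps a sample whose caption is hateful, while a combined image-and-text check removes it. The records and word list below are invented for illustration.

```python
# Dummy records and a toy word list, purely to illustrate the filtering gap.
records = [
    {"alt_text": "a cute puppy on the grass", "nsfw_score": 0.02},
    {"alt_text": "<slur> people should disappear", "nsfw_score": 0.03},  # benign image, hateful caption
    {"alt_text": "explicit content", "nsfw_score": 0.97},
]
HATE_TERMS = {"<slur>"}                        # stand-in for a proper text classifier

image_only = [r for r in records if r["nsfw_score"] < 0.5]
image_and_text = [r for r in image_only
                  if not (HATE_TERMS & set(r["alt_text"].split()))]

print(len(image_only))       # 2 -> the hateful caption slips through
print(len(image_and_text))   # 1 -> caught only when alt-text is also checked
```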
arXiv Detail & Related papers (2023-11-06T19:00:05Z)
- HOD: A Benchmark Dataset for Harmful Object Detection [3.755082744150185]
We present a new benchmark dataset for harmful object detection.
Our proposed dataset contains more than 10,000 images across 6 categories that might be harmful.
We have conducted extensive experiments to evaluate the effectiveness of our proposed dataset.
arXiv Detail & Related papers (2023-10-08T15:00:38Z)
- An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software [64.367830425115]
Social media platforms are being increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography.
Despite tremendous efforts in developing and deploying content moderation methods, malicious users can evade moderation by embedding texts into images.
We propose a metamorphic testing framework for content moderation software.
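A tiny sketch of the metamorphic relation this implies: a post's moderation verdict should not change when the same toxic text is rendered into an image. The renderer and the two moderators below are stand-ins, not the paper's framework.

```python
# Stand-in moderators and renderer, purely to show the metamorphic relation.
TOXIC_WORDS = {"hate"}

def moderate_text(text: str) -> bool:
    return any(w in text.lower().split() for w in TOXIC_WORDS)

def render_to_image(text: str) -> dict:
    return {"pixels": object(), "burned_in_text": text}   # pretend image rendering

def moderate_image(image: dict) -> bool:
    return False            # image-only moderator: never reads the burned-in text

def metamorphic_test(text: str) -> bool:
    """Passes only if the verdict is invariant under the text-to-image transform."""
    return moderate_text(text) == moderate_image(render_to_image(text))

print(metamorphic_test("I hate you"))   # False -> evasion found: the image variant slips through
```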
arXiv Detail & Related papers (2023-08-18T20:33:06Z)
- MCWDST: a Minimum-Cost Weighted Directed Spanning Tree Algorithm for Real-Time Fake News Mitigation in Social Media [10.088200477738749]
We present an end-to-end solution that accurately detects fake news and immunizes network nodes that spread them in real-time.
To mitigate the spread of fake news, we propose a real-time network-aware strategy.
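A rough, hedged sketch of the mitigation idea: grow a low-cost directed tree from the spreading node, then immunize the tree nodes that forward to the most others. A plain shortest-path tree is used here as a simple stand-in; it is not the paper's MCWDST algorithm.

```python
# Hedged sketch only: a shortest-path tree stands in for the minimum-cost
# weighted directed spanning tree; node choice is by fan-out within that tree.
import heapq
from collections import defaultdict

def spanning_tree(graph, source):
    """Dijkstra-style tree of cheapest reach from the fake-news spreader."""
    dist, parent = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return parent                               # maps child -> parent in the tree

graph = {"src": [("a", 1.0), ("b", 4.0)],       # weighted "who reshares whom" edges
         "a": [("b", 1.0), ("c", 1.0), ("d", 1.0)]}
tree = spanning_tree(graph, "src")
fanout = defaultdict(int)
for child, par in tree.items():
    fanout[par] += 1
print(max(fanout, key=fanout.get))              # "a": the first node to immunize
```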
arXiv Detail & Related papers (2023-02-23T17:31:40Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evading content moderation.
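A toy simulation of keyword camouflage and a matching normalization step; the leetspeak mapping and blocked-word list are illustrative only, and the article's multilingual tools are far richer.

```python
# Toy camouflage simulator and normalizer (illustrative mapping and word list).
LEET = {"a": "4", "e": "3", "i": "1", "o": "0"}

def camouflage(word: str) -> str:
    """Simulate evasion: leetspeak substitution plus separator characters."""
    return ".".join(LEET.get(ch, ch) for ch in word)

def normalize(text: str) -> str:
    """Undo the simulated camouflage before keyword moderation runs."""
    reverse = {v: k for k, v in LEET.items()}
    cleaned = "".join(ch for ch in text if ch.isalnum())
    return "".join(reverse.get(ch, ch) for ch in cleaned)

blocked = {"scam"}
evasive = camouflage("scam")           # 's.c.4.m'
print(evasive in blocked)              # False -> naive keyword filter is evaded
print(normalize(evasive) in blocked)   # True  -> caught after normalization
```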
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- Mitigating Covertly Unsafe Text within Natural Language Systems [55.26364166702625]
Uncontrolled systems may generate recommendations that lead to injury or life-threatening consequences.
In this paper, we distinguish types of text that can lead to physical harm and establish one particularly underexplored category: covertly unsafe text.
arXiv Detail & Related papers (2022-10-17T17:59:49Z)
- Cross-Network Social User Embedding with Hybrid Differential Privacy Guarantees [81.6471440778355]
We propose a Cross-network Social User Embedding framework, namely DP-CroSUE, to learn the comprehensive representations of users in a privacy-preserving way.
In particular, for each heterogeneous social network, we first introduce a hybrid differential privacy notion to capture the variation of privacy expectations for heterogeneous data types.
To further enhance user embeddings, a novel cross-network GCN embedding model is designed to transfer knowledge across networks through those aligned users.
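The privacy side can be illustrated with the standard Laplace mechanism, perturbing a numeric user attribute before anything derived from it is shared across networks; the paper's hybrid notion adapts the guarantee per data type, which is not reproduced in this sketch.

```python
# Standard Laplace mechanism as an illustration; the hybrid per-data-type
# guarantee from the paper is not reproduced here.
import numpy as np

def laplace_mechanism(value: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator) -> float:
    """Release the value with Laplace(sensitivity / epsilon) noise added."""
    return value + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(42)
posts_per_week = 37.0                           # a user attribute to protect
for eps in (0.1, 1.0, 10.0):                    # smaller epsilon -> stronger privacy, more noise
    print(eps, round(laplace_mechanism(posts_per_week, 1.0, eps, rng), 2))
```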
arXiv Detail & Related papers (2022-09-04T06:22:37Z)
- Detecting Harmful Content On Online Platforms: What Platforms Need Vs. Where Research Efforts Go [44.774035806004214]
Harmful content on online platforms comes in many different forms, including hate speech, offensive language, bullying and harassment, misinformation, spam, violence, graphic content, sexual abuse, self-harm, and many others.
Online platforms seek to moderate such content to limit societal harm, to comply with legislation, and to create a more inclusive environment for their users.
There is currently a dichotomy between what types of harmful content online platforms seek to curb, and what research efforts there are to automatically detect such content.
arXiv Detail & Related papers (2021-02-27T08:01:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.