Cross-Platform Violence Detection on Social Media: A Dataset and Analysis
- URL: http://arxiv.org/abs/2506.03312v1
- Date: Tue, 03 Jun 2025 18:54:07 GMT
- Title: Cross-Platform Violence Detection on Social Media: A Dataset and Analysis
- Authors: Celia Chen, Scotty Beland, Ingo Burghardt, Jill Byczek, William J. Conway, Eric Cotugno, Sadaf Davre, Megan Fletcher, Rajesh Kumar Gnanasekaran, Kristin Hamilton, Marilyn Harbert, Jordan Heustis, Tanaya Jha, Emily Klein, Hayden Kramer, Alex Leitch, Jessica Perkins, Casi Sherman, Celia Sterrn, Logan Stevens, Rebecca Zarrella, Jennifer Golbeck,
- Abstract summary: We introduce a cross-platform dataset of 30,000 posts hand-coded for violent threats and sub-types of violence.<n>To evaluate the signal present in this dataset, we perform a machine learning analysis with an existing dataset of violent comments from YouTube.
- Score: 1.3007535106461825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Violent threats remain a significant problem across social media platforms. Useful, high-quality data facilitates research into the understanding and detection of malicious content, including violence. In this paper, we introduce a cross-platform dataset of 30,000 posts hand-coded for violent threats and sub-types of violence, including political and sexual violence. To evaluate the signal present in this dataset, we perform a machine learning analysis with an existing dataset of violent comments from YouTube. We find that, despite originating from different platforms and using different coding criteria, we achieve high classification accuracy both by training on one dataset and testing on the other, and in a merged dataset condition. These results have implications for content-classification strategies and for understanding violent content across social media.
Related papers
- Multi-Platform Aggregated Dataset of Online Communities (MADOC) [64.45797970830233]
MADOC aggregates and standardizes data from Bluesky, Koo, Reddit, and Voat (2012-2024), containing 18.9 million posts, 236 million comments, and 23.1 million unique users.<n>The dataset enables comparative studies of toxic behavior evolution across platforms through standardized interaction records and sentiment analysis.
arXiv Detail & Related papers (2025-01-22T14:02:11Z) - Modeling offensive content detection for TikTok [0.0]
This research undertakes the collection and analysis of TikTok data containing offensive content.
It builds a series of machine learning and deep learning models for offensive content detection.
arXiv Detail & Related papers (2024-08-29T18:47:41Z) - Modeling Political Orientation of Social Media Posts: An Extended
Analysis [0.0]
Developing machine learning models to characterize political polarization on online social media presents significant challenges.
These challenges mainly stem from various factors such as the lack of annotated data, presence of noise in social media datasets, and the sheer volume of data.
We introduce two methods that leverage on news media bias and post content to label social media posts.
We demonstrate that current machine learning models can exhibit improved performance in predicting political orientation of social media posts.
arXiv Detail & Related papers (2023-11-21T03:34:20Z) - Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents based on Not Safe For Work (NSFW) values calculated based on images alone does not exclude all the harmful content in alt-text.
arXiv Detail & Related papers (2023-11-06T19:00:05Z) - Harnessing the Power of Text-image Contrastive Models for Automatic
Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore the constrastive learning in the domain of misinformation identification.
Our model shows the superior performance of non-matched image-text pair detection when the training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - Collective Obfuscation and Crowdsourcing [2.28438857884398]
We show that widespread use of reporting platforms comes with unique security and privacy implications.
We identify coordinated obfuscation strategies that are intended to hinder the platform's legitimacy.
arXiv Detail & Related papers (2022-08-12T17:57:33Z) - Panning for gold: Lessons learned from the platform-agnostic automated
detection of political content in textual data [48.7576911714538]
We discuss how these techniques can be used to detect political content across different platforms.
We compare the performance of three groups of detection techniques relying on dictionaries, supervised machine learning, or neural networks.
Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by neural network- and machine-learning-based models.
arXiv Detail & Related papers (2022-07-01T15:23:23Z) - Data Expansion using Back Translation and Paraphrasing for Hate Speech
Detection [1.192436948211501]
We present a new deep learning-based method that fuses a Back Translation method, and a Paraphrasing technique for data augmentation.
We evaluate our proposal on five publicly available datasets; namely, AskFm corpus, Formspring dataset, Warner and Waseem dataset, Olid, and Wikipedia toxic comments dataset.
arXiv Detail & Related papers (2021-05-25T09:52:42Z) - Content-based Analysis of the Cultural Differences between TikTok and
Douyin [95.32409577885645]
Short-form video social media shifts away from the traditional media paradigm by telling the audience a dynamic story to attract their attention.
In particular, different combinations of everyday objects can be employed to represent a unique scene that is both interesting and understandable.
Offered by the same company, TikTok and Douyin are popular examples of such new media that has become popular in recent years.
The hypothesis that they express cultural differences together with media fashion and social idiosyncrasy is the primary target of our research.
arXiv Detail & Related papers (2020-11-03T01:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.