Harmful YouTube Video Detection: A Taxonomy of Online Harm and MLLMs as Alternative Annotators
- URL: http://arxiv.org/abs/2411.05854v1
- Date: Wed, 06 Nov 2024 23:48:30 GMT
- Title: Harmful YouTube Video Detection: A Taxonomy of Online Harm and MLLMs as Alternative Annotators
- Authors: Claire Wonjeong Jo, Miki Wesołowska, Magdalena Wojcieszak
- Abstract summary: This study advances measures and methods to detect harm in video content.
We develop a comprehensive taxonomy for online harm on video platforms, categorizing it into six categories: Information, Hate and harassment, Addictive, Clickbait, Sexual, and Physical harms.
We analyze 19,422 YouTube videos using 14 image frames, 1 thumbnail, and text metadata, comparing the accuracy of crowdworkers (MTurk) and GPT-4-Turbo.
- Abstract: Short video platforms, such as YouTube, Instagram, or TikTok, are used by billions of users globally. These platforms expose users to harmful content, ranging from clickbait or physical harms to misinformation or online hate. Yet, detecting harmful videos remains challenging due to an inconsistent understanding of what constitutes harm and the limited resources and mental toll involved in human annotation. As such, this study advances measures and methods to detect harm in video content. First, we develop a comprehensive taxonomy for online harm on video platforms, categorizing it into six categories: Information, Hate and harassment, Addictive, Clickbait, Sexual, and Physical harms. Next, we establish multimodal large language models as reliable annotators of harmful videos. We analyze 19,422 YouTube videos using 14 image frames, 1 thumbnail, and text metadata, comparing the accuracy of crowdworkers (MTurk) and GPT-4-Turbo, with domain expert annotations serving as the gold standard. Our results demonstrate that GPT-4-Turbo outperforms crowdworkers in both binary classification (harmful vs. harmless) and multi-label harm categorization tasks. Methodologically, this study extends the application of LLMs to multi-label and multi-modal contexts beyond text annotation and binary classification. Practically, our study contributes to online harm mitigation by guiding the definitions and identification of harmful content on video platforms.
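The annotation protocol described in the abstract maps naturally onto a multimodal chat-completion call. The sketch below is a minimal illustration under stated assumptions, not the authors' released code: it base64-encodes the thumbnail and sampled frames, attaches the text metadata, and asks GPT-4-Turbo (via the OpenAI Python SDK) for a binary harm verdict plus labels from the six-category taxonomy. The prompt wording and helper names are assumptions.

```python
# Minimal sketch of an MLLM-as-annotator pipeline, assuming the OpenAI
# Python SDK (>=1.0) and pre-extracted JPEG frames. Prompt wording and
# helper names are illustrative, not the paper's released code.
import base64
import json
from pathlib import Path

from openai import OpenAI

HARM_CATEGORIES = [
    "Information", "Hate and harassment", "Addictive",
    "Clickbait", "Sexual", "Physical",
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def encode_image(path: Path) -> dict:
    """Pack a JPEG frame as a base64 data-URL content part."""
    b64 = base64.b64encode(path.read_bytes()).decode()
    return {"type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}


def annotate_video(frame_paths: list[Path], title: str, description: str) -> dict:
    """Ask the model for a binary harm verdict plus multi-label categories."""
    prompt = (
        "You are annotating a YouTube video for harmful content. "
        "Given the frames, thumbnail, and metadata below, respond in JSON "
        'as {"harmful": true/false, "categories": [...]}, choosing '
        f"categories only from: {', '.join(HARM_CATEGORIES)}.\n"
        f"Title: {title}\nDescription: {description}"
    )
    content = [{"type": "text", "text": prompt}]
    content += [encode_image(p) for p in frame_paths]  # thumbnail + 14 frames
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": content}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)
```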
Related papers
- Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos [0.1399948157377307]
Governments, educators, and parents are often at odds with media platforms about how to regulate, control, and limit the spread of such content.
Techniques from natural language processing and computer vision have been used widely to automatically identify and filter out sensitive content.
More sophisticated algorithms for understanding the context of both text and images may open room for improvement in content censorship.
arXiv Detail & Related papers (2024-11-26T05:29:18Z) - MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili [11.049937698021054]
This study presents MultiHateClip, a novel multilingual dataset created through hate lexicons and human annotation.
It aims to enhance the detection of hateful videos on platforms such as YouTube and Bilibili, covering content in both English and Chinese.
arXiv Detail & Related papers (2024-07-28T08:19:09Z) - Understanding writing style in social media with a supervised
contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus of 4.5 × 10^6 authored texts derived from public sources.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
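The few-shot attribution setup described above can be approximated with any text-embedding model: embed each author's support documents, average them into an author prototype, and assign a query text to the nearest prototype by cosine similarity. The sketch below uses sentence-transformers as a stand-in encoder; STAR itself is a contrastively pre-trained transformer, so the model name here is an assumption.

```python
# Few-shot authorship attribution via embedding prototypes: a sketch of
# the support-base evaluation described above, with sentence-transformers
# standing in for the STAR encoder (an assumption, not the paper's model).
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in style encoder


def author_prototypes(support: dict[str, list[str]]) -> tuple[list[str], np.ndarray]:
    """Average each author's support-document embeddings into one prototype."""
    names, protos = [], []
    for author, docs in support.items():
        vecs = encoder.encode(docs, normalize_embeddings=True)
        names.append(author)
        protos.append(vecs.mean(axis=0))
    return names, np.vstack(protos)


def attribute(query: str, names: list[str], protos: np.ndarray) -> str:
    """Return the author whose prototype is most cosine-similar to the query."""
    q = encoder.encode([query], normalize_embeddings=True)[0]
    # prototypes are means of unit vectors; renormalize before the dot product
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return names[int(np.argmax(protos @ q))]
```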
arXiv Detail & Related papers (2023-10-17T09:01:17Z) - Malicious or Benign? Towards Effective Content Moderation for Children's
Videos [1.0323063834827415]
This paper introduces our toolkit Malicious or Benign for promoting research on automated content moderation of children's videos.
We present 1) a customizable annotation tool for videos, 2) a new dataset with difficult-to-detect test cases of malicious content, and 3) a benchmark suite of state-of-the-art video classification models.
arXiv Detail & Related papers (2023-05-24T20:33:38Z) - A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In
Zero Shot [67.00455874279383]
We propose verbalizing long videos to generate descriptions in natural language, then performing video-understanding tasks on the generated story as opposed to the original video.
Our method, despite being zero-shot, achieves significantly better results than supervised baselines for video understanding.
To alleviate the lack of story-understanding benchmarks, we publicly release the first dataset on persuasion strategy identification, a crucial task in computational social science.
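A minimal version of this verbalize-then-reason recipe: caption sampled frames with an off-the-shelf image-captioning model, join the captions into a "story", and pose the downstream question to a text-only LLM zero-shot. The model choices and prompt below are illustrative assumptions, not the paper's setup.

```python
# Verbalize-then-reason sketch: frame captions -> story -> zero-shot QA.
# Model names and prompt are illustrative choices, not the paper's setup.
from PIL import Image
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
reader = pipeline("text2text-generation", model="google/flan-t5-large")


def verbalize(frame_paths: list[str]) -> str:
    """Caption each sampled frame and join the captions into one 'story'."""
    captions = [captioner(Image.open(p))[0]["generated_text"] for p in frame_paths]
    return " ".join(f"Scene {i + 1}: {c}." for i, c in enumerate(captions))


def ask(story: str, question: str) -> str:
    """Run the downstream video-understanding task on the text story alone."""
    prompt = f"Story of a video: {story}\nQuestion: {question}\nAnswer:"
    return reader(prompt, max_new_tokens=32)[0]["generated_text"]
```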
arXiv Detail & Related papers (2023-05-16T19:13:11Z) - Countering Malicious Content Moderation Evasion in Online Social
Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evading content moderation.
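To make "word camouflage" concrete, the toy sketch below simulates leetspeak-style substitutions of a watch-list keyword on the attack side and detects them by normalizing characters before matching on the defense side. Real evasion (and the article's multilingual tools) are far richer; the substitution table here is a small illustrative assumption.

```python
# Toy word-camouflage simulator and detector: leetspeak-style substitutions
# on the attack side, character normalization on the defense side.
# The substitution table is a small illustrative sample, not the article's tooling.
import random
import unicodedata

LEET = {"a": "4@", "e": "3", "i": "1!", "o": "0", "s": "5$", "t": "7"}


def camouflage(word: str, p: float = 0.5, seed: int | None = None) -> str:
    """Randomly replace characters with lookalikes to evade exact matching."""
    rng = random.Random(seed)
    return "".join(
        rng.choice(LEET[c]) if c in LEET and rng.random() < p else c
        for c in word.lower()
    )


def normalize(text: str) -> str:
    """Map lookalike characters back before keyword matching."""
    reverse = {sub: plain for plain, subs in LEET.items() for sub in subs}
    stripped = unicodedata.normalize("NFKD", text.lower())
    return "".join(reverse.get(c, c) for c in stripped)


def is_flagged(text: str, watchlist: set[str]) -> bool:
    """Detect camouflaged watch-list words after normalization."""
    return any(w in normalize(text) for w in watchlist)


print(camouflage("attack", seed=0))                   # a variant such as "4t74ck"
print(is_flagged("pl4n th3 a77ack now", {"attack"}))  # True
```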
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis [60.13902294276283]
We present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated).
Many of the existing deepfake datasets focus exclusively on two types of facial manipulations -- swapping with a different subject's face or altering the existing face.
Our analysis shows that state-of-the-art manipulation detection algorithms only work for a few specific attacks and do not scale well on VideoSham.
arXiv Detail & Related papers (2022-07-26T17:39:04Z) - TeamX@DravidianLangTech-ACL2022: A Comparative Analysis for Troll-Based
Meme Classification [21.32190107220764]
The spread of harmful content online has raised concerns among social media platforms, government agencies, policymakers, and society as a whole.
Trolling-based online content is one such form of harm, where the idea is to post a message that is provocative, offensive, or menacing with the intent to mislead the audience.
This study provides a comparative analysis of troll-based meme classification using textual, visual, and multimodal content.
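The three settings being compared (text-only, image-only, multimodal) can be sketched with off-the-shelf encoders and a linear classifier; the feature extractors and the simple early-fusion choice below are assumptions for illustration, not the study's actual models.

```python
# Sketch of the three compared settings: text-only, image-only, and a
# simple early-fusion multimodal classifier. Encoders and fusion strategy
# are illustrative assumptions, not the study's models.
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

text_enc = SentenceTransformer("all-MiniLM-L6-v2")  # meme overlay text
img_enc = SentenceTransformer("clip-ViT-B-32")      # meme image


def features(texts: list[str], image_paths: list[str], mode: str) -> np.ndarray:
    """Build text, image, or concatenated (early-fusion) feature matrices."""
    t = text_enc.encode(texts)
    v = img_enc.encode([Image.open(p) for p in image_paths])
    return {"text": t, "image": v, "multimodal": np.hstack([t, v])}[mode]


def train(texts, image_paths, labels, mode="multimodal"):
    """Fit a troll / not-troll classifier on the chosen modality."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features(texts, image_paths, mode), labels)
    return clf
```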
arXiv Detail & Related papers (2022-05-09T16:19:28Z) - Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes.
One interesting finding is that many types of harmful memes have not really been studied, e.g., those featuring self-harm and extremism.
Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z) - Detecting Harmful Content On Online Platforms: What Platforms Need Vs.
Where Research Efforts Go [44.774035806004214]
Harmful content on online platforms comes in many different forms, including hate speech, offensive language, bullying and harassment, misinformation, spam, violence, graphic content, sexual abuse, self-harm, and many others.
Online platforms seek to moderate such content to limit societal harm, to comply with legislation, and to create a more inclusive environment for their users.
There is currently a dichotomy between what types of harmful content online platforms seek to curb, and what research efforts there are to automatically detect such content.
arXiv Detail & Related papers (2021-02-27T08:01:10Z)