Catching Dark Signals in Algorithms: Unveiling Audiovisual and Thematic Markers of Unsafe Content Recommended for Children and Teenagers
- URL: http://arxiv.org/abs/2507.12571v2
- Date: Fri, 01 Aug 2025 21:40:57 GMT
- Title: Catching Dark Signals in Algorithms: Unveiling Audiovisual and Thematic Markers of Unsafe Content Recommended for Children and Teenagers
- Authors: Haoning Xue, Brian Nishimine, Martin Hilbert, Drew Cingel, Samantha Vigil, Jane Shawcroft, Arti Thakur, Zubair Shafiq, Jingwen Zhang
- Abstract summary: The prevalence of short form video platforms, combined with the ineffectiveness of age verification mechanisms, raises concerns about the potential harms facing children and teenagers in an algorithm-moderated online environment. We conducted multimodal feature analysis and thematic topic modeling of 4,492 short videos recommended to children and teenagers on Instagram Reels, TikTok, and YouTube Shorts. This feature-level and content-level analysis revealed that unsafe (i.e., problematic, mentally distressing) short videos possess darker visual features and contain explicitly harmful content and implicit harm from anxiety-inducing ordinary content.
- Score: 13.39320891153433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The prevalence of short form video platforms, combined with the ineffectiveness of age verification mechanisms, raises concerns about the potential harms facing children and teenagers in an algorithm-moderated online environment. We conducted multimodal feature analysis and thematic topic modeling of 4,492 short videos recommended to children and teenagers on Instagram Reels, TikTok, and YouTube Shorts, collected as a part of an algorithm auditing experiment. This feature-level and content-level analysis revealed that unsafe (i.e., problematic, mentally distressing) short videos (a) possess darker visual features and (b) contain explicitly harmful content and implicit harm from anxiety-inducing ordinary content. We introduce a useful framework of online harm (i.e., explicit, implicit, unintended), providing a unique lens for understanding the dynamic, multifaceted online risks facing children and teenagers. The findings highlight the importance of protecting younger audiences in critical developmental stages from both explicit and implicit risks on social media, calling for nuanced content moderation, age verification, and platform regulation.
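The abstract reports that unsafe videos "possess darker visual features." Since the exact feature pipeline is not described here, the following is a minimal sketch of how per-frame brightness and saturation could be extracted with OpenCV; the frame-sampling step and the HSV-based features are illustrative assumptions, not the study's actual method.

```python
# Minimal sketch: estimate average brightness and saturation of a short video.
# Requires opencv-python (cv2) and numpy. Sampling rate and feature choice are
# illustrative assumptions, not the paper's exact pipeline.
import cv2
import numpy as np

def visual_darkness_features(video_path: str, sample_every: int = 15) -> dict:
    """Return mean brightness (V) and saturation (S) over sampled frames."""
    cap = cv2.VideoCapture(video_path)
    brightness, saturation = [], []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            saturation.append(hsv[:, :, 1].mean())   # S channel
            brightness.append(hsv[:, :, 2].mean())   # V channel
        idx += 1
    cap.release()
    return {
        "mean_brightness": float(np.mean(brightness)) if brightness else float("nan"),
        "mean_saturation": float(np.mean(saturation)) if saturation else float("nan"),
    }

# Example: print(visual_darkness_features("example_reel.mp4"))
```

Lower mean brightness across sampled frames would correspond to the "darker" visual profile the study associates with unsafe videos.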
Related papers
- SNIFR : Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer [6.590879020134438]
Malicious users exploit moderation systems by embedding unsafe content in minimal frames to evade detection.
In this study, we combine audio cues with visual cues for fine-grained child-harmful content detection and introduce SNIFR, a novel framework for effective alignment.
arXiv Detail & Related papers (2025-06-03T20:37:23Z)
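The SNIFR entry above describes fusing audio and visual cues for harmful-content detection. As a rough illustration of what cross-modal fusion looks like in code, here is a minimal, generic cross-attention sketch in PyTorch; the dimensions, pooling, and two-class head are assumptions, and this is not the cascaded cross-transformer proposed in the paper.

```python
# Generic audio-visual cross-attention fusion sketch (illustrative only;
# not the SNIFR cascaded cross-transformer).
import torch
import torch.nn as nn

class AudioVisualCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, 2)  # safe vs. harmful

    def forward(self, visual_tokens: torch.Tensor, audio_tokens: torch.Tensor):
        # visual_tokens: (batch, n_frames, dim); audio_tokens: (batch, n_audio, dim)
        fused, _ = self.cross_attn(query=visual_tokens,
                                   key=audio_tokens,
                                   value=audio_tokens)
        pooled = fused.mean(dim=1)           # pool over frames
        return self.classifier(pooled)       # per-video logits

# model = AudioVisualCrossAttention()
# logits = model(torch.randn(2, 32, 256), torch.randn(2, 50, 256))
```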
- Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs [51.90597846977058]
Video-SafetyBench is the first benchmark designed to evaluate the safety of LVLMs under video-text attacks.
It comprises 2,264 video-text pairs spanning 48 fine-grained unsafe categories.
To generate semantically accurate videos for safety evaluation, we design a controllable pipeline that decomposes video semantics into subject images and motion text.
arXiv Detail & Related papers (2025-05-17T05:06:38Z)
- Protecting Young Users on Social Media: Evaluating the Effectiveness of Content Moderation and Legal Safeguards on Video Sharing Platforms [0.8198234257428011]
We evaluated the effectiveness of video moderation for different age groups on TikTok, YouTube, and Instagram.
For passive scrolling, accounts assigned to the age 13 group encountered videos that were deemed harmful more frequently and quickly than those assigned to the age 18 group.
Exposure occurred without user-initiated searches, indicating weaknesses in the algorithmic filtering systems.
arXiv Detail & Related papers (2025-05-16T12:06:42Z)
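The moderation audit above compares how often and how quickly accounts in different age groups encounter harmful videos during passive scrolling. A minimal sketch of how such a scroll log could be summarized is shown below; the log schema and field names are illustrative assumptions, not the study's actual instrumentation.

```python
# Sketch: summarize a passive-scrolling audit log by assigned account age.
# The row schema (account_age, position, is_harmful) is an assumption.
from collections import defaultdict

def exposure_summary(log: list[dict]) -> dict:
    """log rows: {"account_age": 13 | 18, "position": int, "is_harmful": bool}"""
    by_age = defaultdict(list)
    for row in log:
        by_age[row["account_age"]].append(row)
    summary = {}
    for age, rows in by_age.items():
        rows.sort(key=lambda r: r["position"])
        harmful = [r for r in rows if r["is_harmful"]]
        summary[age] = {
            "harmful_rate": len(harmful) / len(rows),
            "first_harmful_position": harmful[0]["position"] if harmful else None,
        }
    return summary
```

A higher harmful_rate and a lower first_harmful_position for the age-13 group would reproduce the pattern described in the abstract.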
- EdgeAIGuard: Agentic LLMs for Minor Protection in Digital Spaces [13.180252900900854]
We propose the EdgeAIGuard content moderation approach to protect minors from online grooming and various forms of digital exploitation.
The proposed method comprises a multi-agent architecture deployed strategically at the network edge to enable rapid detection with low latency and prevent harmful content targeting minors.
arXiv Detail & Related papers (2025-02-28T16:29:34Z)
- Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks.
We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance.
Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
arXiv Detail & Related papers (2025-01-30T18:02:15Z)
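The content-injection study above notes that copying a query or its key terms into an unrelated passage can inflate its perceived relevance. The toy example below shows the effect with a deliberately naive term-overlap scorer; this stands in for, and is far simpler than, the retrievers, rerankers, and LLM judges actually studied.

```python
# Toy illustration of query-term injection inflating a naive lexical
# relevance score. Query and passages are made up for the example.
def term_overlap_score(query: str, passage: str) -> float:
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

query = "is caffeine safe for teenagers"
benign = "Our store sells espresso machines and grinders."
injected = benign + " is caffeine safe for teenagers"  # query copied verbatim

print(term_overlap_score(query, benign))    # low overlap
print(term_overlap_score(query, injected))  # jumps to 1.0 after injection
```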
- MLLM-as-a-Judge for Image Safety without Human Labeling [81.24707039432292]
In the age of AI-generated content (AIGC), many image generation models are capable of producing harmful content.
It is crucial to identify such unsafe images based on established safety rules.
Existing approaches typically fine-tune MLLMs with human-labeled datasets.
arXiv Detail & Related papers (2024-12-31T00:06:04Z)
- Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt [60.54666043358946]
This paper introduces the Bi-Modal Adversarial Prompt Attack (BAP), which executes jailbreaks by optimizing textual and visual prompts cohesively.
In particular, we utilize a large language model to analyze jailbreak failures and employ chain-of-thought reasoning to refine textual prompts.
arXiv Detail & Related papers (2024-06-06T13:00:42Z)
- Security Advice for Parents and Children About Content Filtering and Circumvention as Found on YouTube and TikTok [2.743215038883957]
We examine the advice available to parents and children regarding content filtering and circumvention as found on YouTube and TikTok.
Our results show that roughly three-quarters of these videos are accurate, while the remaining quarter contain factually incorrect advice.
We find that videos targeting children are more likely than videos targeting parents to be both incorrect and actionable, leaving children at increased risk of taking harmful action.
arXiv Detail & Related papers (2024-02-05T18:12:33Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
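The word-camouflage article above concerns simulating and detecting twisted or camouflaged keywords. A minimal sketch of both directions is given below; the substitution table and blocklist are illustrative assumptions, not the multilingual tools developed in that work.

```python
# Sketch: simulate leetspeak-style keyword camouflage and undo it with a
# naive normalization step before blocklist matching. Tables are assumptions.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "$"}
REVERSE = {v: k for k, v in LEET.items()}

def camouflage(word: str) -> str:
    """Simulate evasion by swapping letters for look-alike symbols."""
    return "".join(LEET.get(ch, ch) for ch in word.lower())

def normalize(text: str) -> str:
    """Undo common substitutions before matching against a blocklist."""
    return "".join(REVERSE.get(ch, ch) for ch in text.lower())

blocklist = {"scam"}
post = "totally legit, not a $c4m at all"          # i.e., camouflage("scam")
tokens = [tok.strip(",.") for tok in normalize(post).split()]
print(any(tok in blocklist for tok in tokens))     # True after normalization
```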
- Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes.
One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism.
Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z)
- 'Beach' to 'Bitch': Inadvertent Unsafe Transcription of Kids' Content on YouTube [13.116806430326513]
Well-known automatic speech recognition (ASR) systems may produce text content highly inappropriate for kids while transcribing YouTube Kids' videos.
We release a first-of-its-kind dataset of audio clips for which existing state-of-the-art ASR systems hallucinate inappropriate content for kids.
arXiv Detail & Related papers (2022-02-17T19:19:09Z)
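The 'Beach' to 'Bitch' entry above concerns ASR systems hallucinating unsafe words when transcribing kids' content. Below is a minimal sketch of a downstream safeguard that scans transcripts against a blocklist before they are surfaced; the blocklist and the example transcript are illustrative assumptions, not the paper's dataset or models.

```python
# Sketch: flag ASR transcripts of kids' videos containing blocklisted words.
# The blocklist and transcript here are toy examples for illustration.
import re

BLOCKLIST = {"bitch"}  # e.g., the 'beach' -> 'bitch' mis-transcription in the title

def flag_unsafe_transcript(transcript: str) -> list[str]:
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return [tok for tok in tokens if tok in BLOCKLIST]

asr_output = "we spent the whole day at the bitch building sandcastles"
print(flag_unsafe_transcript(asr_output))  # ['bitch'] -> hold for review
```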