Towards an Automated Framework to Audit Youth Safety on TikTok
- URL: http://arxiv.org/abs/2509.05838v1
- Date: Sat, 06 Sep 2025 21:30:11 GMT
- Title: Towards an Automated Framework to Audit Youth Safety on TikTok
- Authors: Linda Xue, Francesco Corso, Nicolo' Fontana, Geng Liu, Stefano Ceri, Francesco Pierri
- Abstract summary: This paper investigates the effectiveness of TikTok's enforcement mechanisms for limiting the exposure of harmful content to youth accounts. We collect over 7,000 videos, classify them as harmful or not harmful, and then simulate interactions using age-specific sockpuppet accounts. Preliminary results show minimal differences in content exposure between adult and youth accounts, raising concerns about the platform's age-based moderation.
- Score: 6.83645418303131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the effectiveness of TikTok's enforcement mechanisms for limiting the exposure of harmful content to youth accounts. We collect over 7,000 videos, classify them as harmful or not harmful, and then simulate interactions using age-specific sockpuppet accounts through both passive and active engagement strategies. We also evaluate the performance of large language models (LLMs) and vision-language models (VLMs) in detecting harmful content, identifying key challenges in precision and scalability. Preliminary results show minimal differences in content exposure between adult and youth accounts, raising concerns about the platform's age-based moderation. These findings suggest that the platform needs to strengthen youth safety measures and improve transparency in content moderation.
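The abstract does not publish the classification pipeline itself; the sketch below only illustrates the general step of labeling each collected video as harmful or not harmful before comparing exposure across account ages. It is a minimal, hypothetical example assuming per-video text (captions or transcripts) is available; the zero-shot model, candidate labels, and threshold are stand-ins, not the authors' setup.

```python
# Minimal sketch: label a video's transcript/caption as harmful vs. not harmful
# with an off-the-shelf zero-shot classifier. Model, labels, and threshold are
# illustrative assumptions, not the paper's actual LLM/VLM configuration.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",  # stand-in checkpoint
)

LABELS = ["harmful to minors", "not harmful"]

def classify_video_text(text: str, threshold: float = 0.5) -> dict:
    """Return the top label and score for one video's text."""
    result = classifier(text, candidate_labels=LABELS)
    top_label, top_score = result["labels"][0], result["scores"][0]
    return {
        "label": top_label if top_score >= threshold else "uncertain",
        "score": top_score,
    }

if __name__ == "__main__":
    sample = "Tutorial showing viewers how to hide bruises with makeup."
    print(classify_video_text(sample))
```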
Related papers
- Vision-Language Models Unlock Task-Centric Latent Actions [75.53481518882275]
We propose to utilize the common-sense reasoning abilities of Vision-Language Models (VLMs) to provide promptable representations. We show that simply asking VLMs to ignore distractors can substantially improve latent action quality, yielding up to a six-fold increase in downstream success rates on Distracting MetaWorld.
arXiv Detail & Related papers (2026-01-30T08:38:59Z) - Friend or Foe: How LLMs' Safety Mind Gets Fooled by Intent Shift Attack [53.34204977366491]
Large language models (LLMs) remain vulnerable to jailbreaking attacks despite their impressive capabilities. In this paper, we introduce ISA (Intent Shift Attack), which obscures the intent of an attack from the LLM. Our approach needs only minimal edits to the original request and yields natural, human-readable, and seemingly harmless prompts.
arXiv Detail & Related papers (2025-11-01T13:44:42Z) - SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth [14.569766143989531]
The rapid proliferation of large language models (LLMs) in applications targeting children and adolescents necessitates a fundamental reassessment of prevailing AI safety frameworks. This paper highlights key deficiencies in existing LLM safety benchmarks, including their inadequate coverage of age-specific cognitive, emotional, and social risks. We introduce SproutBench, an innovative evaluation suite comprising 1,283 developmentally grounded adversarial prompts designed to probe risks such as emotional dependency, privacy violations, and imitation of hazardous behaviors.
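The SproutBench harness is not reproduced in this abstract; the sketch below only shows the general shape of such an evaluation: feed each adversarial prompt to a model and measure how often it refuses. The JSONL file format, refusal heuristic, and `generate` callable are assumptions made for illustration, not the benchmark's actual code.

```python
# Hypothetical SproutBench-style evaluation loop: send each adversarial prompt
# to a model and report the refusal rate. File format and refusal markers are
# illustrative assumptions only.
import json

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "not able to help")

def looks_like_refusal(answer: str) -> bool:
    answer = answer.lower()
    return any(marker in answer for marker in REFUSAL_MARKERS)

def evaluate(prompts_path: str, generate) -> float:
    """`generate` is any callable mapping a prompt string to a model reply."""
    with open(prompts_path) as f:
        prompts = [json.loads(line)["prompt"] for line in f]  # assumed JSONL
    refusals = sum(looks_like_refusal(generate(p)) for p in prompts)
    return refusals / len(prompts)  # fraction of prompts the model refused
```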
arXiv Detail & Related papers (2025-08-14T18:21:39Z) - Catching Dark Signals in Algorithms: Unveiling Audiovisual and Thematic Markers of Unsafe Content Recommended for Children and Teenagers [13.39320891153433]
The prevalence of short form video platforms, combined with the ineffectiveness of age verification mechanisms, raises concerns about the potential harms facing children and teenagers in an algorithm-moderated online environment. We conducted multimodal feature analysis and thematic topic modeling of 4,492 short videos recommended to children and teenagers on Instagram Reels, TikTok, and YouTube Shorts. This feature-level and content-level analysis revealed that unsafe (i.e., problematic, mentally distressing) short videos possess darker visual features and contain both explicitly harmful content and implicit harm from anxiety-inducing ordinary content.
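As one concrete illustration of the kind of low-level "dark visual feature" such studies rely on, the sketch below computes the mean frame brightness of a clip with OpenCV. This is only an assumed proxy; the paper's full multimodal feature set is not reproduced here.

```python
# Illustrative proxy for "darker visual features": average grayscale intensity
# (0-255) over sampled frames of a video, using OpenCV.
import cv2
import numpy as np

def mean_brightness(video_path: str, sample_every: int = 30) -> float:
    """Average brightness over every `sample_every`-th frame."""
    cap = cv2.VideoCapture(video_path)
    values, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % sample_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            values.append(float(np.mean(gray)))
        index += 1
    cap.release()
    return float(np.mean(values)) if values else float("nan")
```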
arXiv Detail & Related papers (2025-07-16T18:41:42Z) - MinorBench: A hand-built benchmark for content-based risks for children [0.0]
Large Language Models (LLMs) are rapidly entering children's lives through parent-driven adoption, schools, and peer networks. Current AI ethics and safety research does not adequately address content-related risks specific to minors. We propose a new taxonomy of content-based risks for minors and introduce MinorBench, an open-source benchmark designed to evaluate LLMs on their ability to refuse unsafe or inappropriate queries from children.
arXiv Detail & Related papers (2025-03-13T10:34:43Z) - EdgeAIGuard: Agentic LLMs for Minor Protection in Digital Spaces [13.180252900900854]
We propose the EdgeAIGuard content moderation approach to protect minors from online grooming and various forms of digital exploitation. The proposed method comprises a multi-agent architecture deployed strategically at the network edge to enable rapid detection with low latency and prevent harmful content targeting minors.
arXiv Detail & Related papers (2025-02-28T16:29:34Z) - Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks. We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance. Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
arXiv Detail & Related papers (2025-01-30T18:02:15Z) - Retention Score: Quantifying Jailbreak Risks for Vision Language Models [60.48306899271866]
Vision-Language Models (VLMs) are integrated with Large Language Models (LLMs) to enhance multi-modal machine learning capabilities. This paper aims to assess the resilience of VLMs against jailbreak attacks that can compromise model safety compliance and result in harmful outputs. To evaluate a VLM's ability to maintain its robustness against adversarial input perturbations, we propose a novel metric called the Retention Score.
arXiv Detail & Related papers (2024-12-23T13:05:51Z) - TikGuard: A Deep Learning Transformer-Based Solution for Detecting Unsuitable TikTok Content for Kids [0.0]
This paper introduces TikGuard, a transformer-based deep learning approach aimed at detecting and flagging content unsuitable for children on TikTok.
By using a specially curated dataset, TikHarm, and leveraging advanced video classification techniques, TikGuard achieves an accuracy of 86.7%.
While direct comparisons are limited by the uniqueness of the TikHarm dataset, TikGuard's performance highlights its potential in enhancing content moderation.
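The TikGuard model and the TikHarm dataset are not assumed to be available here; the sketch below only shows what transformer-based video classification looks like in practice, using a generic Kinetics-finetuned VideoMAE checkpoint purely as a stand-in.

```python
# Sketch of transformer-based video classification in the spirit of TikGuard.
# The checkpoint and input path are illustrative stand-ins, not the paper's
# model or data (a video decoding backend such as decord must be installed).
from transformers import pipeline

video_classifier = pipeline(
    "video-classification",
    model="MCG-NJU/videomae-base-finetuned-kinetics",  # stand-in checkpoint
)

predictions = video_classifier("downloaded_clip.mp4")  # illustrative path
for pred in predictions:
    print(f'{pred["label"]}: {pred["score"]:.3f}')
```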
arXiv Detail & Related papers (2024-10-01T05:00:05Z) - Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models [64.5204594279587]
A model that prioritizes safety will cause users to feel less engaged and assisted, while one that prioritizes helpfulness can potentially cause harm.
We propose to balance safety and helpfulness in diverse use cases by controlling both attributes in large language models.
arXiv Detail & Related papers (2024-04-01T17:59:06Z) - A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLM-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z) - LLM Censorship: A Machine Learning Challenge or a Computer Security Problem? [52.71988102039535]
We show that semantic censorship can be perceived as an undecidable problem.
We argue that the challenges extend beyond semantic censorship, as knowledgeable attackers can reconstruct impermissible outputs.
arXiv Detail & Related papers (2023-07-20T09:25:02Z) - Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most commonly used techniques to evade platform content moderation systems.
This article contributes to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
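The multilingual tools described in the paper are far broader than a single rule; the sketch below only demonstrates one common form of word camouflage (leetspeak and character substitution) handled by normalizing text before matching an illustrative keyword list.

```python
# Minimal sketch of detecting camouflaged keywords by normalizing common
# character substitutions and separators before matching a blocklist.
import re

SUBSTITUTIONS = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                               "5": "s", "7": "t", "@": "a", "$": "s"})
BLOCKLIST = {"scam", "dox"}  # illustrative keywords only

def normalize(text: str) -> str:
    text = text.lower().translate(SUBSTITUTIONS)
    return re.sub(r"[^a-z\s]", "", text)  # drop separators like '.', '-', '*'

def contains_camouflaged_keyword(text: str) -> bool:
    tokens = normalize(text).split()
    return any(token in BLOCKLIST for token in tokens)

print(contains_camouflaged_keyword("free $c4m here"))  # True
```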
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.