Related papers: Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-LLM Interactions

Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-LLM Interactions

URL: http://arxiv.org/abs/2506.13510v2
Date: Tue, 17 Jun 2025 02:13:08 GMT
Title: Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-LLM Interactions
Authors: Junfeng Jiao, Saleh Afroogh, Kevin Chen, Abhejay Murali, David Atkinson, Amit Dhurandhar,
Abstract summary: We introduce Safe-Child-LLM, a benchmark and dataset for assessing AI safety across two developmental stages: children (7-12) and adolescents (13-17).<n>Our framework includes a novel multi-part dataset of 200 adversarial prompts, curated from red-teaming corpora, with human-annotated labels for jailbreak success and a standardized 0-5 ethical refusal scale.<n> evaluating leading LLMs -- including ChatGPT, Claude, Gemini, LLaMA, DeepSeek, Grok, Vicuna, and Mistral -- we uncover critical safety deficiencies in child-facing scenarios.
Score: 8.018569128518187
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As Large Language Models (LLMs) increasingly power applications used by children and adolescents, ensuring safe and age-appropriate interactions has become an urgent ethical imperative. Despite progress in AI safety, current evaluations predominantly focus on adults, neglecting the unique vulnerabilities of minors engaging with generative AI. We introduce Safe-Child-LLM, a comprehensive benchmark and dataset for systematically assessing LLM safety across two developmental stages: children (7-12) and adolescents (13-17). Our framework includes a novel multi-part dataset of 200 adversarial prompts, curated from red-teaming corpora (e.g., SG-Bench, HarmBench), with human-annotated labels for jailbreak success and a standardized 0-5 ethical refusal scale. Evaluating leading LLMs -- including ChatGPT, Claude, Gemini, LLaMA, DeepSeek, Grok, Vicuna, and Mistral -- we uncover critical safety deficiencies in child-facing scenarios. This work highlights the need for community-driven benchmarks to protect young users in LLM interactions. To promote transparency and collaborative advancement in ethical AI development, we are publicly releasing both our benchmark datasets and evaluation codebase at https://github.com/The-Responsible-AI-Initiative/Safe_Child_LLM_Benchmark.git

Related papers

Probing AI Safety with Source Code [49.39895512792655]
Large language models (LLMs) have become ubiquitous, interfacing with humans in numerous safety-critical applications.<n>We demonstrate that contemporary models fall concerningly short of the goal of AI safety, leading to an unsafe and harmful experience for users.
arXiv Detail & Related papers (2025-06-25T14:19:57Z)
ROSE: Toward Reality-Oriented Safety Evaluation of Large Language Models [60.28667314609623]
Large Language Models (LLMs) are increasingly deployed as black-box components in real-world applications.<n>We propose Reality-Oriented Safety Evaluation (ROSE), a novel framework that uses multi-objective reinforcement learning to fine-tune an adversarial LLM.
arXiv Detail & Related papers (2025-06-17T10:55:17Z)
AgentAuditor: Human-Level Safety and Security Evaluation for LLM Agents [41.000042817113645]
sys is a universal, training-free, memory-augmented reasoning framework.<n>sys constructs an experiential memory by having an LLM adaptively extract structured semantic features.<n>data is the first benchmark designed to check how well LLM-based evaluators can spot both safety risks and security threats.
arXiv Detail & Related papers (2025-05-31T17:10:23Z)
MinorBench: A hand-built benchmark for content-based risks for children [0.0]
Large Language Models (LLMs) are rapidly entering children's lives through parent-driven adoption, schools, and peer networks.<n>Current AI ethics and safety research do not adequately address content-related risks specific to minors.<n>We propose a new taxonomy of content-based risks for minors and introduce MinorBench, an open-source benchmark designed to evaluate LLMs on their ability to refuse unsafe or inappropriate queries from children.
arXiv Detail & Related papers (2025-03-13T10:34:43Z)
LLM Safety for Children [9.935219917903858]
The study acknowledges the diverse nature of children often overlooked by standard safety evaluations.<n>We develop Child User Models that reflect the varied personalities and interests of children informed by literature in child care and psychology.
arXiv Detail & Related papers (2025-02-18T05:26:27Z)
LLMs and Childhood Safety: Identifying Risks and Proposing a Protection Framework for Safe Child-LLM Interaction [8.018569128518187]
This study examines the growing use of Large Language Models (LLMs) in child-centered applications.<n>It highlights safety and ethical concerns such as bias, harmful content, and cultural insensitivity.<n>We propose a protection framework for safe Child-LLM interaction, incorporating metrics for content safety, behavioral ethics, and cultural sensitivity.
arXiv Detail & Related papers (2025-02-16T19:39:48Z)
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming [64.86326523181553]
ALERT is a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy. It aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models.
arXiv Detail & Related papers (2024-04-06T15:01:47Z)
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models [107.82336341926134]
SALAD-Bench is a safety benchmark specifically designed for evaluating Large Language Models (LLMs) It transcends conventional benchmarks through its large scale, rich diversity, intricate taxonomy spanning three levels, and versatile functionalities.
arXiv Detail & Related papers (2024-02-07T17:33:54Z)
Flames: Benchmarking Value Alignment of LLMs in Chinese [86.73527292670308]
This paper proposes a value alignment benchmark named Flames. It encompasses both common harmlessness principles and a unique morality dimension that integrates specific Chinese values. Our findings indicate that all the evaluated LLMs demonstrate relatively poor performance on Flames.
arXiv Detail & Related papers (2023-11-12T17:18:21Z)
Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes. To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts including 100k augmented prompts and responses by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.