Evaluating LLM Safety Across Child Development Stages: A Simulated Agent Approach
- URL: http://arxiv.org/abs/2510.05484v1
- Date: Tue, 07 Oct 2025 01:01:04 GMT
- Title: Evaluating LLM Safety Across Child Development Stages: A Simulated Agent Approach
- Authors: Abhejay Murali, Saleh Afroogh, Kevin Chen, David Atkinson, Amit Dhurandhar, Junfeng Jiao
- Abstract summary: We present ChildSafe, a benchmark that evaluates Large Language Model (LLM) safety through simulated child agents. ChildSafe assesses responses across nine safety dimensions using age-weighted scoring in both sensitive and neutral contexts.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are rapidly becoming part of tools used by children; however, existing benchmarks fail to capture how these models manage language, reasoning, and safety needs that are specific to various ages. We present ChildSafe, a benchmark that evaluates LLM safety through simulated child agents that embody four developmental stages. These agents, grounded in developmental psychology, enable a systematic study of child safety without the ethical implications of involving real children. ChildSafe assesses responses across nine safety dimensions (including privacy, misinformation, and emotional support) using age-weighted scoring in both sensitive and neutral contexts. Multi-turn experiments with multiple LLMs uncover consistent vulnerabilities that vary by simulated age, exposing shortcomings in existing alignment practices. By releasing agent templates, evaluation protocols, and an experimental corpus, we provide a reproducible framework for age-aware safety research. We encourage the community to expand this work with real child-centered data and studies, advancing the development of LLMs that are genuinely safe and developmentally aligned.
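The abstract's age-weighted scoring can be pictured as a weighted average of per-dimension safety scores, where the weights depend on the simulated developmental stage. The sketch below is a hypothetical illustration only: the abstract names just three of the nine dimensions (privacy, misinformation, emotional support), and the remaining dimension names, the stage weights, the 0-1 score scale, and the function itself are assumptions, not ChildSafe's released protocol.

```python
# Hypothetical sketch of age-weighted safety scoring, loosely following the
# ChildSafe abstract. Dimension names beyond the three cited in the abstract,
# the stage weights, and the 0-1 score scale are illustrative assumptions.
from typing import Dict

# Nine safety dimensions (only the first three are named in the abstract).
DIMENSIONS = [
    "privacy", "misinformation", "emotional_support", "age_appropriateness",
    "harmful_instructions", "manipulation_resistance", "boundary_setting",
    "content_safety", "escalation_handling",
]

# Assumed per-stage weights; only two of the four developmental stages are
# shown for brevity. Younger simulated agents weight some dimensions
# (e.g. privacy, emotional support) more heavily.
STAGE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "early_childhood": {d: 1.0 for d in DIMENSIONS} | {"privacy": 1.5, "emotional_support": 1.5},
    "adolescence":     {d: 1.0 for d in DIMENSIONS} | {"misinformation": 1.3},
}

def age_weighted_score(dimension_scores: Dict[str, float], stage: str) -> float:
    """Weighted average of per-dimension scores (each in [0, 1]) for one stage."""
    weights = STAGE_WEIGHTS[stage]
    total = sum(weights[d] * dimension_scores[d] for d in DIMENSIONS)
    return total / sum(weights[d] for d in DIMENSIONS)

# Example: a response that handled most dimensions well but leaked private
# details, evaluated against an early-childhood agent.
scores = {d: 0.9 for d in DIMENSIONS} | {"privacy": 0.4}
print(round(age_weighted_score(scores, "early_childhood"), 3))
```

Under this reading, the same per-dimension scores yield different aggregate safety for different simulated ages, which is how an age-weighted benchmark can surface vulnerabilities that a single flat average would hide.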
Related papers
- SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond [134.43113804188195]
We introduce SafeSci, a comprehensive framework for safety evaluation and enhancement in scientific contexts. SafeSci comprises SafeSciBench, a multi-disciplinary benchmark with 0.25M samples, and SafeSciTrain, a large-scale dataset containing 1.5M samples for safety enhancement.
arXiv Detail & Related papers (2026-03-02T08:16:04Z) - SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth [14.569766143989531]
The rapid proliferation of large language models (LLMs) in applications targeting children and adolescents necessitates a fundamental reassessment of prevailing AI safety frameworks. This paper highlights key deficiencies in existing LLM safety benchmarks, including their inadequate coverage of age-specific cognitive, emotional, and social risks. We introduce SproutBench, an innovative evaluation suite comprising 1,283 developmentally grounded adversarial prompts designed to probe risks such as emotional dependency, privacy violations, and imitation of hazardous behaviors.
arXiv Detail & Related papers (2025-08-14T18:21:39Z) - Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-LLM Interactions [8.018569128518187]
We introduce Safe-Child-LLM, a benchmark and dataset for assessing AI safety across two developmental stages: children (7-12) and adolescents (13-17). Our framework includes a novel multi-part dataset of 200 adversarial prompts, curated from red-teaming corpora, with human-annotated labels for jailbreak success and a standardized 0-5 ethical refusal scale. Evaluating leading LLMs -- including ChatGPT, Claude, Gemini, LLaMA, DeepSeek, Grok, Vicuna, and Mistral -- we uncover critical safety deficiencies in child-facing scenarios.
arXiv Detail & Related papers (2025-06-16T14:04:54Z) - AgentAuditor: Human-Level Safety and Security Evaluation for LLM Agents [48.925168866726814]
AgentAuditor is a universal, training-free, memory-augmented reasoning framework. ASSEBench is the first benchmark designed to check how well LLM-based evaluators can spot both safety risks and security threats.
arXiv Detail & Related papers (2025-05-31T17:10:23Z) - MinorBench: A hand-built benchmark for content-based risks for children [0.0]
Large Language Models (LLMs) are rapidly entering children's lives through parent-driven adoption, schools, and peer networks. Current AI ethics and safety research does not adequately address content-related risks specific to minors. We propose a new taxonomy of content-based risks for minors and introduce MinorBench, an open-source benchmark designed to evaluate LLMs on their ability to refuse unsafe or inappropriate queries from children.
arXiv Detail & Related papers (2025-03-13T10:34:43Z) - LLM Safety for Children [9.935219917903858]
The study acknowledges the diverse nature of children, which is often overlooked by standard safety evaluations. We develop Child User Models that reflect the varied personalities and interests of children, informed by literature in child care and psychology.
arXiv Detail & Related papers (2025-02-18T05:26:27Z) - LLMs and Childhood Safety: Identifying Risks and Proposing a Protection Framework for Safe Child-LLM Interaction [8.018569128518187]
This study examines the growing use of Large Language Models (LLMs) in child-centered applications. It highlights safety and ethical concerns such as bias, harmful content, and cultural insensitivity. We propose a protection framework for safe Child-LLM interaction, incorporating metrics for content safety, behavioral ethics, and cultural sensitivity.
arXiv Detail & Related papers (2025-02-16T19:39:48Z) - Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety. For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context. We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z) - ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming [64.86326523181553]
ALERT is a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy.
It aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models.
arXiv Detail & Related papers (2024-04-06T15:01:47Z) - The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z) - Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes.
To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts, which includes 100k augmented prompts and responses generated by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)