Related papers: Computable Gap Assessment of Artificial Intelligence Governance in Children's Centres: Evidence-Mechanism-Governance-Indicator Modelling of UNICEF's Guidance on AI and Children 3.0 Based on the Graph-GAP Framework

URL: http://arxiv.org/abs/2601.04216v1
Date: Sat, 20 Dec 2025 17:03:17 GMT
Title: Computable Gap Assessment of Artificial Intelligence Governance in Children's Centres: Evidence-Mechanism-Governance-Indicator Modelling of UNICEF's Guidance on AI and Children 3.0 Based on the Graph-GAP Framework
Authors: Wei Meng
Abstract summary: We propose a methodology that decomposes requirements from authoritative policy texts into a four-layer graph of evidence, mechanism, governance, and indicator. Using the UNICEF Innocenti Guidance on AI and Children 3.0 as primary material, we define reproducible extraction units, coding manuals, graph patterns, scoring scales, and consistency checks. Results suggest that compared with privacy and data protection, requirements related to child well-being and development, explainability and accountability, and cross-agency implementation and resource allocation are more prone to indicator gaps and mechanism gaps.
Score: 5.260137087369841
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper tackles practical challenges in governing child-centered artificial intelligence: policy texts state principles and requirements but often lack reproducible evidence anchors, explicit causal pathways, executable governance toolchains, and computable audit metrics. We propose Graph-GAP, a methodology that decomposes requirements from authoritative policy texts into a four-layer graph of evidence, mechanism, governance, and indicator, and that computes two metrics, GAP score and mitigation readiness, to identify governance gaps and prioritise actions. Using the UNICEF Innocenti Guidance on AI and Children 3.0 as primary material, we define reproducible extraction units, coding manuals, graph patterns, scoring scales, and consistency checks, and we demonstrate exemplar gap profiles and governance priority matrices for ten requirements. Results suggest that compared with privacy and data protection, requirements related to child well-being and development, explainability and accountability, and cross-agency implementation and resource allocation are more prone to indicator gaps and mechanism gaps. We recommend translating requirements into auditable closed-loop governance that integrates child rights impact assessments, continuous monitoring metrics, and grievance redress procedures. At the coding level, we introduce a multi-algorithm review-aggregation-revision workflow that runs rule-based encoders, statistical or machine learning evaluators, and large model evaluators with diverse prompt configurations as parallel coders. Each extraction unit outputs evidence, mechanism, governance, and indicator labels plus readiness scores with evidence anchors. Reliability, stability, and uncertainty are assessed using Krippendorff's alpha, weighted kappa, intraclass correlation, and bootstrap confidence intervals.
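The abstract does not give the GAP score formula, so the following is only a minimal sketch of how per-layer readiness scores from parallel coders might be combined and given a percentile bootstrap confidence interval. The `gap_score` definition (one minus mean readiness), the function names, the 0-to-1 scale, and the sample scores are all assumptions made for illustration, not the paper's method or data.

```python
import random
from statistics import mean

# The four layers named in the abstract.
LAYERS = ("evidence", "mechanism", "governance", "indicator")


def gap_score(readiness):
    """Hypothetical gap score: 1 minus mean readiness over the four layers.

    `readiness` maps each layer name to a score in [0, 1] (assumed scale).
    """
    return 1.0 - mean(readiness[layer] for layer in LAYERS)


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and collect the mean of each resample.
    stats = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Made-up readiness scores from three hypothetical parallel coders
# for a single requirement.
coders = [
    {"evidence": 0.8, "mechanism": 0.4, "governance": 0.6, "indicator": 0.2},
    {"evidence": 0.7, "mechanism": 0.5, "governance": 0.6, "indicator": 0.3},
    {"evidence": 0.9, "mechanism": 0.3, "governance": 0.5, "indicator": 0.1},
]

scores = [gap_score(c) for c in coders]
low, high = bootstrap_ci(scores)
print(f"mean gap score: {mean(scores):.3f}, 95% CI: [{low:.3f}, {high:.3f}]")
```

With only three coders the interval is necessarily coarse; the sketch is meant to show the shape of the computation rather than a recommended sample size.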

Related papers

The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI [0.0]
This paper introduces a novel auditing framework to quantify latent trait estimation under ordinal uncertainty. The research audits nine leading models across dimensions including Optimization Bias, Sycophancy, and Status-Quo Legitimization.
arXiv Detail & Related papers (2026-02-19T06:56:01Z)
Autonomous Chain-of-Thought Distillation for Graph-Based Fraud Detection [73.9189065770752]
Graph-based fraud detection on text-attributed graphs (TAGs) requires jointly modeling rich textual semantics and relational dependencies. We propose FraudCoT, a unified framework that advances TAG-based fraud detection through autonomous, graph-aware chain-of-thought (CoT) reasoning and scalable LLM-GNN co-training.
arXiv Detail & Related papers (2026-01-30T13:12:12Z)
Compliance as a Trust Metric [1.0264137858888513]
This paper bridges this research gap by operationalizing regulatory compliance as a quantitative and dynamic trust metric. Our contribution is a quantitative model that assesses the severity of each violation along multiple dimensions, including its Volume, Duration, Breadth, and Criticality. We evaluate ACE on a synthetic hospital dataset, demonstrating its ability to accurately detect a range of complex HIPAA violations.
arXiv Detail & Related papers (2026-01-03T21:14:40Z)
Measuring What Matters: The AI Pluralism Index [0.0]
We present the AI Pluralism Index (AIPI), a transparent, evidence-based instrument that evaluates producers and system families across four pillars: participatory governance, inclusivity and diversity, transparency, and accountability. The index aims to steer incentives toward pluralistic practice and to equip policymakers, procurers, and the public with comparable evidence.
arXiv Detail & Related papers (2025-10-09T13:19:34Z)
Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics [89.1999907891494]
We present WebDetective, a benchmark of hint-free multi-hop questions paired with a controlled Wikipedia sandbox. Our evaluation of 25 state-of-the-art models reveals systematic weaknesses across all architectures. We develop an agentic workflow, EvidenceLoop, that explicitly targets the challenges our benchmark identifies.
arXiv Detail & Related papers (2025-10-01T07:59:03Z)
AgenticIQA: An Agentic Framework for Adaptive and Interpretable Image Quality Assessment [69.06977852423564]
Image quality assessment (IQA) reflects both the quantification and interpretation of perceptual quality rooted in the human visual system. AgenticIQA decomposes IQA into four subtasks: distortion detection, distortion analysis, tool selection, and tool execution. To support training and evaluation, we introduce AgenticIQA-200K, a large-scale instruction dataset tailored for IQA agents, and AgenticIQA-Eval, the first benchmark for assessing the planning, execution, and summarization capabilities of VLM-based IQA agents.
arXiv Detail & Related papers (2025-09-30T09:37:01Z)
Fair Deepfake Detectors Can Generalize [51.21167546843708]
We show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) Demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) Demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art
arXiv Detail & Related papers (2025-07-03T14:10:02Z)
KAQG: A Knowledge-Graph-Enhanced RAG for Difficulty-Controlled Question Generation [0.0]
This study introduces Knowledge Augmented Question Generation (KAQG). It integrates Item Response Theory (IRT), Bloom's taxonomy, and knowledge graphs into a multi-agent Retrieval-Augmented Generation system. The proposed approach overcomes limitations of existing methods by enabling fine-grained control over item difficulty, psychometric calibration, and cognitive alignment.
arXiv Detail & Related papers (2025-05-12T14:42:19Z)
Retrieval is Not Enough: Enhancing RAG Reasoning through Test-Time Critique and Optimization [58.390885294401066]
Retrieval-augmented generation (RAG) has become a widely adopted paradigm for enabling knowledge-grounded large language models (LLMs). RAG pipelines often fail to ensure that model reasoning remains consistent with the evidence retrieved, leading to factual inconsistencies or unsupported conclusions. We propose AlignRAG, a novel iterative framework grounded in Critique-Driven Alignment (CDA). We introduce AlignRAG-auto, an autonomous variant that dynamically terminates refinement, removing the need to pre-specify the number of critique iterations.
arXiv Detail & Related papers (2025-04-21T04:56:47Z)
Demographic Benchmarking: Bridging Socio-Technical Gaps in Bias Detection [0.0]
This paper describes how the ITACA AI auditing platform tackles demographic benchmarking when auditing AI recommender systems. The framework serves us as auditors because it allows us not just to measure specific performance indicators but also to establish acceptability ranges for them. Our approach integrates socio-demographic insights directly into AI systems, reducing bias and improving overall performance.
arXiv Detail & Related papers (2025-01-27T12:14:49Z)
Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics. We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs. Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
arXiv Detail & Related papers (2023-08-28T03:03:03Z)
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning [63.77667876176978]
Large language models show improved downstream task performance when prompted to generate step-by-step reasoning to justify their final answers. These reasoning steps greatly improve model interpretability and verification, but objectively studying their correctness is difficult. We present ROSCOE, a suite of interpretable, unsupervised automatic scores that improve and extend previous text generation evaluation metrics.
arXiv Detail & Related papers (2022-12-15T15:52:39Z)
Multisource AI Scorecard Table for System Evaluation [3.74397577716445]
The paper describes a Multisource AI Scorecard Table (MAST) that provides the developer and user of an artificial intelligence (AI)/machine learning (ML) system with a standard checklist. The paper explores how the analytic tradecraft standards outlined in Intelligence Community Directive (ICD) 203 can provide a framework for assessing the performance of an AI system.
arXiv Detail & Related papers (2021-02-08T03:37:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
