Supervision policies can shape long-term risk management in general-purpose AI models
- URL: http://arxiv.org/abs/2501.06137v1
- Date: Fri, 10 Jan 2025 17:52:34 GMT
- Title: Supervision policies can shape long-term risk management in general-purpose AI models
- Authors: Manuel Cebrian, Emilia Gomez, David Fernandez Llorca
- Abstract summary: We develop a simulation framework parameterized by features extracted from the diverse landscape of risk, incident, or hazard reporting ecosystems. We evaluate four supervision policies: non-prioritized (first-come, first-served), random selection, priority-based (addressing the highest-priority risks first), and diversity-prioritized (balancing high-priority risks with comprehensive coverage across risk types). Our results indicate that while priority-based and diversity-prioritized policies are more effective at mitigating high-impact risks, they may inadvertently neglect systemic issues reported by the broader community.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid proliferation and deployment of General-Purpose AI (GPAI) models, including large language models (LLMs), present unprecedented challenges for AI supervisory entities. We hypothesize that these entities will need to navigate an emergent ecosystem of risk and incident reporting, likely to exceed their supervision capacity. To investigate this, we develop a simulation framework parameterized by features extracted from the diverse landscape of risk, incident, or hazard reporting ecosystems, including community-driven platforms, crowdsourcing initiatives, and expert assessments. We evaluate four supervision policies: non-prioritized (first-come, first-served), random selection, priority-based (addressing the highest-priority risks first), and diversity-prioritized (balancing high-priority risks with comprehensive coverage across risk types). Our results indicate that while priority-based and diversity-prioritized policies are more effective at mitigating high-impact risks, particularly those identified by experts, they may inadvertently neglect systemic issues reported by the broader community. This oversight can create feedback loops that amplify certain types of reporting while discouraging others, leading to a skewed perception of the overall risk landscape. We validate our simulation results with several real-world datasets, including one with over a million ChatGPT interactions, of which more than 150,000 conversations were identified as risky. This validation underscores the complex trade-offs inherent in AI risk supervision and highlights how the choice of risk management policies can shape the future landscape of AI risks across diverse GPAI models used in society.
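The four supervision policies named in the abstract can be illustrated with a small selection routine. The sketch below is not the authors' simulation framework: the `Report` fields, the function names, the toy data, and in particular the round-robin interpretation of "diversity-prioritized" are all assumptions made for illustration. It only shows how a supervisor with limited capacity might pick different reports under each policy.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical risk report; the fields are illustrative assumptions."""
    arrival: int     # arrival order (lower = reported earlier)
    risk_type: str   # category of the reported risk
    priority: float  # higher = more urgent


def select_fcfs(reports, capacity):
    # Non-prioritized: handle reports strictly in arrival order.
    return sorted(reports, key=lambda r: r.arrival)[:capacity]


def select_random(reports, capacity, rng):
    # Random selection: sample without replacement.
    return rng.sample(reports, min(capacity, len(reports)))


def select_priority(reports, capacity):
    # Priority-based: highest priority first, ties broken by arrival.
    return sorted(reports, key=lambda r: (-r.priority, r.arrival))[:capacity]


def select_diversity(reports, capacity):
    # Diversity-prioritized (assumed interpretation): round-robin over
    # risk types, taking the highest-priority report of each type first.
    by_type = defaultdict(list)
    for r in sorted(reports, key=lambda r: (-r.priority, r.arrival)):
        by_type[r.risk_type].append(r)
    queues = sorted(by_type.values(),
                    key=lambda q: (-q[0].priority, q[0].arrival))
    chosen, depth = [], 0
    while len(chosen) < capacity and any(depth < len(q) for q in queues):
        for q in queues:
            if depth < len(q) and len(chosen) < capacity:
                chosen.append(q[depth])
        depth += 1
    return chosen


# Toy data, invented for this example.
reports = [
    Report(0, "bias", 0.2),
    Report(1, "privacy", 0.9),
    Report(2, "privacy", 0.8),
    Report(3, "misinformation", 0.5),
    Report(4, "privacy", 0.7),
]

if __name__ == "__main__":
    capacity = 3
    rng = random.Random(0)
    print("fcfs     ", [r.arrival for r in select_fcfs(reports, capacity)])
    print("random   ", [r.arrival for r in select_random(reports, capacity, rng)])
    print("priority ", [r.arrival for r in select_priority(reports, capacity)])
    print("diversity", [r.arrival for r in select_diversity(reports, capacity)])
```

On this toy data the priority-based policy spends its whole capacity on the single highest-priority risk type, while the diversity-prioritized variant covers three types. This mirrors the kind of trade-off the abstract describes, under the stated assumptions.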
Related papers
- Adapting Probabilistic Risk Assessment for AI [0.0]
General-purpose artificial intelligence (AI) systems present an urgent risk management challenge.
Current methods often rely on selective testing and undocumented assumptions about risk priorities.
This paper introduces the probabilistic risk assessment (PRA) for AI framework.
arXiv Detail & Related papers (2025-04-25T17:59:14Z)
- Multi-Agent Risks from Advanced AI [90.74347101431474]
Multi-agent systems of advanced AI pose novel and under-explored risks.
We identify three key failure modes based on agents' incentives, as well as seven key risk factors.
We highlight several important instances of each risk, as well as promising directions to help mitigate them.
arXiv Detail & Related papers (2025-02-19T23:03:21Z)
- AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability.
The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z)
- A Taxonomy of Systemic Risks from General-Purpose AI [2.5956465292067867]
We consider systemic risks as large-scale threats that can affect entire societies or economies. Key sources of systemic risk emerge from knowledge gaps, challenges in recognizing harm, and the unpredictable trajectory of AI development. This paper contributes to AI safety research by providing a structured groundwork for understanding and addressing the potential large-scale negative societal impacts of general-purpose AI.
arXiv Detail & Related papers (2024-11-24T22:16:18Z)
- Effective Mitigations for Systemic Risks from General-Purpose AI [9.39718128736321]
We surveyed 76 experts whose expertise spans AI safety; critical infrastructure; democratic processes; chemical, biological, radiological, and nuclear risks (CBRN); and discrimination and bias. We find that a broad range of risk mitigation measures are perceived as effective in reducing various systemic risks and technically feasible by domain experts. Three mitigation measures stand out: safety incident reports and security information sharing, third-party pre-deployment model audits, and pre-deployment risk assessments.
arXiv Detail & Related papers (2024-11-14T22:39:25Z)
- Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents [67.07177243654485]
This survey collects and analyzes the different threats faced by large language models-based agents.
We identify six key features of LLM-based agents, based on which we summarize the current research progress.
We select four representative agents as case studies to analyze the risks they may face in practical use.
arXiv Detail & Related papers (2024-11-14T15:40:04Z)
- EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction. Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results. However, the deployment of these agents in physical environments presents significant safety challenges. This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
- AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies [88.32153122712478]
We identify 314 unique risk categories organized into a four-tiered taxonomy.
At the highest level, this taxonomy encompasses System & Operational Risks, Content Safety Risks, Societal Risks, and Legal & Rights Risks.
We aim to advance AI safety through information sharing across sectors and the promotion of best practices in risk mitigation for generative AI models and systems.
arXiv Detail & Related papers (2024-06-25T18:13:05Z)
- RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization [49.26510528455664]
We introduce the Risk-sensitive Individual-Global-Max (RIGM) principle as a generalization of the Individual-Global-Max (IGM) and Distributional IGM (DIGM) principles.
We show that RiskQ can obtain promising performance through extensive experiments.
arXiv Detail & Related papers (2023-11-03T07:18:36Z)
- Quantitative AI Risk Assessments: Opportunities and Challenges [7.35411010153049]
The best way to reduce risks is to implement comprehensive AI lifecycle governance. Risks can be quantified using metrics from the technical community. This paper explores these issues, focusing on the opportunities, challenges, and potential impacts of such an approach.
arXiv Detail & Related papers (2022-09-13T21:47:25Z)
- Sample-Based Bounds for Coherent Risk Measures: Applications to Policy Synthesis and Verification [32.9142708692264]
This paper aims to address a few problems regarding risk-aware verification and policy synthesis.
First, we develop a sample-based method to evaluate a subset of a random variable distribution.
Second, we develop a robotic-based method to determine solutions to problems that outperform a large fraction of the decision space.
arXiv Detail & Related papers (2022-04-21T01:06:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.