Related papers: Auction-Based Regulation for Artificial Intelligence

Related papers

Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios.<n>We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models.<n>Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z)
Resource Rational Contractualism Should Guide AI Alignment [69.07915246220985]
Contractualist alignment proposes grounding decisions in agreements that diverse stakeholders would endorse.<n>We propose Resource-Rationalism: a framework where AI systems approximate the agreements rational parties would form.<n>An RRC-aligned agent would not only operate efficiently, but also be equipped to dynamically adapt to and interpret the ever-changing human social world.
arXiv Detail & Related papers (2025-06-20T18:57:13Z)
Watermarking Without Standards Is Not AI Governance [46.71493672772134]
We argue that current implementations risk serving as symbolic compliance rather than delivering effective oversight.<n>We propose a three-layer framework encompassing technical standards, audit infrastructure, and enforcement mechanisms.
arXiv Detail & Related papers (2025-05-27T18:10:04Z)
Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents [61.132523071109354]
This paper investigates the interplay between AI developers, regulators and users, modelling their strategic choices under different regulatory scenarios. Our research identifies emerging behaviours of strategic AI agents, which tend to adopt more "pessimistic" stances than pure game-theoretic agents.
arXiv Detail & Related papers (2025-04-11T15:41:21Z)
Media and responsible AI governance: a game-theoretic and LLM analysis [61.132523071109354]
This paper investigates the interplay between AI developers, regulators, users, and the media in fostering trustworthy AI systems. Using evolutionary game theory and large language models (LLMs), we model the strategic interactions among these actors under different regulatory regimes.
arXiv Detail & Related papers (2025-03-12T21:39:38Z)
Beyond Benchmarks: On The False Promise of AI Regulation [13.125853211532196]
We show that effective scientific regulation requires a causal theory linking observable test outcomes to future performance. We show that deep learning models, which learn complex statistical patterns from training data without explicit causal mechanisms, preclude such guarantees.
arXiv Detail & Related papers (2025-01-26T22:43:07Z)
Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act) Uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence. As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z)
Safeguarding AI Agents: Developing and Analyzing Safety Architectures [0.0]
This paper addresses the need for safety measures in AI systems that collaborate with human teams. We propose and evaluate three frameworks to enhance safety protocols in AI agent systems. We conclude that these frameworks can significantly strengthen the safety and security of AI agent systems.
arXiv Detail & Related papers (2024-09-03T10:14:51Z)
Nothing in Excess: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering [56.92068213969036]
Safety alignment is indispensable for Large language models (LLMs) to defend threats from malicious instructions. Recent researches reveal safety-aligned LLMs prone to reject benign queries due to the exaggerated safety issue. We propose a Safety-Conscious Activation Steering (SCANS) method to mitigate the exaggerated safety concerns.
arXiv Detail & Related papers (2024-08-21T10:01:34Z)
The Dilemma of Uncertainty Estimation for General Purpose AI in the EU AI Act [6.9060054915724]
The AI act is the European Union-wide regulation of AI systems. We argue that uncertainty estimation should be a required component for deploying models in the real world.
arXiv Detail & Related papers (2024-08-20T23:59:51Z)
Certified Safe: A Schematic for Approval Regulation of Frontier AI [0.0]
An approval regulation scheme is one in which a firm cannot legally market, or in some cases develop, a product without explicit approval from a regulator. This report proposes an approval regulation schematic for only the largest AI projects in which scrutiny begins before training and continues through to post-deployment monitoring.
arXiv Detail & Related papers (2024-08-12T15:01:03Z)
EAIRiskBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [47.69642609574771]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction. Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results. However, the deployment of these agents in physical environments presents significant safety challenges. This study introduces EAIRiskBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
An FDA for AI? Pitfalls and Plausibility of Approval Regulation for Frontier Artificial Intelligence [0.0]
We explore the applicability of approval regulation -- that is, regulation of a product that combines experimental minima with government licensure conditioned partially or fully upon that experimentation -- to the regulation of frontier AI. There are a number of reasons to believe that approval regulation, simplistically applied, would be inapposite for frontier AI risks. We conclude by highlighting the role of policy learning and experimentation in regulatory development.
arXiv Detail & Related papers (2024-08-01T17:54:57Z)
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance [48.80398992974831]
SafeAligner is a methodology implemented at the decoding stage to fortify defenses against jailbreak attacks. We develop two specialized models: the Sentinel Model, which is trained to foster safety, and the Intruder Model, designed to generate riskier responses. We show that SafeAligner can increase the likelihood of beneficial tokens, while reducing the occurrence of harmful ones.
arXiv Detail & Related papers (2024-06-26T07:15:44Z)
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z)
The risks of risk-based AI regulation: taking liability seriously [46.90451304069951]
The development and regulation of AI seems to have reached a critical stage. Some experts are calling for a moratorium on the training of AI systems more powerful than GPT-4. This paper analyses the most advanced legal proposal, the European Union's AI Act.
arXiv Detail & Related papers (2023-11-03T12:51:37Z)
ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection. We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance. We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z)
Frontier AI Regulation: Managing Emerging Risks to Public Safety [15.85618115026625]
"Frontier AI" models could possess dangerous capabilities sufficient to pose severe risks to public safety. Industry self-regulation is an important first step. We propose an initial set of safety standards.
arXiv Detail & Related papers (2023-07-06T17:03:25Z)
Both eyes open: Vigilant Incentives help Regulatory Markets improve AI Safety [69.59465535312815]
Regulatory Markets for AI is a proposal designed with adaptability in mind. It involves governments setting outcome-based targets for AI companies to achieve. We warn that it is alarmingly easy to stumble on incentives which would prevent Regulatory Markets from achieving this goal.
arXiv Detail & Related papers (2023-03-06T14:42:05Z)
Regulating ChatGPT and other Large Generative AI Models [0.0]
Large generative AI models (LGAIMs) are rapidly transforming the way we communicate, illustrate, and create. This paper will situate these new generative models in the current debate on trustworthy AI regulation. It suggests a novel terminology to capture the AI value chain in LGAIM settings.
arXiv Detail & Related papers (2023-02-05T08:56:45Z)
Automatic Rule Induction for Efficient Semi-Supervised Learning [56.91428251227253]
Semi-supervised learning has shown promise in allowing NLP models to generalize from small amounts of labeled data. Pretrained transformer models act as black-box correlation engines that are difficult to explain and sometimes behave unreliably. We propose tackling both of these challenges via Automatic Rule Induction (ARI), a simple and general-purpose framework.
arXiv Detail & Related papers (2022-05-18T16:50:20Z)
Constraints Satisfiability Driven Reinforcement Learning for Autonomous Cyber Defense [7.321728608775741]
We present a new hybrid autonomous agent architecture that aims to optimize and verify defense policies of reinforcement learning (RL) We use constraints verification (using satisfiability modulo theory (SMT)) to steer the RL decision-making toward safe and effective actions. Our evaluation of the presented approach in a simulated CPS environment shows that the agent learns the optimal policy fast and defeats diversified attack strategies in 99% cases.
arXiv Detail & Related papers (2021-04-19T01:08:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.