Domain-Agnostic Scalable AI Safety Ensuring Framework
- URL: http://arxiv.org/abs/2504.20924v6
- Date: Sun, 05 Oct 2025 09:30:33 GMT
- Title: Domain-Agnostic Scalable AI Safety Ensuring Framework
- Authors: Beomjun Kim, Kangyeon Kim, Sunwoo Kim, Yeonsang Shin, Heejin Ahn
- Abstract summary: We propose the first domain-agnostic AI safety framework that achieves strong safety guarantees while preserving high performance. Our framework includes: (1) an optimization component with chance constraints, (2) a safety classification model, (3) internal test data, (4) conservative testing procedures, (5) informative dataset quality measures, and (6) continuous approximate loss functions with gradient computation.
- Score: 6.421238475415244
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: AI safety has emerged as a critical priority as AI systems are increasingly deployed in real-world applications. We propose the first domain-agnostic AI safety ensuring framework that achieves strong safety guarantees while preserving high performance, grounded in rigorous theoretical foundations. Our framework includes: (1) an optimization component with chance constraints, (2) a safety classification model, (3) internal test data, (4) conservative testing procedures, (5) informative dataset quality measures, and (6) continuous approximate loss functions with gradient computation. Furthermore, to our knowledge, we mathematically establish the first scaling law in AI safety research, relating data quantity to safety-performance trade-offs. Experiments across reinforcement learning, natural language generation, and production planning validate our framework and demonstrate superior performance. Notably, in reinforcement learning, we achieve 3 collisions during 10M actions, compared with 1,000-3,000 for PPO-Lag baselines at equivalent performance levels -- a safety level unattainable by previous AI methods. We believe our framework opens a new foundation for safe AI deployment across safety-critical domains.
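To make components (1) and (6) concrete, here is a minimal PyTorch sketch of a chance constraint enforced through a smooth, differentiable surrogate. The sigmoid surrogate, the penalty weight `lam`, and the tolerance `epsilon` are illustrative assumptions, not the paper's actual construction.

```python
import torch

def smooth_violation(safety_margin, beta=10.0):
    # Sigmoid surrogate for the 0/1 "unsafe" indicator (an assumed choice):
    # it approaches 1 as the margin goes negative and is differentiable,
    # so gradients can flow through the chance constraint.
    return torch.sigmoid(-beta * safety_margin)

def chance_constrained_loss(perf_loss, safety_margins, epsilon=0.01, lam=100.0):
    # Penalize the estimated violation probability only where it exceeds
    # the tolerance epsilon, on top of the performance objective.
    p_violation = smooth_violation(safety_margins).mean()
    return perf_loss + lam * torch.relu(p_violation - epsilon)
```

In this toy form, `safety_margins` would come from something like the framework's safety classification model evaluated on held-out internal test data.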
Related papers
- From Leaderboard to Deployment: Code Quality Challenges in AV Perception Repositories [4.603321798937855]
This study systematically analyzed 178 unique models from the KITTI and NuScenes 3D Object Detection leaderboards.
Our findings revealed that only 7.3% of the studied repositories meet basic production-readiness criteria.
The adoption of Continuous Integration/Continuous Deployment pipelines was correlated with better code maintainability.
arXiv Detail & Related papers (2026-03-02T18:54:28Z)
- DeepKnown-Guard: A Proprietary Model-Based Safety Response Framework for AI Agents [12.054307827384415]
Safety concerns around Large Language Models (LLMs) have become increasingly prominent, severely constraining their trustworthy deployment in critical domains.
This paper proposes a novel safety response framework designed to safeguard LLMs at both the input and output levels.
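The summary only states that safeguarding happens at both levels; as a hedged sketch of what an input- plus output-level guardrail could look like (all callables are hypothetical, not DeepKnown-Guard's actual interface):

```python
def guarded_generate(llm, input_filter, output_filter, prompt,
                     refusal="I can't help with that."):
    # Stage 1: screen the incoming prompt before it reaches the model.
    if not input_filter(prompt):
        return refusal
    response = llm(prompt)
    # Stage 2: screen the generated response before it reaches the user.
    if not output_filter(prompt, response):
        return refusal
    return response
```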
arXiv Detail & Related papers (2025-11-05T03:04:35Z)
- SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law [91.33824439029533]
We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety.
It is developed by our proposed SafeLadder framework, which incorporates large-scale, progressive, safety-oriented reinforcement learning post-training.
We further develop SafeWork-R1-InternVL3-78B, SafeWork-R1-DeepSeek-70B, and SafeWork-R1-Qwen2.5VL-7B.
arXiv Detail & Related papers (2025-07-24T16:49:19Z)
- Towards provable probabilistic safety for scalable embodied AI systems [79.31011047593492]
Embodied AI systems are increasingly prevalent across various applications.
Ensuring their safety in complex operating environments remains a major challenge.
This Perspective offers a pathway toward safer, large-scale adoption of embodied AI systems in safety-critical applications.
arXiv Detail & Related papers (2025-06-05T15:46:25Z)
- Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL [14.767273209148545]
We propose a novel framework, Feasibility-Aware offline Safe Reinforcement Learning with CVAE-based Pessimism (FASP).
We employ Hamilton-Jacobi (H-J) reachability analysis to generate reliable safety labels.
We also utilize pessimistic estimation methods to estimate the Q-value of reward and cost.
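Two of FASP's ingredients, reachability-derived safety labels and pessimistic cost estimation, can be caricatured in a few lines; the value-function sign convention and the ensemble trick below are assumptions, not the paper's exact scheme:

```python
def safety_label(value_fn, state, threshold=0.0):
    # H-J reachability style label: treat the state as safe when a learned
    # value (assumed: signed distance to the unsafe set) stays above a threshold.
    return value_fn(state) > threshold

def pessimistic_cost_q(q_ensemble, state, action):
    # Pessimism for a *cost* critic means taking the worst (largest)
    # estimate across an ensemble of hypothetical Q-functions.
    return max(q(state, action) for q in q_ensemble)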
arXiv Detail & Related papers (2025-05-13T02:32:49Z)
- AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement [73.0700818105842]
We introduce AISafetyLab, a unified framework and toolkit that integrates representative attack, defense, and evaluation methodologies for AI safety.
AISafetyLab features an intuitive interface that enables developers to seamlessly apply various techniques.
We conduct empirical studies on Vicuna, analyzing different attack and defense strategies to provide valuable insights into their comparative effectiveness.
arXiv Detail & Related papers (2025-02-24T02:11:52Z)
- A Physics-Informed Machine Learning Framework for Safe and Optimal Control of Autonomous Systems [8.347548017994178]
Safety and performance could be competing objectives, which makes their co-optimization difficult.
We propose a state-constrained optimal control problem, where performance objectives are encoded via a cost function and safety requirements are imposed as state constraints.
We demonstrate that the resultant value function satisfies a Hamilton-Jacobi-Bellman equation, which we approximate efficiently using a novel machine learning framework.
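As a rough sketch of how a learning framework can approximate an HJB solution, one can penalize the HJB residual of a value network. Minimizing over controls on a fixed grid is a simplification assumed here, not the paper's method; `value_net`, `dynamics`, and `running_cost` are hypothetical callables.

```python
import torch

def hjb_residual(value_net, x, dynamics, running_cost):
    # Squared residual of a toy stationary HJB equation:
    #   0 = min_u [ cost(x, u) + dV/dx . f(x, u) ]
    x = x.clone().requires_grad_(True)
    v = value_net(x).squeeze()
    grad_v, = torch.autograd.grad(v, x, create_graph=True)
    controls = torch.linspace(-1.0, 1.0, 11)  # crude grid over u
    hamiltonians = torch.stack(
        [running_cost(x, u) + grad_v @ dynamics(x, u) for u in controls])
    return hamiltonians.min() ** 2  # drive the residual toward zero
```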
arXiv Detail & Related papers (2025-02-16T09:46:17Z)
- Assessing confidence in frontier AI safety cases [37.839615078345886]
A safety case presents a structured argument in support of a top-level claim about a safety property of the system.
This raises the question of what level of confidence should be associated with a top-level claim.
We propose a method by which AI developers can prioritise argument defeaters, and thereby investigate them more efficiently.
arXiv Detail & Related papers (2025-02-09T06:35:11Z)
- Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? [59.96471873997733]
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context.
We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv Detail & Related papers (2024-07-31T17:59:24Z)
- Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI.
The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees.
We outline a number of approaches for creating each of these three core components (a world model, a safety specification, and a verifier), describe the main technical challenges, and suggest a number of potential solutions to them.
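A toy rendering of how the three components could interact at decision time; the sampling-based check below is only a weak stand-in for the high-assurance verification GS AI calls for, and `sample_rollout` is a hypothetical method:

```python
def certified_action(policy, world_model, safety_spec, state, n_rollouts=100):
    # Accept an action only if every sampled world-model rollout
    # satisfies the safety specification.
    action = policy(state)
    for _ in range(n_rollouts):
        trajectory = world_model.sample_rollout(state, action)
        if not safety_spec(trajectory):
            return None  # no certificate; caller falls back to a safe default
    return action
```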
arXiv Detail & Related papers (2024-05-10T17:38:32Z)
- Scaling #DNN-Verification Tools with Efficient Bound Propagation and Parallel Computing [57.49021927832259]
Deep Neural Networks (DNNs) are powerful tools that have shown extraordinary results in many scenarios.
However, their intricate designs and lack of transparency raise safety concerns when applied in real-world applications.
Formal Verification (FV) of DNNs has emerged as a valuable solution to provide provable guarantees on the safety aspect.
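One of the simplest bound-propagation schemes, interval bound propagation, illustrates the idea behind such tools; this generic sketch is not the paper's algorithm.

```python
import numpy as np

def ibp_linear(W, b, lo, hi):
    # Propagate an axis-aligned box through x -> Wx + b. Splitting W into
    # positive and negative parts gives exact bounds for one affine layer.
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def ibp_relu(lo, hi):
    # ReLU is monotone, so the box is simply clipped at zero.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)
```

Composing these per-layer bounds yields an outer approximation of the network's reachable outputs, which a verifier can compare against a safety property.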
arXiv Detail & Related papers (2023-12-10T13:51:25Z)
- ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios, and error rates of up to 19% absolute in zero-shot adversarial settings.
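The fine-grained domain breakdown reported above amounts to grouping labeled prompts by safety domain and comparing accuracies; a minimal sketch, with an assumed record schema:

```python
from collections import defaultdict

def per_domain_accuracy(records):
    # records: iterable of (domain, prediction, label) triples.
    hits, totals = defaultdict(int), defaultdict(int)
    for domain, pred, label in records:
        hits[domain] += int(pred == label)
        totals[domain] += 1
    return {d: hits[d] / totals[d] for d in totals}
```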
arXiv Detail & Related papers (2023-10-14T17:10:28Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
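A safety projection step can be caricatured as nudging the proposed action until a learned cost critic predicts it is within budget; this gradient sketch, with a hypothetical differentiable `cost_model`, only gestures at USL's projection component.

```python
import torch

def project_action(action, cost_model, state, budget=0.0, steps=20, lr=0.1):
    # Iteratively push the action toward the predicted-safe set.
    a = action.clone().detach().requires_grad_(True)
    for _ in range(steps):
        cost = cost_model(state, a)
        if cost.item() <= budget:
            break  # predicted state-wise cost already within budget
        cost.backward()
        with torch.no_grad():
            a -= lr * a.grad
        a.grad.zero_()
    return a.detach()
```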
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Information-Theoretic Safe Exploration with Gaussian Processes [89.31922008981735]
We consider a sequential decision making task where we are not allowed to evaluate parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
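A simplified version of the idea, picking the most uncertain candidate among those the GP posterior deems safe with high probability, rather than the paper's exact information-theoretic criterion:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def pick_next(gp: GaussianProcessRegressor, candidates, safe_threshold, z=2.0):
    # candidates: (n, d) array of parameter settings to choose from.
    mu, std = gp.predict(candidates, return_std=True)
    safe = mu - z * std > safe_threshold  # high-probability safety filter
    if not safe.any():
        return None  # nothing is confidently safe to evaluate
    idx = np.where(safe)[0]
    return candidates[idx[np.argmax(std[idx])]]  # most informative safe point
```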
arXiv Detail & Related papers (2022-12-09T15:23:58Z)
- Meta-Learning Priors for Safe Bayesian Optimization [72.8349503901712]
We build on a meta-learning algorithm, F-PACOH, capable of providing reliable uncertainty quantification in settings of data scarcity.
As our core contribution, we develop a novel framework for choosing safety-compliant priors in a data-driven manner.
On benchmark functions and a high-precision motion system, we demonstrate that our meta-learned priors accelerate the convergence of safe BO approaches.
arXiv Detail & Related papers (2022-10-03T08:38:38Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
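The core of a CBF safety filter is a small QP that minimally modifies a nominal input; with a single constraint and nominal dynamics it has the closed form below. The paper's model-uncertainty-aware reformulation adds robustness terms this sketch omits.

```python
import numpy as np

def cbf_filter(u_nom, h, Lf_h, Lg_h, alpha=1.0):
    # Solve min ||u - u_nom||^2  s.t.  Lf_h + Lg_h . u + alpha * h >= 0.
    slack = Lf_h + Lg_h @ u_nom + alpha * h
    if slack >= 0.0:
        return u_nom  # nominal input already satisfies the CBF condition
    # Project onto the constraint boundary along Lg_h.
    return u_nom - slack * Lg_h / (Lg_h @ Lg_h)
```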
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy tasks in safe reinforcement learning.
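Stripped of the step-size safeguards that make LBSGD work, a log-barrier gradient step looks like this; a bare sketch under the assumption that the iterate stays strictly feasible:

```python
import numpy as np

def log_barrier_step(x, grad_f, constraints, grad_constraints, eta=0.1, lr=0.01):
    # One gradient step on f(x) - eta * sum_i log(-g_i(x)),
    # which is defined only while every g_i(x) < 0.
    g = np.array([gi(x) for gi in constraints])
    assert (g < 0).all(), "iterate left the strictly feasible region"
    barrier_grad = sum((-eta / gi_val) * dgi(x)
                       for gi_val, dgi in zip(g, grad_constraints))
    return x - lr * (grad_f(x) + barrier_grad)
```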
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Neural Bridge Sampling for Evaluating Safety-Critical Autonomous Systems [34.945482759378734]
We employ a probabilistic approach to safety evaluation in simulation, where we are concerned with computing the probability of dangerous events.
We develop a novel rare-event simulation method that combines exploration, exploitation, and optimization techniques to find failure modes and estimate their rate of occurrence.
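The baseline trick that bridge-sampling style methods build on is importance sampling: shift the proposal toward the failure region and reweight by the density ratio. A self-contained toy for estimating P[X > 4] with X ~ N(0, 1):

```python
import numpy as np

def naive_vs_shifted(n=100_000, threshold=4.0, shift=4.0, seed=0):
    rng = np.random.default_rng(seed)
    # Naive Monte Carlo wastes nearly all samples on the common case.
    naive = (rng.standard_normal(n) > threshold).mean()
    # Sample from N(shift, 1) and reweight by the density ratio
    # N(0,1)/N(shift,1) = exp(-shift*y + shift^2/2).
    y = rng.standard_normal(n) + shift
    weights = np.exp(-shift * y + 0.5 * shift**2)
    shifted = ((y > threshold) * weights).mean()
    return naive, shifted
```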
arXiv Detail & Related papers (2020-08-24T17:46:27Z)
- Efficient statistical validation with edge cases to evaluate Highly Automated Vehicles [6.198523595657983]
The wide-scale deployment of Autonomous Vehicles seems imminent, despite many safety challenges that are yet to be resolved.
Existing standards focus on deterministic processes where the validation requires only a set of test cases that cover the requirements.
This paper presents a new approach to compute the statistical characteristics of a system's behaviour by biasing automatically generated test cases towards the worst case scenarios.
arXiv Detail & Related papers (2020-03-04T04:35:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.