BlueGlass: A Framework for Composite AI Safety
- URL: http://arxiv.org/abs/2507.10106v1
- Date: Mon, 14 Jul 2025 09:45:34 GMT
- Title: BlueGlass: A Framework for Composite AI Safety
- Authors: Harshal Nandigramwar, Syed Qutub, Kay-Ulrich Scholl
- Abstract summary: This paper introduces BlueGlass, a framework designed to facilitate AI safety by providing a unified infrastructure. To demonstrate the utility of this framework, we present three safety-oriented analyses on vision-language models. More broadly, this work contributes infrastructure and findings for building more robust and reliable AI systems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As AI systems become increasingly capable and ubiquitous, ensuring the safety of these systems is critical. However, existing safety tools often target different aspects of model safety and cannot provide full assurance in isolation, highlighting a need for integrated and composite methodologies. This paper introduces BlueGlass, a framework designed to facilitate composite AI safety workflows by providing a unified infrastructure enabling the integration and composition of diverse safety tools that operate across model internals and outputs. Furthermore, to demonstrate the utility of this framework, we present three safety-oriented analyses on vision-language models for the task of object detection: (1) distributional evaluation, revealing performance trade-offs and potential failure modes across distributions; (2) probe-based analysis of layer dynamics highlighting shared hierarchical learning via phase transition; and (3) sparse autoencoders identifying interpretable concepts. More broadly, this work contributes foundational infrastructure and findings for building more robust and reliable AI systems.
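The core idea in the abstract is a single infrastructure in which tools that read model internals and tools that score model outputs plug into one workflow. As a rough illustration only, here is a minimal sketch of what such a composable safety-tool interface might look like; the names and API below are hypothetical and are not taken from BlueGlass itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class SafetyReport:
    """Accumulates findings from each tool in the composite workflow."""
    findings: dict = field(default_factory=dict)

# A "tool" is anything that inspects a model (its internals or its outputs)
# and writes its findings into the shared report.
SafetyTool = Callable[[Any, SafetyReport], SafetyReport]

def compose(*tools: SafetyTool) -> SafetyTool:
    """Chain tools so that each one sees the report produced so far."""
    def pipeline(model: Any, report: SafetyReport) -> SafetyReport:
        for tool in tools:
            report = tool(model, report)
        return report
    return pipeline

# Stub tools mirroring the paper's three analyses (bodies are placeholders).
def distributional_eval(model, report):
    report.findings["distributions"] = "per-distribution detection metrics"
    return report

def layer_probes(model, report):
    report.findings["layer_dynamics"] = "per-layer probe accuracies"
    return report

def sae_concepts(model, report):
    report.findings["concepts"] = "sparse-autoencoder features"
    return report

audit = compose(distributional_eval, layer_probes, sae_concepts)
print(audit(None, SafetyReport()).findings)
```

The point of the composition is that downstream tools can condition on earlier findings, which is what makes the workflow composite rather than a set of isolated checks.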
Related papers
- Towards provable probabilistic safety for scalable embodied AI systems [79.31011047593492]
Embodied AI systems are increasingly prevalent across various applications. Ensuring their safety in complex operating environments remains a major challenge. We introduce provable probabilistic safety, which aims to ensure that the residual risk of large-scale deployment remains below a predefined threshold.
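To make the "residual risk below a predefined threshold" criterion concrete, here is a small sketch using the standard Clopper-Pearson bound; the paper's actual method is not reproduced here, so treat this purely as an illustration of the kind of statistical guarantee involved.

```python
from scipy.stats import beta

def residual_risk_upper_bound(failures: int, trials: int, alpha: float = 0.01) -> float:
    """One-sided (1 - alpha) Clopper-Pearson upper bound on failure probability."""
    if failures == trials:
        return 1.0
    return beta.ppf(1.0 - alpha, failures + 1, trials - failures)

# e.g. 0 observed failures in 50,000 simulated deployments:
bound = residual_risk_upper_bound(failures=0, trials=50_000)
print(f"99% upper bound on per-deployment risk: {bound:.2e}")  # ~9.2e-05
# Deployment is justified (under this criterion) only if bound <= threshold.
```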
arXiv Detail & Related papers (2025-06-05T15:46:25Z)
- UniSTPA: A Safety Analysis Framework for End-to-End Autonomous Driving [10.063740202765343]
We propose the Unified System Theoretic Process Analysis (UniSTPA) framework. UniSTPA performs hazard analysis not only at the component level but also within the model's internal layers. The proposed framework thus offers both theoretical and practical guidance for the safe development and deployment of end-to-end autonomous driving systems.
arXiv Detail & Related papers (2025-05-21T01:23:31Z)
- AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement [73.0700818105842]
We introduce AISafetyLab, a unified framework and toolkit that integrates representative attack, defense, and evaluation methodologies for AI safety. AISafetyLab features an intuitive interface that enables developers to seamlessly apply various techniques. We conduct empirical studies on Vicuna, analyzing different attack and defense strategies to provide valuable insights into their comparative effectiveness.
arXiv Detail & Related papers (2025-02-24T02:11:52Z)
- In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models [104.94706600050557]
Text-to-image (T2I) models have shown remarkable progress, but their potential to generate harmful content remains a critical concern in the ML community. We propose ICER, a novel red-teaming framework that generates interpretable and semantically meaningful problematic prompts. Our work provides crucial insights for developing more robust safety mechanisms in T2I systems.
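The replay idea can be illustrated with a short, hypothetical loop: past attack attempts and their outcomes are placed back into the attacker's context so that new prompts build on earlier experience. Every name below is a placeholder, not ICER's actual interface.

```python
import random

def red_team_step(attacker_llm, t2i_model, safety_filter, replay_buffer, k=4):
    """One red-teaming step with past experiences replayed in context."""
    # Sample a few past (prompt, outcome) experiences into the context.
    examples = random.sample(replay_buffer, min(k, len(replay_buffer)))
    context = "\n".join(f"prompt: {p!r} -> bypassed: {ok}" for p, ok in examples)
    candidate = attacker_llm(
        f"Past attempts:\n{context}\nPropose a new test prompt:"
    )
    image = t2i_model(candidate)
    bypassed = not safety_filter(image)          # did it evade the safety check?
    replay_buffer.append((candidate, bypassed))  # store the new experience
    return candidate, bypassed
```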
arXiv Detail & Related papers (2024-11-25T04:17:24Z)
- EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction. Foundation models serving as the "brain" of EAI agents for high-level task planning have shown promising results. However, the deployment of these agents in physical environments presents significant safety challenges. This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
- Co-designing heterogeneous models: a distributed systems approach [0.40964539027092917]
This paper presents a modelling approach tailored for heterogeneous systems based on three elements: an inferentialist interpretation of what a model is, a distributed-systems metaphor, and a co-design cycle that describes the practical design and construction of the model.
We explore the suitability of this method in the context of three different security-oriented models.
arXiv Detail & Related papers (2024-07-10T13:35:38Z)
- Leveraging Traceability to Integrate Safety Analysis Artifacts into the Software Development Process [51.42800587382228]
Safety assurance cases (SACs) can be challenging to maintain during system evolution.
We propose a solution that leverages software traceability to connect relevant system artifacts to safety analysis models.
We elicit design rationales for system changes to help safety stakeholders analyze their impact on safety.
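A minimal sketch of how traceability can drive such impact analysis, assuming trace links are stored as a directed graph from system artifacts to safety artifacts (the link data here is invented for illustration):

```python
# Invented example links: source files -> hazards/fault trees -> SAC claims.
trace_links = {
    "src/brake_controller.c": {"HAZ-12", "FTA-03"},
    "HAZ-12": {"SAC-claim-7"},
    "FTA-03": {"SAC-claim-7", "SAC-claim-9"},
}

def impacted_safety_artifacts(changed_artifact):
    """Follow trace links transitively to collect artifacts needing re-review."""
    seen, stack = set(), [changed_artifact]
    while stack:
        node = stack.pop()
        for succ in trace_links.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen

print(sorted(impacted_safety_artifacts("src/brake_controller.c")))
# ['FTA-03', 'HAZ-12', 'SAC-claim-7', 'SAC-claim-9']
```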
arXiv Detail & Related papers (2023-07-14T16:03:27Z)
- A Model Based Framework for Testing Safety and Security in Operational Technology Environments [0.46040036610482665]
We propose a model-based testing approach, which we consider a promising way to analyze the safety and security behavior of a system under test.
The underlying framework is divided into four parts, according to the critical factors in testing operational technology environments.
arXiv Detail & Related papers (2023-06-22T05:37:09Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this area from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
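The projection half of such a method is easy to illustrate in isolation. Below is a simplified sketch that projects a proposed action back onto a known linear constraint; USL itself corrects actions against a learned state-wise cost model, so treat this purely as an illustration of the mechanism.

```python
import numpy as np

def project_to_safe(action: np.ndarray, g: np.ndarray, c: float) -> np.ndarray:
    """Project `action` onto the half-space {a : g @ a <= c} (closest safe action)."""
    violation = g @ action - c
    if violation <= 0:
        return action                              # already satisfies the constraint
    return action - (violation / (g @ g)) * g      # minimal-norm correction

a = np.array([1.0, 2.0])
g = np.array([1.0, 1.0])                  # constraint: a_x + a_y <= 1
print(project_to_safe(a, g, c=1.0))       # -> [0. 1.]
```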
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Synergistic Redundancy: Towards Verifiable Safety for Autonomous Vehicles [10.277825331268179]
We propose Synergistic Redundancy (SR), a safety architecture for complex cyber-physical systems such as autonomous vehicles (AVs).
SR provides verifiable safety guarantees against specific faults by decoupling the mission and safety tasks of the system.
Close coordination with the mission layer allows easier and earlier detection of safety-critical faults in the system.
arXiv Detail & Related papers (2022-09-04T23:52:03Z)
- An Empirical Analysis of the Use of Real-Time Reachability for the Safety Assurance of Autonomous Vehicles [7.1169864450668845]
We propose using a real-time reachability algorithm to implement the simplex architecture and assure the safety of a 1/10-scale open-source autonomous vehicle platform.
In our approach, the need to analyze an underlying controller is abstracted away, instead focusing on the effects of the controller's decisions on the system's future states.
arXiv Detail & Related papers (2022-05-03T11:12:29Z)
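The simplex pattern referenced here is straightforward to sketch: an over-approximate reachability check decides whether the untrusted controller's command can be trusted over the next horizon, and a verified fallback takes over otherwise. The interval propagation below is a crude stand-in for the paper's real-time reachability algorithm, with invented dynamics and numbers.

```python
def reachable_positions(pos, vel, accel, horizon, dt=0.05, accel_err=0.2):
    """Interval over-approximation of positions reachable within `horizon`."""
    v_lo = v_hi = vel
    p_lo = p_hi = pos
    lo, hi = pos, pos
    for _ in range(int(horizon / dt)):
        v_lo += (accel - accel_err) * dt   # pessimistic velocity bounds
        v_hi += (accel + accel_err) * dt
        p_lo += v_lo * dt                  # position bounds follow velocity bounds
        p_hi += v_hi * dt
        lo, hi = min(lo, p_lo), max(hi, p_hi)
    return lo, hi

def simplex_step(pos, vel, complex_cmd, safe_cmd, wall_at=10.0):
    """Use the untrusted command only if all reachable positions stay safe."""
    _, hi = reachable_positions(pos, vel, complex_cmd, horizon=1.0)
    return complex_cmd if hi < wall_at else safe_cmd

# The high-performance controller wants to keep accelerating near an obstacle;
# the reachability check rejects that and the verified fallback brakes instead.
print(simplex_step(pos=8.0, vel=2.5, complex_cmd=1.0, safe_cmd=-3.0))  # -> -3.0
```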