OntoGSN: An Ontology for Dynamic Management of Assurance Cases
- URL: http://arxiv.org/abs/2506.11023v1
- Date: Tue, 20 May 2025 08:15:16 GMT
- Title: OntoGSN: An Ontology for Dynamic Management of Assurance Cases
- Authors: Tomas Bueno Momcilovic, Barbara Gallina, Ingmar Kessler, Dian Balta,
- Abstract summary: We present OntoGSN: an ontology and supporting OWL for managing ACs in the Goalcturing Notation (GSN) standard.<n>OntoGSN offers a knowledge representation and a queryable graph that can be automatically, evaluated, and updated.<n>We demonstrate the utility of our contributions in an example involving assurance of robustness in large language models.
- Score: 0.3999851878220878
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Assurance cases (ACs) are a common artifact for building and maintaining confidence in system properties such as safety or robustness. Constructing an AC can be challenging, although existing tools provide support in static, document-centric applications and methods for dynamic contexts (e.g., autonomous driving) are emerging. Unfortunately, managing ACs remains a challenge, since maintaining the embedded knowledge in the face of changes requires substantial effort, in the process deterring developers - or worse, producing poorly managed cases that instill false confidence. To address this, we present OntoGSN: an ontology and supporting middleware for managing ACs in the Goal Structuring Notation (GSN) standard. OntoGSN offers a knowledge representation and a queryable graph that can be automatically populated, evaluated, and updated. Our contributions include: a 1:1 formalization of the GSN Community Standard v3 in an OWL ontology with SWRL rules; a helper ontology and parser for integration with a widely used AC tool; a repository and documentation of design decisions for OntoGSN maintenance; a SPARQL query library with automation patterns; and a prototypical interface. The ontology strictly adheres to the standard's text and has been evaluated according to FAIR principles, the OOPS framework, competency questions, and community feedback. The development of other middleware elements is guided by the community needs and subject to ongoing evaluations. To demonstrate the utility of our contributions, we illustrate dynamic AC management in an example involving assurance of adversarial robustness in large language models.
Related papers
- Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs [50.075587392477935]
We conduct the first large-scale empirical study of 705 real-world failures from the open-source DeepSeek, Llama, and Qwen ecosystems.<n>Our analysis reveals a paradigm shift: white-box orchestration relocates the reliability bottleneck from model algorithmic defects to the systemic fragility of the deployment stack.
arXiv Detail & Related papers (2026-01-20T06:42:56Z) - From Completion to Editing: Unlocking Context-Aware Code Infilling via Search-and-Replace Instruction Tuning [81.97788535387286]
We propose a framework that internalizes the agentic verification-and-editing mechanism into a unified, single-pass inference process.<n>With minimal data, SRI-Coder enables Chat models to surpass the completion performance of their Base counterparts.<n>Unlike FIM-style tuning, SRI preserves general coding competencies and maintains inference latency comparable to standard FIM.
arXiv Detail & Related papers (2026-01-19T20:33:53Z) - ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development [72.4729759618632]
We introduce ABC-Bench, a benchmark to evaluate agentic backend coding within a realistic, executable workflow.<n>We curated 224 practical tasks spanning 8 languages and 19 frameworks from open-source repositories.<n>Our evaluation reveals that even state-of-the-art models struggle to deliver reliable performance on these holistic tasks.
arXiv Detail & Related papers (2026-01-16T08:23:52Z) - AJAR: Adaptive Jailbreak Architecture for Red-teaming [1.356919241968803]
AJAR is a proof-of-concept framework designed to bridge the gap between "red-teaming" and "action security"<n>AJAR decouples adversarial logic from the execution loop, encapsulating state-of-the-art algorithms like X-Teaming as standardized, plug-and-play services.<n> AJAR is open-sourced to facilitate the standardized, environment-aware evaluation of this emerging attack surface.
arXiv Detail & Related papers (2026-01-16T03:30:40Z) - Monadic Context Engineering [59.95390010097654]
This paper introduces Monadic Context Engineering (MCE) to provide a formal foundation for agent design.<n>We demonstrate how Monads enable robust composition, how Applicatives provide a principled structure for parallel execution, and crucially, how Monad Transformers allow for the systematic composition of these capabilities.<n>This layered approach enables developers to construct complex, resilient, and efficient AI agents from simple, independently verifiable components.
arXiv Detail & Related papers (2025-12-27T01:52:06Z) - Autonomous Issue Resolver: Towards Zero-Touch Code Maintenance [0.0]
We propose a paradigm shift from the standard Code Property Graphs to the concept of Data Transformation Graph (DTG)<n>We introduce a multi-agent framework that reconciles data integrity navigation with control flow logic.<n>Our approach has demonstrated good results on several SWE benchmarks, reaching a resolution rate of 87.1%.
arXiv Detail & Related papers (2025-12-09T11:11:37Z) - Everything is Context: Agentic File System Abstraction for Context Engineering [11.63011212134865]
This paper proposes a file-system abstraction for context engineering.<n>The abstraction offers a persistent, governed infrastructure for managing heterogeneous context artefacts.<n>As GenAI becomes an active collaborator in decision support, humans play a central role as curators, verifiers, and co-reasoners.
arXiv Detail & Related papers (2025-12-05T06:56:45Z) - BarrierBench : Evaluating Large Language Models for Safety Verification in Dynamical Systems [4.530582224312311]
We introduce an LLM-based agentic framework for barrier certificate synthesis.<n>The framework uses natural language reasoning to propose, refine, and validate candidate certificates.<n> BarrierBench is a benchmark of 100 dynamical systems spanning linear, nonlinear, discrete-time, and continuous-time settings.
arXiv Detail & Related papers (2025-11-12T14:23:49Z) - Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.83217247686402]
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions.<n>Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance.<n>This paper decomposes LLM applications into a three-layer architecture: textbftextitSystem Shell Layer, textbftextitPrompt Orchestration Layer, and textbftextitLLM Inference Core.
arXiv Detail & Related papers (2025-08-28T13:00:28Z) - Osprey: A Scalable Framework for the Orchestration of Agentic Systems [0.4970364068620607]
Osprey Framework is a production-ready architecture for scalable agentic systems that integrate conversational context with robust tool orchestration across safety-critical domains.<n>Our framework provides: (i) dynamic capability classification to select only relevant tools; (ii) plan-first orchestration with explicit dependencies and optional human approval; and (iii) context-aware task extraction that combines dialogue history with external memory and domain resources.
arXiv Detail & Related papers (2025-08-20T20:57:13Z) - Generative AI-Empowered Secure Communications in Space-Air-Ground Integrated Networks: A Survey and Tutorial [107.26005706569498]
Space-air-ground integrated networks (SAGINs) face unprecedented security challenges due to their inherent characteristics.<n>Generative AI (GAI) is a transformative approach that can safeguard SAGIN security by synthesizing data, understanding semantics, and making autonomous decisions.
arXiv Detail & Related papers (2025-08-04T01:42:57Z) - Think Like an Engineer: A Neuro-Symbolic Collaboration Agent for Generative Software Requirements Elicitation and Self-Review [23.26988707110507]
This paper introduces RequireCEG, a requirement elicitation and self-review agent that embeds causal-effect graphs (CEGs) in a neuro-symbolic collaboration architecture.<n>To evaluate our method, we created the RGPair benchmark dataset and conducted extensive experiments.
arXiv Detail & Related papers (2025-07-20T13:59:00Z) - On Automating Security Policies with Contemporary LLMs [3.47402794691087]
In this paper, we present a framework for automating attack mitigation policy compliance through an innovative combination of in-context learning and retrieval-augmented generation (RAG)<n>Our empirical evaluation, conducted using publicly available CTI policies in STIXv2 format and Windows API documentation, demonstrates significant improvements in precision, recall, and F1-score when employing RAG compared to a non-RAG baseline.
arXiv Detail & Related papers (2025-06-05T09:58:00Z) - A Novel Zero-Trust Identity Framework for Agentic AI: Decentralized Authentication and Fine-Grained Access Control [7.228060525494563]
This paper posits the imperative for a novel Agentic AI IAM framework.<n>We propose a comprehensive framework built upon rich, verifiable Agent Identities (IDs)<n>We also explore how Zero-Knowledge Proofs (ZKPs) enable privacy-preserving attribute disclosure and verifiable policy compliance.
arXiv Detail & Related papers (2025-05-25T20:21:55Z) - Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation [52.626086874715284]
We introduce a novel problem formulation called Abstract DNN-Verification, which verifies a hierarchical structure of unsafe outputs.<n>By leveraging abstract interpretation and reasoning about output reachable sets, our approach enables assessing multiple safety levels during the formal verification process.<n>Our contributions include a theoretical exploration of the relationship between our novel abstract safety formulation and existing approaches.
arXiv Detail & Related papers (2025-05-08T13:29:46Z) - Towards Trustworthy GUI Agents: A Survey [64.6445117343499]
This survey examines the trustworthiness of GUI agents in five critical dimensions.<n>We identify major challenges such as vulnerability to adversarial attacks, cascading failure modes in sequential decision-making.<n>As GUI agents become more widespread, establishing robust safety standards and responsible development practices is essential.
arXiv Detail & Related papers (2025-03-30T13:26:00Z) - A Survey on (M)LLM-Based GUI Agents [62.57899977018417]
Graphical User Interface (GUI) Agents have emerged as a transformative paradigm in human-computer interaction.<n>Recent advances in large language models and multimodal learning have revolutionized GUI automation across desktop, mobile, and web platforms.<n>This survey identifies key technical challenges, including accurate element localization, effective knowledge retrieval, long-horizon planning, and safety-aware execution control.
arXiv Detail & Related papers (2025-03-27T17:58:31Z) - A Framework for Measuring the Quality of Infrastructure-as-Code Scripts [0.0]
Infrastructure as Code (IaC) has become integral to modern software development.<n>Rapid proliferation of IaC scripts has highlighted the need for better code quality assessment methods.<n>This paper proposes a new IaC code quality framework specifically showcased for repositories as a foundation.
arXiv Detail & Related papers (2025-02-05T12:36:19Z) - Operationalizing Assurance Cases for Data Scientists: A Showcase of
Concepts and Tooling in the Context of Test Data Quality for Machine Learning [1.6403311770639912]
Assurance Cases (ACs) are an established approach in safety engineering to argue quality claims in a structured way.
We propose a framework to support the operationalization of ACs for Machine Learning (ML) components based on technologies that data scientists use on a daily basis: Python and Jupyter Notebook.
Results from the application of the framework, documented through notebooks, can be integrated into existing AC tools.
arXiv Detail & Related papers (2023-12-08T09:34:46Z) - A General Framework for Verification and Control of Dynamical Models via Certificate Synthesis [54.959571890098786]
We provide a framework to encode system specifications and define corresponding certificates.
We present an automated approach to formally synthesise controllers and certificates.
Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks.
arXiv Detail & Related papers (2023-09-12T09:37:26Z) - Towards an Interface Description Template for AI-enabled Systems [77.34726150561087]
Reuse is a common system architecture approach that seeks to instantiate a system architecture with existing components.
There is currently no framework that guides the selection of necessary information to assess their portability to operate in a system different than the one for which the component was originally purposed.
We present ongoing work on establishing an interface description template that captures the main information of an AI-enabled component.
arXiv Detail & Related papers (2020-07-13T20:30:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.