Formalizing Operational Design Domains with the Pkl Language
- URL: http://arxiv.org/abs/2509.02221v1
- Date: Tue, 02 Sep 2025 11:41:27 GMT
- Title: Formalizing Operational Design Domains with the Pkl Language
- Authors: Martin Skoglund, Fredrik Warg, Anders Thorsén, Sasikumar Punnekkat, Hans Hansson
- Abstract summary: The deployment of automated functions that can operate without direct human supervision has changed safety evaluation in domains seeking higher levels of automation. To make a convincing safety claim, the developer must present a thorough justification argument, supported by evidence, that a function is free from unreasonable risk when operated in its intended context. This paper presents a way to formalize an Operational Design Domain (ODD) specification in the Pkl language.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The deployment of automated functions that can operate without direct human supervision has changed safety evaluation in domains seeking higher levels of automation. Unlike conventional systems that rely on human operators, these functions require new assessment frameworks to demonstrate that they do not introduce unacceptable risks under real-world conditions. To make a convincing safety claim, the developer must present a thorough justification argument, supported by evidence, that a function is free from unreasonable risk when operated in its intended context. The key concept relevant to the presented work is the intended context, often captured by an Operational Design Domain (ODD) specification. ODD formalization is challenging due to the need to maintain flexibility in adopting diverse specification formats while preserving consistency and traceability and integrating seamlessly into development, validation, and assessment. This paper presents a way to formalize an ODD in the Pkl language, addressing central challenges in specifying ODDs while improving usability through specialized configuration language features. The approach is illustrated with an automotive example but can be broadly applied to ensure rigorous assessments of operational contexts.
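To give a flavor of what such a formalization might look like, the following is a minimal illustrative sketch (not taken from the paper) of an automotive ODD fragment in Pkl. All module, class, and property names here are hypothetical; the point is that Pkl's typed properties and value constraints let an ODD state permitted operating conditions declaratively and machine-checkably.

```pkl
// Hypothetical ODD sketch in Pkl; names and structure are illustrative only.
module odd.HighwayPilot

class SpeedRange {
  minKmh: Int(this >= 0)
  maxKmh: Int(this >= 0)
}

// Road types in which the function is permitted to operate
roadTypes: Listing<"highway" | "ruralRoad"> = new { "highway" }

// Permitted speed envelope
speed: SpeedRange = new {
  minKmh = 60
  maxKmh = 130
}

// Environmental conditions explicitly excluded from the ODD
excludedWeather: Listing<String> = new {
  "heavySnow"
  "denseFog"
}
```

Constraint expressions such as `Int(this >= 0)` are validated when the configuration is evaluated, so an out-of-domain value fails fast rather than silently propagating into downstream validation tooling.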
Related papers
- Agentic Problem Frames: A Systematic Approach to Engineering Reliable Domain Agents [0.0]
Large Language Models (LLMs) are evolving into autonomous agents, yet current "frameless" development, which relies on ambiguous natural language, leads to critical risks such as scope creep and open-loop failures. This study proposes Agentic Problem Frames (APF), a systematic engineering framework that shifts focus from internal model intelligence to the structured interaction between the agent and its environment.
arXiv Detail & Related papers (2026-02-22T06:32:32Z) - Steering LLMs via Scalable Interactive Oversight [74.12746881843044]
As Large Language Models increasingly automate complex, long-horizon tasks such as "vibe coding", a supervision gap has emerged. This presents a critical challenge in scalable oversight: enabling humans to responsibly steer AI systems on tasks that surpass their own ability to specify or verify.
arXiv Detail & Related papers (2026-02-04T04:52:00Z) - The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution [63.61358761489141]
Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering. We propose a novel framework for general agentic attribution, designed to identify the internal factors driving agent actions regardless of the task outcome. We validate our framework across a diverse suite of agentic scenarios, including standard tool use and subtle reliability risks like memory-induced bias.
arXiv Detail & Related papers (2026-01-21T15:22:21Z) - Automated Formalization of Probabilistic Requirements from Structured Natural Language [2.8065951726067726]
We extend NASA's Formal Requirement Elicitation Tool (FRET) with support for the specification of unambiguous and correct probabilistic requirements. We propose and develop a formal, compositional, and automated approach for translating structured natural-language requirements into formulas in probabilistic temporal logic.
arXiv Detail & Related papers (2025-12-15T20:20:27Z) - ATA: A Neuro-Symbolic Approach to Implement Autonomous and Trustworthy Agents [0.9740025522928777]
Large Language Models (LLMs) have demonstrated impressive capabilities, yet their deployment in high-stakes domains is hindered by inherent limitations in trustworthiness. We introduce a generic neuro-symbolic approach, which we call Autonomous Trustworthy Agents (ATA).
arXiv Detail & Related papers (2025-10-18T07:35:54Z) - SOPBench: Evaluating Language Agents at Following Standard Operating Procedures and Constraints [59.645885492637845]
SOPBench is an evaluation pipeline that transforms each service-specific SOP code program into a directed graph of executable functions and requires agents to call these functions based on natural language SOP descriptions. We evaluate 18 leading models, and results show the task is challenging even for top-tier models.
arXiv Detail & Related papers (2025-03-11T17:53:02Z) - Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems [39.23499993745249]
We introduce semantic integrity constraints (SICs) for specifying and enforcing correctness conditions over LLM outputs in semantic queries. SICs generalize traditional database integrity constraints to semantic settings, supporting common types of constraints, such as grounding, soundness, and exclusion. We present a system design for integrating SICs into query planning and runtime and discuss its realization in AI-augmented DPSs.
arXiv Detail & Related papers (2025-03-01T19:59:25Z) - Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions. Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes. We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating the performance of proprietary and open-weight models.
arXiv Detail & Related papers (2025-02-18T17:12:26Z) - Beyond One-Time Validation: A Framework for Adaptive Validation of Prognostic and Diagnostic AI-based Medical Devices [55.319842359034546]
Existing approaches often fall short in addressing the complexity of practically deploying these devices.
The presented framework emphasizes the importance of repeating validation and fine-tuning during deployment.
It is positioned within the current US and EU regulatory landscapes.
arXiv Detail & Related papers (2024-09-07T11:13:52Z) - Towards Scenario-based Safety Validation for Autonomous Trains with Deep Generative Models [0.0]
We report our practical experiences regarding the utility of data simulation with deep generative models for scenario-based validation.
We demonstrate the capabilities of semantically editing railway scenes with deep generative models to make a limited amount of test data more representative.
arXiv Detail & Related papers (2023-10-16T17:55:14Z) - Safety of the Intended Functionality Concept Integration into a Validation Tool Suite [0.0]
This work demonstrates how the integration of the SOTIF process within an existing validation tool suite can be achieved.
The necessary adaptations are explained with accompanying examples to aid comprehension of the approach.
arXiv Detail & Related papers (2023-08-31T12:22:35Z) - Setting AI in context: A case study on defining the context and operational design domain for automated driving [5.083561746476347]
The case study investigates the challenges with context definitions for the development of perception functions that use machine learning for automated driving.
The results outline challenges experienced by an automotive supplier company when defining the operational context for systems using machine learning.
arXiv Detail & Related papers (2022-01-27T11:26:32Z) - Unsupervised Domain Generalization for Person Re-identification: A Domain-specific Adaptive Framework [50.88463458896428]
Domain generalization (DG) has attracted much attention in person re-identification (ReID) recently.
Existing methods usually need the source domains to be labeled, which could be a significant burden for practical ReID tasks.
We propose a simple and efficient domain-specific adaptive framework, and realize it with an adaptive normalization module.
arXiv Detail & Related papers (2021-11-30T02:35:51Z) - Evaluating the Safety of Deep Reinforcement Learning Models using Semi-Formal Verification [81.32981236437395]
We present a semi-formal verification approach for decision-making tasks based on interval analysis.
Our method obtains comparable results over standard benchmarks with respect to formal verifiers.
Our approach allows safety properties of decision-making models to be evaluated efficiently in practical applications.
arXiv Detail & Related papers (2020-10-19T11:18:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.