Shield Synthesis for LTL Modulo Theories
- URL: http://arxiv.org/abs/2406.04184v1
- Date: Thu, 6 Jun 2024 15:40:29 GMT
- Title: Shield Synthesis for LTL Modulo Theories
- Authors: Andoni Rodriguez, Guy Amir, Davide Corsi, Cesar Sanchez, Guy Katz
- Abstract summary: We develop a novel approach for generating shields conforming to complex safety specifications.
To the best of our knowledge, this is the first approach for synthesizing shields for such expressivity.
- Score: 2.034732821736745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, Machine Learning (ML) models have achieved remarkable success in various domains. However, these models also tend to demonstrate unsafe behaviors, precluding their deployment in safety-critical systems. To cope with this issue, ample research focuses on developing methods that guarantee the safe behavior of a given ML model. A prominent example is shielding, which incorporates an external component (a "shield") that blocks unwanted behavior. Despite significant progress, shielding suffers from a main setback: it is currently geared towards properties encoded solely in propositional logics (e.g., LTL) and is unsuitable for richer logics. This, in turn, limits the widespread applicability of shielding in many real-world systems. In this work, we address this gap and extend shielding to LTL modulo theories, by building upon recent advances in reactive synthesis modulo theories. This allows us to develop a novel approach for generating shields conforming to complex safety specifications in these more expressive logics. We evaluate our shields and demonstrate their ability to handle rich data with temporal dynamics. To the best of our knowledge, this is the first approach for synthesizing shields for such expressivity.
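The shielding idea described in the abstract can be illustrated with a minimal, hypothetical sketch. This is not the paper's synthesis procedure; the safety property, threshold, dynamics model, and all function names (MIN_DIST, is_safe, predict_next, fallback_action) are illustrative assumptions. The point is only to show how a shield sits between an ML policy and the environment and overrides actions that would violate a safety predicate over rich (here, real-valued) data.

```python
# Conceptual sketch of a runtime shield, assuming a safety property of the
# form G (distance_to_obstacle > MIN_DIST), i.e. an LTL "always" operator
# over a linear-arithmetic atom. All names and dynamics are hypothetical.

MIN_DIST = 1.0  # assumed safety threshold (theory atom over the reals)

def is_safe(state: dict) -> bool:
    """Theory atom: the agent keeps a minimum distance to the obstacle."""
    return state["distance_to_obstacle"] > MIN_DIST

def predict_next(state: dict, action: float) -> dict:
    """Toy one-step dynamics the shield uses to screen proposed actions."""
    return {"distance_to_obstacle": state["distance_to_obstacle"] - action}

def fallback_action() -> float:
    """Conservative action the shield substitutes when needed (e.g., brake)."""
    return 0.0

def shield(state: dict, proposed_action: float) -> float:
    """Pass the ML policy's action through unless it would violate safety."""
    if is_safe(predict_next(state, proposed_action)):
        return proposed_action
    return fallback_action()

# Example: an unsafe proposal (advance 2.0 toward an obstacle at distance 1.5)
# is overridden by the shield, which returns the fallback instead.
print(shield({"distance_to_obstacle": 1.5}, proposed_action=2.0))  # -> 0.0
```

In the paper's setting, the shield is synthesized from the specification rather than hand-written as above, but the runtime role (monitoring the state and blocking unsafe actions) is the same.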
Related papers
- Synthesizing Efficient and Permissive Programmatic Runtime Shields for Neural Policies [7.831197018945118]
We propose a novel framework that synthesizes lightweight and permissive programmatic runtime shields for neural policies.
Aegis achieves this by formulating the search for a runtime shield as a sketch-based program synthesis problem.
Compared to the current state-of-the-art, Aegis's shields exhibit a 2.1× reduction in time overhead and a 4.4× reduction in memory usage.
arXiv Detail & Related papers (2024-10-08T02:44:55Z) - What Makes and Breaks Safety Fine-tuning? A Mechanistic Study [64.9691741899956]
Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment.
We design a synthetic data generation framework that captures salient aspects of an unsafe input.
Using this, we investigate three well-known safety fine-tuning methods.
arXiv Detail & Related papers (2024-07-14T16:12:57Z) - RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [62.685566387625975]
Current mitigation strategies, while effective, are not resilient under adversarial attacks.
This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently moderate harmful and unsafe inputs.
arXiv Detail & Related papers (2024-03-19T07:25:02Z) - InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance [56.184255657175335]
We develop InferAligner, a novel inference-time alignment method that utilizes cross-model guidance for harmlessness alignment.
Experimental results show that our method can be very effectively applied to domain-specific models in finance, medicine, and mathematics.
It significantly diminishes the Attack Success Rate (ASR) of both harmful instructions and jailbreak attacks, while maintaining almost unchanged performance in downstream tasks.
arXiv Detail & Related papers (2024-01-20T10:41:03Z) - Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z) - Safety Shielding under Delayed Observation [59.86192283565134]
Shields are correct-by-construction enforcers that guarantee safe execution.
Shields should pick the safe corrective actions in such a way that future interferences are most likely minimized.
We present the first integration of shields in a realistic driving simulator.
arXiv Detail & Related papers (2023-07-05T10:06:10Z) - Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z) - Fundamental Limitations of Alignment in Large Language Models [16.393916864600193]
An important aspect in developing language models that interact with humans is aligning their behavior to be useful and unharmful.
This is usually achieved by tuning the model in a way that enhances desired behaviors and inhibits undesired ones.
We propose a theoretical approach called Behavior Expectation Bounds (BEB) which allows us to formally investigate several inherent characteristics and limitations of alignment in large language models.
arXiv Detail & Related papers (2023-04-19T17:50:09Z) - Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning [7.103977648997475]
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases.
We propose Model-based Dynamic Shielding (MBDS) to support MARL algorithm design.
arXiv Detail & Related papers (2023-04-13T06:08:10Z) - Safe MDP Planning by Learning Temporal Patterns of Undesirable Trajectories and Averting Negative Side Effects [27.41101006357176]
In safe MDP planning, a cost function based on the current state and action is often used to specify safety aspects.
Operating based on an incomplete model can often produce unintended negative side effects (NSEs).
arXiv Detail & Related papers (2023-04-06T14:03:24Z) - Dynamic Shielding for Reinforcement Learning in Black-Box Environments [2.696933675395521]
It is challenging to use reinforcement learning in cyber-physical systems due to the lack of safety guarantees during learning.
This paper aims to reduce undesired behaviors during learning without requiring any prior system knowledge.
We propose dynamic shielding: an extension of a model-based safe RL technique called shielding using automata learning.
arXiv Detail & Related papers (2022-07-27T10:54:05Z)