Hyperproperty-Constrained Secure Reinforcement Learning
- URL: http://arxiv.org/abs/2508.00106v1
- Date: Thu, 31 Jul 2025 18:57:18 GMT
- Title: Hyperproperty-Constrained Secure Reinforcement Learning
- Authors: Ernest Bonnah, Luan Viet Nguyen, Khaza Anuarul Hoque
- Abstract summary: This paper focuses on HyperTWTL-constrained secure reinforcement learning (SecRL). We propose an approach for learning security-aware optimal policies using dynamic Boltzmann softmax RL.
- Score: 0.16385815610837165
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hyperproperties for Time Window Temporal Logic (HyperTWTL) is a domain-specific formal specification language known for its effectiveness in compactly representing security, opacity, and concurrency properties for robotics applications. This paper focuses on HyperTWTL-constrained secure reinforcement learning (SecRL). Although temporal logic-constrained safe reinforcement learning (SRL) is an evolving research problem with a substantial body of existing literature, there is a significant research gap in exploring security-aware reinforcement learning (RL) using hyperproperties. Given the dynamics of an agent as a Markov Decision Process (MDP) and opacity/security constraints formalized as HyperTWTL, we propose an approach for learning security-aware optimal policies using dynamic Boltzmann softmax RL while satisfying the HyperTWTL constraints. The effectiveness and scalability of our proposed approach are demonstrated using a pick-up and delivery robotic mission case study. We also compare our results with two other baseline RL algorithms, showing that our proposed method outperforms them.
Related papers
- Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models [0.0]
We propose Dynamic Rank Reinforcement Learning (DR-RL), a novel framework that adaptively optimizes the low-rank factorization of Multi-Head Self-Attention (MHSA) in Large Language Models (LLMs). DR-RL maintains downstream accuracy statistically equivalent to full-rank attention while significantly reducing Floating Point Operations (FLOPs). This work bridges the gap between adaptive efficiency and theoretical rigor in MHSA, offering a principled, mathematically grounded alternative to rank reduction techniques in resource-constrained deep learning.
arXiv Detail & Related papers (2025-12-17T21:09:19Z)
- Control Synthesis of Cyber-Physical Systems for Real-Time Specifications through Causation-Guided Reinforcement Learning [3.608670495432032]
Signal temporal logic (STL) has emerged as a powerful formalism for expressing real-time constraints. Reinforcement learning (RL) has become an important method for solving control synthesis problems in unknown environments. We propose an online reward generation method guided by the online causation monitoring of STL.
arXiv Detail & Related papers (2025-10-09T02:49:28Z)
- Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
- HypRL: Reinforcement Learning of Control Policies for Hyperproperties [0.3277163122167433]
We propose HYPRL, a specification-guided reinforcement learning framework. We apply Skolemization to manage quantifier alternations and define quantitative functions to shape rewards. A suitable RL algorithm is then used to learn policies that collectively maximize the expected reward.
arXiv Detail & Related papers (2025-04-07T01:58:36Z)
- SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning [26.554847852013737]
SoNIC is the first algorithm that integrates adaptive conformal inference and constrained reinforcement learning. Our method achieves a success rate of 96.93%, which is 11.67% higher than the previous state-of-the-art RL method. Our experiments demonstrate that the system can generate robust and socially polite decision-making when interacting with both sparse and dense crowds.
arXiv Detail & Related papers (2024-07-24T17:57:21Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Constrained Decision Transformer for Offline Safe Reinforcement Learning [16.485325576173427]
We study the offline safe RL problem from a novel multi-objective optimization perspective.
We propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment.
arXiv Detail & Related papers (2023-02-14T21:27:10Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Safe reinforcement learning for multi-energy management systems with known constraint functions [0.0]
Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems.
We present two novel safe RL methods, namely SafeFallback and GiveSafe.
In a simulated multi-energy systems case study we have shown that both methods start with a significantly higher utility.
arXiv Detail & Related papers (2022-07-08T11:33:53Z)
- Constrained Reinforcement Learning for Robotics via Scenario-Based Programming [64.07167316957533]
It is crucial to optimize the performance of DRL-based agents while providing guarantees about their behavior.
This paper presents a novel technique for incorporating domain-expert knowledge into a constrained DRL training loop.
Our experiments demonstrate that using our approach to leverage expert knowledge dramatically improves the safety and the performance of the agent.
arXiv Detail & Related papers (2022-06-20T07:19:38Z)
- Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions [3.5897534810405403]
Reinforcement learning (RL) is a promising approach but has had limited success in real-world applications.
In this paper, we propose a learning-based control framework consisting of several aspects.
We show such an ECBF-based modular deep RL algorithm achieves near-perfect success rates and guard safety with a high probability.
arXiv Detail & Related papers (2021-09-07T00:51:12Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.