Policy Optimization with Linear Temporal Logic Constraints
- URL: http://arxiv.org/abs/2206.09546v1
- Date: Mon, 20 Jun 2022 02:58:02 GMT
- Title: Policy Optimization with Linear Temporal Logic Constraints
- Authors: Cameron Voloshin, Hoang M. Le, Swarat Chaudhuri, Yisong Yue
- Abstract summary: We study the problem of policy optimization with linear temporal logic constraints.
We develop a model-based approach that enjoys a sample complexity analysis for guaranteeing both task satisfaction and cost optimality.
- Score: 37.27882290236194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of policy optimization (PO) with linear temporal logic
(LTL) constraints. The language of LTL allows flexible description of tasks
that may be unnatural to encode as a scalar cost function. We consider
LTL-constrained PO as a systematic framework, decoupling task specification
from policy selection, and an alternative to the standard of cost shaping. With
access to a generative model, we develop a model-based approach that enjoys a
sample complexity analysis for guaranteeing both task satisfaction and cost
optimality (through a reduction to a reachability problem). Empirically, our
algorithm can achieve strong performance even in low sample regimes.
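The abstract does not spell out the reduction, so the following is only a minimal sketch of the kind of reachability sub-problem it mentions, not the authors' algorithm. It assumes a hand-written toy MDP with known transition probabilities and computes the maximum probability of reaching a target set by value iteration. The names `TOY_MDP` and `max_reach_probability`, and all states and numbers, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy MDP: transitions[state][action] = [(next_state, probability), ...].
# States with no actions are treated as absorbing.
Transitions = Dict[int, Dict[str, List[Tuple[int, float]]]]

TOY_MDP: Transitions = {
    0: {"safe": [(1, 0.5), (0, 0.5)], "risky": [(2, 0.9), (3, 0.1)]},
    1: {"go": [(2, 1.0)]},
    2: {},  # target state (absorbing)
    3: {},  # trap state (absorbing, never reaches the target)
}


def max_reach_probability(
    transitions: Transitions, goal: Set[int], iterations: int = 200
) -> Dict[int, float]:
    """Approximate, for each state, the maximum probability of eventually reaching `goal`."""
    # Start from 0 everywhere except the goal; iterating from below
    # converges to the true maximum reachability probabilities.
    values = {s: (1.0 if s in goal else 0.0) for s in transitions}
    for _ in range(iterations):
        new_values = {}
        for state, actions in transitions.items():
            if state in goal:
                new_values[state] = 1.0
            elif not actions:
                new_values[state] = 0.0
            else:
                # Bellman backup: pick the action with the best expected value.
                new_values[state] = max(
                    sum(prob * values[nxt] for nxt, prob in outcomes)
                    for outcomes in actions.values()
                )
        values = new_values
    return values


values = max_reach_probability(TOY_MDP, goal={2})
print(values)
```

In this toy example the "safe" action from state 0 retries until it moves to state 1, so it eventually reaches the target with probability approaching 1, while "risky" succeeds with probability 0.9. The sketch assumes the model is given; the paper instead works with samples from a generative model.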
Related papers
- Conformal Constrained Policy Optimization for Cost-Effective LLM Agents [27.37909142846675]
Large language models (LLMs) have recently made tremendous progress towards solving challenging AI problems. We propose a novel strategy where we combine multiple LLM models with varying cost/accuracy tradeoffs in an agentic manner. Our approach provides a principled and practical framework for deploying LLM agents that are significantly more cost-effective while maintaining reliability.
arXiv Detail & Related papers (2025-11-14T19:39:28Z)
- Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking [4.064849471241967]
This paper contributes the first approach to effectively compute robust policies subject to arbitrary structural constraints. Experiments on a few hundred benchmarks demonstrate the feasibility for constrained and robust policy synthesis.
arXiv Detail & Related papers (2025-11-11T10:28:42Z)
- Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning [52.03884701766989]
Offline reinforcement learning (RL) algorithms typically impose constraints on action selection. We propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. We develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q learning (ANQ), to perform Q learning with target actions satisfying this constraint.
arXiv Detail & Related papers (2025-11-04T13:42:05Z)
- Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling reasoning capabilities of Large Language Models. We propose a tractable computational framework that tracks and leverages curvature information during policy updates. The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
arXiv Detail & Related papers (2025-10-01T12:29:32Z)
- Projection-free Algorithms for Online Convex Optimization with Adversarial Constraints [10.047668792033033]
We study a generalization of the Online Convex Optimization (OCO) framework with time-varying adversarial constraints.
In this problem, after selecting a feasible action from the convex decision set $X,$ a convex constraint function is revealed alongside the cost function in each round.
We propose a *projection-free* online policy which makes a single call to a Linear Program (LP) solver per round.
arXiv Detail & Related papers (2025-01-28T13:04:32Z)
- Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification [76.14641982122696]
We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control.
We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
arXiv Detail & Related papers (2024-10-07T23:38:58Z)
- DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL).
Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments, are restricted to suboptimal solutions, and do not adequately handle safety constraints.
In this work, we propose a novel learning approach to address these concerns.
Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae.
arXiv Detail & Related papers (2024-10-06T21:30:38Z)
- Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z)
- LinearAPT: An Adaptive Algorithm for the Fixed-Budget Thresholding Linear Bandit Problem [4.666048091337632]
We present LinearAPT, a novel algorithm designed for the fixed budget setting of the Thresholding Linear Bandit (TLB) problem.
Our contributions highlight the adaptability, simplicity, and computational efficiency of LinearAPT, making it a valuable addition to the toolkit for addressing complex sequential decision-making challenges.
arXiv Detail & Related papers (2024-03-10T15:01:50Z)
- Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on the AdaLASSO (adaptive least absolute shrinkage and selection operator).
Experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark.
arXiv Detail & Related papers (2021-10-07T02:36:14Z)
- Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction [5.337302350000984]
This paper presents a model-free reinforcement learning algorithm to synthesize a control policy.
The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
arXiv Detail & Related papers (2020-10-14T03:49:16Z)
- Teaching the Old Dog New Tricks: Supervised Learning with Constraints [18.88930622054883]
Adding constraint support in Machine Learning has the potential to address outstanding issues in data-driven AI systems.
Existing approaches typically apply constrained optimization techniques to ML training, enforce constraint satisfaction by adjusting the model design, or use constraints to correct the output.
Here, we investigate a different, complementary, strategy based on "teaching" constraint satisfaction to a supervised ML method via the direct use of a state-of-the-art constraint solver.
arXiv Detail & Related papers (2020-02-25T09:47:39Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.