Related papers: LTLf Adaptive Synthesis for Multi-Tier Goals in Nondeterministic Domains

URL: http://arxiv.org/abs/2504.20983v1
Date: Tue, 29 Apr 2025 17:53:16 GMT
Title: LTLf Adaptive Synthesis for Multi-Tier Goals in Nondeterministic Domains
Authors: Giuseppe De Giacomo, Gianmarco Parretti, Shufang Zhu
Abstract summary: We study a variant of LTLf synthesis that synthesizes adaptive strategies for achieving a multi-tier goal. We provide a game-theoretic technique to compute adaptive strategies that is sound and complete.
Score: 24.117872352200948
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study a variant of LTLf synthesis that synthesizes adaptive strategies for achieving a multi-tier goal, consisting of multiple increasingly challenging LTLf objectives in nondeterministic planning domains. Adaptive strategies are strategies that at any point of their execution (i) enforce the satisfaction of as many objectives as possible in the multi-tier goal, and (ii) exploit possible cooperation from the environment to satisfy as many as possible of the remaining ones. This happens dynamically: if the environment cooperates (ii) and an objective becomes enforceable (i), then our strategies will enforce it. We provide a game-theoretic technique to compute adaptive strategies that is sound and complete. Notably, our technique is polynomial, in fact quadratic, in the number of objectives. In other words, it handles multi-tier goals with only a minor overhead compared to standard LTLf synthesis.
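The two conditions in the abstract, enforcing what can be enforced (i) and staying open to environment cooperation (ii), can be illustrated on a toy example. The sketch below is not the paper's algorithm. It assumes each objective is simplified to plain reachability of a set of goal states, whereas the paper handles full LTLf objectives, and the domain, state names, and function names are all hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy domain: state -> action -> set of possible successors.
# The agent picks the action; the environment picks the successor.
Domain = Dict[str, Dict[str, Set[str]]]

DOMAIN: Domain = {
    "s0": {"a": {"s1", "s2"}},
    "s1": {"b": {"g1"}},
    "s2": {"b": {"g1"}, "c": {"g2"}},
    "g1": {},
    "g2": {},
}

# Two tiers of increasing difficulty (assumption: reachability goals only).
# Tier 1: reach g1 or g2. Tier 2: reach g2.
TIERS: List[Set[str]] = [{"g1", "g2"}, {"g2"}]


def _fixpoint(domain: Domain, goal: Set[str],
              good_action: Callable[[Set[str], Set[str]], bool]) -> Set[str]:
    """Grow `goal` backwards until no more states can be added."""
    region = set(goal)
    changed = True
    while changed:
        changed = False
        for state, actions in domain.items():
            if state not in region and any(
                good_action(succ, region) for succ in actions.values()
            ):
                region.add(state)
                changed = True
    return region


def enforceable(domain: Domain, goal: Set[str]) -> Set[str]:
    """States where some action leads into the region whatever the environment does."""
    return _fixpoint(domain, goal, lambda succ, region: bool(succ) and succ <= region)


def possible(domain: Domain, goal: Set[str]) -> Set[str]:
    """States where some action can lead into the region if the environment cooperates."""
    return _fixpoint(domain, goal, lambda succ, region: bool(succ & region))


def tier_status(domain: Domain, tiers: List[Set[str]], state: str) -> List[str]:
    """Classify every tier at `state` as enforceable, possible, or lost."""
    status = []
    for goal in tiers:
        if state in enforceable(domain, goal):
            status.append("enforceable")
        elif state in possible(domain, goal):
            status.append("possible")
        else:
            status.append("lost")
    return status
```

In this toy domain, `tier_status(DOMAIN, TIERS, "s0")` returns `["enforceable", "possible"]`: the first tier can be forced, while the second depends on the environment. If the environment cooperates and moves to `s2`, the result becomes `["enforceable", "enforceable"]`, so a strategy that re-evaluates at each step can start enforcing the second tier. If the environment moves to `s1` instead, the result is `["enforceable", "lost"]`.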

Related papers

Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models [19.559468441956714]
Reinforcement Learning from Human Feedback has emerged as a powerful technique for aligning large language models with human preferences. We frame human value alignment as a multi-objective optimization problem, aiming to maximize a set of potentially conflicting objectives. We introduce Gradient-Adaptive Policy Optimization (GAPO), a novel fine-tuning paradigm that employs multiple-gradient descent to align LLMs with diverse preference distributions.
arXiv Detail & Related papers (2025-07-02T17:25:26Z)
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align large language models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning. EPO provides strategies in an open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior. Experiments across social and physical domains demonstrate EPO's ability to achieve long-term goal alignment.
arXiv Detail & Related papers (2025-02-18T03:15:55Z)
Rethinking Multi-Objective Learning through Goal-Conditioned Supervised Learning [8.593384839118658]
Multi-objective learning aims to optimize multiple objectives simultaneously with a single model. It suffers from the difficulty of formalizing and conducting the exact learning process. We propose a general framework for automatically learning to achieve multiple objectives based on existing sequential data.
arXiv Detail & Related papers (2024-12-12T03:47:40Z)
Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning [72.46388818127105]
Conditional Language Policy (CLP) is a framework for finetuning language models on multiple objectives. We show that CLP learns steerable models that effectively trade off conflicting objectives at inference time.
arXiv Detail & Related papers (2024-07-22T16:13:38Z)
Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning [13.245000585002858]
In many real-world applications, a reinforcement learning (RL) agent should consider multiple objectives and adhere to safety guidelines. We propose a constrained multi-objective gradient aggregation algorithm named Constrained Multi-Objective Gradient Aggregator (CoGAMO).
arXiv Detail & Related papers (2024-03-01T04:57:13Z)
TOP-Training: Target-Oriented Pretraining for Medical Extractive Question Answering [53.92585020805746]
We study extractive question-answering in the medical domain (Medical-EQA). This problem has two main challenges: (i) domain specificity, and (ii) extraction-based answering style. We propose TOP-Training, a target-oriented pre-training paradigm.
arXiv Detail & Related papers (2023-10-25T20:48:16Z)
LTLf Best-Effort Synthesis in Nondeterministic Planning Domains [27.106071554421664]
We study best-effort strategies (aka plans) in fully observable nondeterministic domains (FOND). We present a game-theoretic synthesis technique for synthesizing best-effort strategies that exploit the specificity of nondeterministic planning domains.
arXiv Detail & Related papers (2023-08-29T10:10:41Z)
Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy. We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments. To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command. We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
From STL Rulebooks to Rewards [4.859570041295978]
We propose a principled approach to shaping rewards for reinforcement learning from multiple objectives. We first equip STL with a novel quantitative semantics that allows individual requirements to be evaluated automatically. We then develop a method for systematically combining evaluations of multiple requirements into a single reward.
arXiv Detail & Related papers (2021-10-06T14:16:59Z)
Goal Kernel Planning: Linearly-Solvable Non-Markovian Policies for Logical Tasks with Goal-Conditioned Options [54.40780660868349]
We introduce a compositional framework called Linearly-Solvable Goal Kernel Dynamic Programming (LS-GKDP). LS-GKDP combines the Linearly-Solvable Markov Decision Process (LMDP) formalism with the Options Framework of Reinforcement Learning. We show how an LMDP with a goal kernel enables the efficient optimization of meta-policies in a lower-dimensional subspace defined by the task grounding.
arXiv Detail & Related papers (2020-07-06T05:13:20Z)
A Distributional View on Multi-Objective Policy Optimization [24.690800846837273]
We propose an algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
arXiv Detail & Related papers (2020-05-15T13:02:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.