LTLf Best-Effort Synthesis in Nondeterministic Planning Domains
- URL: http://arxiv.org/abs/2308.15188v1
- Date: Tue, 29 Aug 2023 10:10:41 GMT
- Title: LTLf Best-Effort Synthesis in Nondeterministic Planning Domains
- Authors: Giuseppe De Giacomo, Gianmarco Parretti, Shufang Zhu
- Abstract summary: We study best-effort strategies (aka plans) in fully observable nondeterministic domains (FOND)
We present a game-theoretic technique for synthesizing best-effort strategies that exploit the specificity of nondeterministic planning domains.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study best-effort strategies (aka plans) in fully observable
nondeterministic domains (FOND) for goals expressed in Linear Temporal Logic on
Finite Traces (LTLf). The notion of best-effort strategy has been introduced to
also deal with the scenario when no agent strategy exists that fulfills the
goal against every possible nondeterministic environment reaction. Such
strategies fulfill the goal if possible, and do their best to do so otherwise.
We present a game-theoretic technique for synthesizing best-effort strategies
that exploit the specificity of nondeterministic planning domains. We formally
show its correctness and demonstrate its effectiveness experimentally,
exhibiting a much greater scalability with respect to a direct best-effort
synthesis approach based on re-expressing the planning domain as generic
environment specifications.
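The gap between strategies that enforce the goal and strategies that merely do their best can be seen on a toy FOND domain. The sketch below is an illustration of the underlying intuition only, not the paper's synthesis algorithm; all state and action names are invented. It contrasts adversarial reachability (the agent can force the goal against every environment reaction) with cooperative reachability (the goal remains achievable if the environment cooperates):

```python
def fixpoint(states, goal, trans, mode):
    """States from which the goal set is reachable.
    mode='adversarial': the agent can force the goal (all successors good).
    mode='cooperative': the goal stays possible (some successor good)."""
    good = set(goal)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in good:
                continue
            for a, succs in trans.get(s, {}).items():
                ok = (all(t in good for t in succs) if mode == "adversarial"
                      else any(t in good for t in succs))
                if ok:
                    good.add(s)
                    changed = True
                    break
    return good

# Toy domain: from s0, action 'safe' surely reaches goal g; from s1,
# action 'try' may reach g or loop back to s1 (the environment chooses).
states = {"s0", "s1", "dead", "g"}
trans = {
    "s0": {"safe": {"g"}},
    "s1": {"try": {"g", "s1"}, "quit": {"dead"}},
}
win = fixpoint(states, {"g"}, trans, "adversarial")
coop = fixpoint(states, {"g"}, trans, "cooperative")

print(sorted(win))   # s1 is not winning: the environment can loop forever
print(sorted(coop))  # but s1 is cooperatively good, so best effort = 'try'
```

A best-effort strategy follows a winning strategy inside `win` (play `safe` in `s0`) and, outside it, still picks an action that keeps the goal possible (play `try` in `s1` rather than `quit`), which is exactly the "fulfill the goal if possible, do your best otherwise" behavior described above.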
Related papers
- Reinforced Reasoning for Embodied Planning [18.40186665383579]
Embodied planning requires agents to make coherent multi-step decisions based on dynamic visual observations and natural language goals.
We introduce a reinforcement fine-tuning framework that brings R1-style reasoning enhancement into embodied planning.
arXiv Detail & Related papers (2025-05-28T07:21:37Z)
- Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning [51.54559117314768]
Recent work investigated the use of Reinforcement Learning (RL) to synthesize guidance that improves the performance of temporal planners.
We propose an evolution of this learning and planning framework that exploits the information provided by symbolic heuristics during both the RL and planning phases.
arXiv Detail & Related papers (2025-05-19T17:19:13Z)
- LTLf Adaptive Synthesis for Multi-Tier Goals in Nondeterministic Domains [24.117872352200948]
We study a variant of LTLf synthesis that synthesizes adaptive strategies for achieving a multi-tier goal.
We provide a game-theoretic technique to compute adaptive strategies that is sound and complete.
arXiv Detail & Related papers (2025-04-29T17:53:16Z)
- Global-Decision-Focused Neural ODEs for Proactive Grid Resilience Management [50.34345101758248]
We propose predict-all-then-optimize-globally (PATOG), a framework that integrates outage prediction with globally optimized interventions.
Our approach ensures spatially and temporally coherent decision-making, improving both predictive accuracy and operational efficiency.
Experiments on synthetic and real-world datasets demonstrate significant improvements in outage prediction consistency and grid resilience.
arXiv Detail & Related papers (2025-02-25T16:15:35Z)
- EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning.
EPO provides strategies in open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior.
Experiments across social and physical domains demonstrate EPO's ability of long-term goal alignment.
arXiv Detail & Related papers (2025-02-18T03:15:55Z)
- LLM-Generated Heuristics for AI Planning: Do We Even Need Domain-Independence Anymore? [87.71321254733384]
Large language models (LLMs) can generate planning approaches tailored to specific planning problems.
LLMs can achieve state-of-the-art performance on some standard IPC domains.
We discuss whether these results signify a paradigm shift and how they can complement existing planning approaches.
arXiv Detail & Related papers (2025-01-30T22:21:12Z)
- Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation [34.636688162807836]
This study reassesses recent strategies by developing an end-to-end LLM planner.
We find that fine-tuning LLMs on a corpus of planning instances does not lead to robust planning skills.
Various strategies, including Chain-of-Thought, do enhance the probability of a plan being executable.
arXiv Detail & Related papers (2024-12-14T04:23:14Z)
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Simple Hierarchical Planning with Diffusion [54.48129192534653]
Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.
We introduce the Hierarchical diffuser, a fast, yet surprisingly effective planning method combining the advantages of hierarchical and diffusion-based planning.
Our model adopts a "jumpy" planning strategy at the higher level, which allows it to have a larger receptive field but at a lower computational cost.
arXiv Detail & Related papers (2024-01-05T05:28:40Z)
- Abstraction of Nondeterministic Situation Calculus Action Theories -- Extended Version [23.24285208243607]
We develop a general framework for abstracting the behavior of an agent that operates in a nondeterministic domain.
We assume that we have both an abstract and a concrete nondeterministic basic action theory.
We show that if the agent has a (strong FOND) plan/strategy to achieve a goal/complete a task at the abstract level, and it can always execute the nondeterministic abstract actions to completion at the concrete level, then a corresponding (strong FOND) plan/strategy exists at the concrete level.
arXiv Detail & Related papers (2023-05-20T05:42:38Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Exploration Policies for On-the-Fly Controller Synthesis: A Reinforcement Learning Approach [0.0]
We propose a new method for obtaining exploration policies based on Reinforcement Learning (RL).
Our agents learn from scratch in a highly partially observable RL task and outperform existing approaches overall, in instances unseen during training.
arXiv Detail & Related papers (2022-10-07T20:28:25Z)
- Recognizing LTLf/PLTLf Goals in Fully Observable Non-Deterministic Domain Models [26.530274055506453]
Goal Recognition is the task of discerning the goal that an agent intends to achieve.
We develop a novel approach that is capable of recognizing temporally extended goals.
arXiv Detail & Related papers (2021-03-22T09:46:03Z)
- Robust Hierarchical Planning with Policy Delegation [6.1678491628787455]
We propose a novel framework and algorithm for hierarchical planning based on the principle of delegation.
We show experimentally that this planning approach is very competitive with classic planning and reinforcement learning techniques on a variety of domains.
arXiv Detail & Related papers (2020-10-25T04:36:20Z)
- Near-Optimal Reactive Synthesis Incorporating Runtime Information [28.25296947005914]
We consider the problem of optimal reactive synthesis: computing a strategy that satisfies a mission specification in a dynamic environment.
We incorporate task-critical information, that is only available at runtime, into the strategy synthesis in order to improve performance.
arXiv Detail & Related papers (2020-07-31T14:41:35Z)
- Mixed Strategies for Robust Optimization of Unknown Objectives [93.8672371143881]
We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter.
We design a novel sample-efficient algorithm GP-MRO, which sequentially learns about the unknown objective from noisy point evaluations.
GP-MRO seeks to discover a robust and randomized mixed strategy, that maximizes the worst-case expected objective value.
arXiv Detail & Related papers (2020-02-28T09:28:17Z)
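Why a randomized mixed strategy can beat every deterministic decision in the worst case is visible in a minimal two-decision example. The payoff matrix below is invented for illustration; this is the motivation behind mixed strategies, not the GP-MRO algorithm itself (which additionally learns the unknown objective with Gaussian processes):

```python
# payoff[i][j]: objective value of decision i under parameter realization j.
payoff = [[1.0, 0.0],
          [0.0, 1.0]]

# Best deterministic decision: maximize the worst case over realizations j.
det = max(min(row) for row in payoff)

# Uniform mixed strategy over both decisions: worst-case *expected* value.
mix = min(0.5 * payoff[0][j] + 0.5 * payoff[1][j] for j in range(2))

print(det)  # 0.0 -- every deterministic choice can be fully countered
print(mix)  # 0.5 -- randomizing hedges against the worst-case realization
```

Here every deterministic decision has a realization that drives its value to zero, while the uniform mixture guarantees an expected value of 0.5 regardless of the realization, which is the kind of worst-case expected objective GP-MRO maximizes.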
This list is automatically generated from the titles and abstracts of the papers in this site.