Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach
- URL: http://arxiv.org/abs/2402.19265v1
- Date: Thu, 29 Feb 2024 15:36:01 GMT
- Title: Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach
- Authors: Daniele Meli, Alberto Castellini, Alessandro Farinelli
- Abstract summary: We learn high-quality heuristics from traces of POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned heuristics expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
- Score: 57.788675205519986
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Partially Observable Markov Decision Processes (POMDPs) are a powerful
framework for planning under uncertainty. They allow state uncertainty to be
modeled as a belief probability distribution. Approximate solvers based on Monte
Carlo sampling have shown great success in relaxing the computational demand and
performing
online planning. However, scaling to complex realistic domains with many
actions and long planning horizons is still a major challenge, and a key point
to achieve good performance is guiding the action-selection process with
domain-dependent policy heuristics tailored to the specific
application domain. We propose to learn high-quality heuristics from traces of
POMDP executions generated by any solver. We convert the belief-action
pairs to a logical semantics, and exploit data- and time-efficient Inductive
Logic Programming (ILP) to generate interpretable belief-based policy
specifications, which are then used as online heuristics. We thoroughly
evaluate our methodology on two notoriously challenging POMDP problems,
involving large action spaces and long planning horizons, namely, rocksample
and pocman. Considering different state-of-the-art online POMDP solvers,
including POMCP, DESPOT and AdaOPS, we show that learned heuristics expressed
in Answer Set Programming (ASP) yield performance superior to neural networks
and similar to optimal handcrafted task-specific heuristics within lower
computational time. Moreover, they generalize well to more challenging
scenarios not experienced in the training phase (e.g., more rocks and a larger
grid in rocksample, and a larger map and more aggressive ghosts in pocman).
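
To make the pipeline concrete, here is a minimal, purely illustrative sketch (the predicate names guess/dist/sample and the thresholds below are hypothetical, not the paper's exact logical semantics): the current belief is grounded into ASP-style atoms, and a learned belief-based rule is evaluated to suggest actions for a rocksample-like domain, which the online solver can then use as a heuristic.

```python
# Illustrative sketch only (hypothetical predicate names and thresholds): map a
# rocksample belief into ASP-style atoms and apply a learned-looking rule as an
# online action-selection heuristic.

def belief_to_atoms(belief, dist):
    """Ground the current belief into symbolic atoms.

    belief: dict rock_id -> P(rock is valuable), e.g. estimated from particles.
    dist:   dict rock_id -> Manhattan distance of the rock from the agent.
    """
    atoms = []
    for rock, p in belief.items():
        atoms.append(f"guess({rock},{int(100 * p)})")  # discretized belief
        atoms.append(f"dist({rock},{dist[rock]})")     # current geometry
    return atoms

def preferred_actions(belief, dist, min_guess=0.8, max_dist=2):
    """Evaluate a hypothetical learned rule, which in ASP syntax could read:
        sample(R) :- guess(R,P), P >= 80, dist(R,D), D <= 2.
    i.e. 'prefer sampling a rock believed valuable with high probability and
    located close by'. The returned atoms bias the solver's action selection."""
    return [f"sample({rock})" for rock, p in belief.items()
            if p >= min_guess and dist[rock] <= max_dist]

# Example: the heuristic suggests sampling rock r1 but not the distant, unlikely r2.
belief = {"r1": 0.9, "r2": 0.4}
dist = {"r1": 1, "r2": 3}
print(belief_to_atoms(belief, dist))
print(preferred_actions(belief, dist))   # -> ['sample(r1)']
```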
Related papers
- Spatial Reasoning and Planning for Deep Embodied Agents [2.7195102129095003]
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks.
It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
arXiv Detail & Related papers (2024-09-28T23:05:56Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Rollout Heuristics for Online Stochastic Contingent Planning [6.185979230964809]
Partially Observable Monte-Carlo Planning is an online algorithm for deciding on the next action to perform.
POMCP is highly dependent on the rollout policy to compute good estimates.
In this paper, we model POMDPs as contingent planning problems.
arXiv Detail & Related papers (2023-10-03T18:24:47Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
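
For intuition only, the snippet below sketches the generic optimism-in-the-face-of-uncertainty pattern behind UCB-type methods: pick the candidate maximizing estimated value plus an uncertainty bonus. The paper's actual bonus term, which bounds the total variation distance between the estimated and true PSR models, is not reproduced here; the names and constants are hypothetical.

```python
# Generic UCB-style selection sketch (not the paper's exact bonus term): be
# optimistic by adding an uncertainty bonus to each candidate policy's value
# estimated under the learned model.

def ucb_policy_selection(candidates, value_est, uncertainty, beta=1.0):
    """candidates:  iterable of policy identifiers
    value_est:   dict policy -> value estimated under the current learned model
    uncertainty: dict policy -> bonus upper-bounding the model-induced value error
    beta:        confidence scaling (hypothetical constant)"""
    return max(candidates, key=lambda pi: value_est[pi] + beta * uncertainty[pi])

# A policy with a slightly lower estimate but larger model uncertainty gets explored.
print(ucb_policy_selection(["pi_a", "pi_b"],
                           value_est={"pi_a": 1.0, "pi_b": 0.9},
                           uncertainty={"pi_a": 0.05, "pi_b": 0.3}))  # -> 'pi_b'
```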
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Learning Logic Specifications for Soft Policy Guidance in POMCP [71.69251176275638]
Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for Partially Observable Markov Decision Processes (POMDPs).
However, POMCP suffers from a sparse reward function, namely, rewards are achieved only when the final goal is reached.
In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions.
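
To illustrate what "soft" guidance can mean in practice, the sketch below (hypothetical bonus constant and node layout, not the paper's exact formulation) adds a bonus to the UCB1 score of actions preferred by the learned logic specifications, so they are tried earlier in the tree search without other actions being pruned.

```python
import math

class Node:
    """Minimal POMCP-style tree node: visit counts and value estimates per action."""
    def __init__(self, visits, count, value):
        self.visits, self.count, self.value = visits, count, value

def ucb1_soft_guided(node, actions, preferred, c=1.0, bonus=0.5):
    """Pick an action by UCB1 plus a soft bonus for actions suggested by the
    learned logic specifications (the bonus value is a hypothetical choice)."""
    def score(a):
        na = node.count.get(a, 0)
        if na == 0:
            return float("inf")                   # untried actions still come first
        exploration = c * math.sqrt(math.log(node.visits) / na)
        soft = bonus if a in preferred else 0.0   # bias the search, never prune
        return node.value[a] + exploration + soft
    return max(actions, key=score)

# Example: the specification prefers 'sample', nudging the search toward it.
node = Node(visits=100,
            count={"move_north": 60, "sample": 40},
            value={"move_north": 4.0, "sample": 3.9})
print(ucb1_soft_guided(node, ["move_north", "sample"], preferred={"sample"}))
```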
arXiv Detail & Related papers (2023-03-16T09:37:10Z)
- Task-Guided IRL in POMDPs that Scales [22.594913269327353]
In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts.
Most IRL techniques require repeatedly solving the computationally hard forward problem -- computing an optimal policy given a reward function -- in POMDPs.
We develop an algorithm that reduces the information while increasing the data efficiency.
arXiv Detail & Related papers (2022-12-30T21:08:57Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Visual Learning-based Planning for Continuous High-Dimensional POMDPs [81.16442127503517]
Visual Tree Search (VTS) is a learning and planning procedure that combines generative models learned offline with online model-based POMDP planning.
VTS bridges offline model training and online planning by utilizing a set of deep generative observation models to predict and evaluate the likelihood of image observations in a Monte Carlo tree search planner.
We show that VTS is robust to different observation noises and, since it utilizes online, model-based planning, can adapt to different reward structures without the need to re-train.
arXiv Detail & Related papers (2021-12-17T11:53:31Z)
- Compositional Reinforcement Learning from Logical Specifications [21.193231846438895]
Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy.
We develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning.
Our approach then incorporates reinforcement learning to learn neural network policies for each edge (sub-task) within a Dijkstra-style planning algorithm to compute a high-level plan in the graph.
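
As a rough sketch of the compositional idea (the graph, success probabilities, and edge-cost choice below are invented for illustration): each edge of the specification graph corresponds to a sub-task with its own learned policy, and a Dijkstra-style search over edge costs yields the high-level plan whose sub-policies are then executed in sequence.

```python
import heapq, math

def high_level_plan(graph, success_prob, start, goal):
    """Dijkstra-style search over a task graph.

    graph:        dict node -> list of (next_node, subtask_id) edges
    success_prob: dict subtask_id -> estimated success probability of its sub-policy
    Edge cost is -log(success probability), so the shortest path maximizes the
    probability that every sub-task along the plan succeeds (illustrative choice)."""
    pq, settled = [(0.0, start, [])], set()
    while pq:
        cost, node, plan = heapq.heappop(pq)
        if node in settled:
            continue
        settled.add(node)
        if node == goal:
            return plan                      # sequence of sub-tasks to execute
        for nxt, subtask in graph.get(node, []):
            edge_cost = -math.log(max(success_prob[subtask], 1e-9))
            heapq.heappush(pq, (cost + edge_cost, nxt, plan + [subtask]))
    return None

# Toy example: two routes to the goal; the planner picks the more reliable sub-tasks.
graph = {"s0": [("s1", "reach_door"), ("s2", "climb_wall")],
         "s1": [("goal", "open_door")],
         "s2": [("goal", "jump_down")]}
success_prob = {"reach_door": 0.95, "open_door": 0.9,
                "climb_wall": 0.6, "jump_down": 0.8}
print(high_level_plan(graph, success_prob, "s0", "goal"))  # -> ['reach_door', 'open_door']
```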
arXiv Detail & Related papers (2021-06-25T22:54:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.