Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach
- URL: http://arxiv.org/abs/2402.19265v1
- Date: Thu, 29 Feb 2024 15:36:01 GMT
- Title: Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach
- Authors: Daniele Meli, Alberto Castellini, Alessandro Farinelli
- Abstract summary: We learn high-quality heuristics from traces of POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned heuristics expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
- Score: 57.788675205519986
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Partially Observable Markov Decision Processes (POMDPs) are a powerful
framework for planning under uncertainty. They allow state uncertainty to be
modeled as a belief probability distribution. Approximate solvers based on Monte
Carlo sampling have shown great success in relaxing the computational demand and
performing
online planning. However, scaling to complex realistic domains with many
actions and long planning horizons is still a major challenge, and a key point
to achieve good performance is guiding the action-selection process with
domain-dependent policy heuristics tailored to the specific
application domain. We propose to learn high-quality heuristics from traces of
POMDP executions generated by any solver. We convert the belief-action
pairs to a logical semantics, and exploit data- and time-efficient Inductive
Logic Programming (ILP) to generate interpretable belief-based policy
specifications, which are then used as online heuristics. We thoroughly
evaluate our methodology on two notoriously challenging POMDP problems,
involving large action spaces and long planning horizons, namely, rocksample
and pocman. Considering different state-of-the-art online POMDP solvers,
including POMCP, DESPOT and AdaOPS, we show that learned heuristics expressed
in Answer Set Programming (ASP) yield performance superior to neural networks
and similar to optimal handcrafted task-specific heuristics within lower
computational time. Moreover, they generalize well to more challenging
scenarios not experienced in the training phase (e.g., more rocks and a larger
grid in rocksample, and a larger map and more aggressive ghosts in pocman).
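
To make the pipeline concrete, here is a minimal, purely illustrative sketch (the predicate names guess/dist/sample and the thresholds below are hypothetical, not the paper's exact logical semantics): the current belief is grounded into ASP-style atoms, and a learned belief-based rule is evaluated to suggest actions for a rocksample-like domain, which the online solver can then use as a heuristic.

```python
# Illustrative sketch only (hypothetical predicate names and thresholds): map a
# rocksample belief into ASP-style atoms and apply a learned-looking rule as an
# online action-selection heuristic.

def belief_to_atoms(belief, dist):
    """Ground the current belief into symbolic atoms.

    belief: dict rock_id -> P(rock is valuable), e.g. estimated from particles.
    dist:   dict rock_id -> Manhattan distance of the rock from the agent.
    """
    atoms = []
    for rock, p in belief.items():
        atoms.append(f"guess({rock},{int(100 * p)})")  # discretized belief
        atoms.append(f"dist({rock},{dist[rock]})")     # current geometry
    return atoms

def preferred_actions(belief, dist, min_guess=0.8, max_dist=2):
    """Evaluate a hypothetical learned rule, which in ASP syntax could read:
        sample(R) :- guess(R,P), P >= 80, dist(R,D), D <= 2.
    i.e. 'prefer sampling a rock believed valuable with high probability and
    located close by'. The returned atoms bias the solver's action selection."""
    return [f"sample({rock})" for rock, p in belief.items()
            if p >= min_guess and dist[rock] <= max_dist]

# Example: the heuristic suggests sampling rock r1 but not the distant, unlikely r2.
belief = {"r1": 0.9, "r2": 0.4}
dist = {"r1": 1, "r2": 3}
print(belief_to_atoms(belief, dist))
print(preferred_actions(belief, dist))   # -> ['sample(r1)']
```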
Related papers
- Spatial Reasoning and Planning for Deep Embodied Agents [2.7195102129095003]
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks.
It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
arXiv Detail & Related papers (2024-09-28T23:05:56Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Rollout Heuristics for Online Stochastic Contingent Planning [6.185979230964809]
Partially Observable Monte-Carlo Planning is an online algorithm for deciding on the next action to perform.
POMCP is highly dependent on the rollout policy to compute good estimates.
In this paper, we model POMDPs as contingent planning problems.
arXiv Detail & Related papers (2023-10-03T18:24:47Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
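
For intuition only, the snippet below sketches the generic optimism-in-the-face-of-uncertainty pattern behind UCB-type methods: pick the candidate maximizing estimated value plus an uncertainty bonus. The paper's actual bonus term, which bounds the total variation distance between the estimated and true PSR models, is not reproduced here; the names and constants are hypothetical.

```python
# Generic UCB-style selection sketch (not the paper's exact bonus term): be
# optimistic by adding an uncertainty bonus to each candidate policy's value
# estimated under the learned model.

def ucb_policy_selection(candidates, value_est, uncertainty, beta=1.0):
    """candidates:  iterable of policy identifiers
    value_est:   dict policy -> value estimated under the current learned model
    uncertainty: dict policy -> bonus upper-bounding the model-induced value error
    beta:        confidence scaling (hypothetical constant)"""
    return max(candidates, key=lambda pi: value_est[pi] + beta * uncertainty[pi])

# A policy with a slightly lower estimate but larger model uncertainty gets explored.
print(ucb_policy_selection(["pi_a", "pi_b"],
                           value_est={"pi_a": 1.0, "pi_b": 0.9},
                           uncertainty={"pi_a": 0.05, "pi_b": 0.3}))  # -> 'pi_b'
```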
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Learning Logic Specifications for Soft Policy Guidance in POMCP [71.69251176275638]
Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for Partially Observable Markov Decision Processes (POMDPs).
However, POMCP suffers from a sparse reward function, namely, rewards are achieved only when the final goal is reached.
In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions.
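
To illustrate what "soft" guidance can mean in practice, the sketch below (hypothetical bonus constant and node layout, not the paper's exact formulation) adds a bonus to the UCB1 score of actions preferred by the learned logic specifications, so they are tried earlier in the tree search without other actions being pruned.

```python
import math

class Node:
    """Minimal POMCP-style tree node: visit counts and value estimates per action."""
    def __init__(self, visits, count, value):
        self.visits, self.count, self.value = visits, count, value

def ucb1_soft_guided(node, actions, preferred, c=1.0, bonus=0.5):
    """Pick an action by UCB1 plus a soft bonus for actions suggested by the
    learned logic specifications (the bonus value is a hypothetical choice)."""
    def score(a):
        na = node.count.get(a, 0)
        if na == 0:
            return float("inf")                   # untried actions still come first
        exploration = c * math.sqrt(math.log(node.visits) / na)
        soft = bonus if a in preferred else 0.0   # bias the search, never prune
        return node.value[a] + exploration + soft
    return max(actions, key=score)

# Example: the specification prefers 'sample', nudging the search toward it.
node = Node(visits=100,
            count={"move_north": 60, "sample": 40},
            value={"move_north": 4.0, "sample": 3.9})
print(ucb1_soft_guided(node, ["move_north", "sample"], preferred={"sample"}))
```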
arXiv Detail & Related papers (2023-03-16T09:37:10Z)
- Task-Guided IRL in POMDPs that Scales [22.594913269327353]
In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts.
Most IRL techniques require repeatedly solving the computationally hard forward problem -- computing an optimal policy given a reward function -- in POMDPs.
We develop an algorithm that reduces the information while increasing the data efficiency.
arXiv Detail & Related papers (2022-12-30T21:08:57Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Visual Learning-based Planning for Continuous High-Dimensional POMDPs [81.16442127503517]
Visual Tree Search (VTS) is a learning and planning procedure that combines generative models learned offline with online model-based POMDP planning.
VTS bridges offline model training and online planning by utilizing a set of deep generative observation models to predict and evaluate the likelihood of image observations in a Monte Carlo tree search planner.
We show that VTS is robust to different observation noises and, since it utilizes online, model-based planning, can adapt to different reward structures without the need to re-train.
arXiv Detail & Related papers (2021-12-17T11:53:31Z)
- Compositional Reinforcement Learning from Logical Specifications [21.193231846438895]
Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy.
We develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning.
Our approach then incorporates reinforcement learning to learn neural network policies for each edge (sub-task) within a Dijkstra-style planning algorithm to compute a high-level plan in the graph.
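
As a rough sketch of the compositional idea (the graph, success probabilities, and edge-cost choice below are invented for illustration): each edge of the specification graph corresponds to a sub-task with its own learned policy, and a Dijkstra-style search over edge costs yields the high-level plan whose sub-policies are then executed in sequence.

```python
import heapq, math

def high_level_plan(graph, success_prob, start, goal):
    """Dijkstra-style search over a task graph.

    graph:        dict node -> list of (next_node, subtask_id) edges
    success_prob: dict subtask_id -> estimated success probability of its sub-policy
    Edge cost is -log(success probability), so the shortest path maximizes the
    probability that every sub-task along the plan succeeds (illustrative choice)."""
    pq, settled = [(0.0, start, [])], set()
    while pq:
        cost, node, plan = heapq.heappop(pq)
        if node in settled:
            continue
        settled.add(node)
        if node == goal:
            return plan                      # sequence of sub-tasks to execute
        for nxt, subtask in graph.get(node, []):
            edge_cost = -math.log(max(success_prob[subtask], 1e-9))
            heapq.heappush(pq, (cost + edge_cost, nxt, plan + [subtask]))
    return None

# Toy example: two routes to the goal; the planner picks the more reliable sub-tasks.
graph = {"s0": [("s1", "reach_door"), ("s2", "climb_wall")],
         "s1": [("goal", "open_door")],
         "s2": [("goal", "jump_down")]}
success_prob = {"reach_door": 0.95, "open_door": 0.9,
                "climb_wall": 0.6, "jump_down": 0.8}
print(high_level_plan(graph, success_prob, "s0", "goal"))  # -> ['reach_door', 'open_door']
```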
arXiv Detail & Related papers (2021-06-25T22:54:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.