Learning to Ground Existentially Quantified Goals
- URL: http://arxiv.org/abs/2409.20259v1
- Date: Mon, 30 Sep 2024 12:49:27 GMT
- Title: Learning to Ground Existentially Quantified Goals
- Authors: Martin Funkquist, Simon Ståhlberg, Hector Geffner
- Abstract summary: Goal instructions for autonomous AI agents cannot assume that objects have unique names.
This raises problems in both classical planning and generalized planning.
In this work, we address the goal grounding problem with a novel supervised learning approach.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal instructions for autonomous AI agents cannot assume that objects have unique names. Instead, objects in goals must be referred to by providing suitable descriptions. However, this raises problems in both classical planning and generalized planning. The standard approach to handling existentially quantified goals in classical planning involves compiling them into a DNF formula that encodes all possible variable bindings and adding dummy actions to map each DNF term into the new, dummy goal. This preprocessing is exponential in the number of variables. In generalized planning, the problem is different: even if general policies can deal with any initial situation and goal, executing a general policy requires the goal to be grounded to define a value for the policy features. The problem of grounding goals, namely finding the objects to bind the goal variables, is subtle: it is a generalization of classical planning, which is a special case when there are no goal variables to bind, and constraint reasoning, which is a special case when there are no actions. In this work, we address the goal grounding problem with a novel supervised learning approach. A GNN architecture, trained to predict the cost of partially quantified goals over small domain instances is tested on larger instances involving more objects and different quantified goals. The proposed architecture is evaluated experimentally over several planning domains where generalization is tested along several dimensions including the number of goal variables and objects that can bind such variables. The scope of the approach is also discussed in light of the known relationship between GNNs and C2 logics.
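The exponential cost of the DNF compilation mentioned in the abstract can be made concrete with a short sketch. The function below (hypothetical names; a simplification of the actual compilation over PDDL tasks) enumerates every binding of goal variables to objects, one binding per DNF term:

```python
from itertools import product

def dnf_bindings(goal_vars, objects):
    """Enumerate every binding of goal variables to objects.

    Each binding corresponds to one term of the DNF compilation:
    the number of terms is len(objects) ** len(goal_vars),
    i.e. exponential in the number of goal variables.
    """
    return [dict(zip(goal_vars, combo))
            for combo in product(objects, repeat=len(goal_vars))]

# Hypothetical example: two goal variables, four objects.
bindings = dnf_bindings(["?x", "?y"], ["a", "b", "c", "d"])
print(len(bindings))  # 4 ** 2 = 16 candidate groundings
```

With ten objects and five goal variables this already yields 100,000 terms, which is why the paper replaces the compilation with a learned grounding.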
Related papers
- Multi-Objective Planning with Contextual Lexicographic Reward Preferences
We present a framework that enables planning under varying lexicographic objective orderings, depending on context.
In a CLMDP, both the objective ordering at a state and the associated reward functions are determined by the context.
Our algorithm to solve a CLMDP first computes a policy for each objective ordering and then combines them into a single context-aware policy.
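The combination step described above can be sketched as a simple dispatch (hypothetical orderings, states, and actions; the paper's CLMDP formalism is richer than this):

```python
def make_context_aware_policy(policies, context_of):
    """Combine per-ordering policies into one context-aware policy.

    policies:   maps an objective ordering to a policy (state -> action)
    context_of: maps a state to the objective ordering active there
    """
    def policy(state):
        return policies[context_of(state)](state)
    return policy

# Hypothetical driving example with two lexicographic orderings.
policies = {("safety", "speed"): lambda s: "brake",
            ("speed", "safety"): lambda s: "accelerate"}
context_of = lambda s: ("safety", "speed") if s == "intersection" else ("speed", "safety")

pi = make_context_aware_policy(policies, context_of)
print(pi("intersection"))  # brake
print(pi("highway"))       # accelerate
```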
arXiv Detail & Related papers (2025-02-13T19:04:22Z)
- Horizon Generalization in Reinforcement Learning
We study goal-conditioned RL through the lens of generalization, but not in the traditional sense of random augmentations and domain randomization.
We show that this notion of horizon generalization is closely linked with invariance to planning.
A policy navigating towards a goal will select the same actions as if it were navigating to a waypoint en route to that goal. Thus, such a policy trained to reach nearby goals should succeed at reaching arbitrarily-distant goals.
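The invariance described above can be illustrated with a toy 1-D example (an illustration only, not the paper's formal construction): a greedy goal-reaching policy selects the same action whether it targets the final goal or a waypoint en route to it.

```python
def greedy_policy(state, goal):
    """Greedy goal-reaching policy on a 1-D line: step toward the goal."""
    if goal > state:
        return +1
    if goal < state:
        return -1
    return 0

state, goal = 0, 10
waypoint = 4  # lies en route from state to goal
# Invariance to planning: targeting the waypoint yields the same action.
assert greedy_policy(state, goal) == greedy_policy(state, waypoint)
```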
arXiv Detail & Related papers (2025-01-06T01:42:46Z)
- Rethinking Multi-Objective Learning through Goal-Conditioned Supervised Learning
Multi-objective learning aims to optimize multiple objectives simultaneously with a single model.
It suffers from the difficulty of formalizing and conducting the exact learning process.
We propose a general framework for automatically learning to achieve multiple objectives based on the existing sequential data.
arXiv Detail & Related papers (2024-12-12T03:47:40Z)
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
- Discrete Factorial Representations as an Abstraction for Goal-Conditioned Reinforcement Learning
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected return on out-of-distribution goals, while still allowing goals with expressive structure to be specified.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Learning Generalized Policies Without Supervision Using GNNs
We consider the problem of learning generalized policies for classical planning domains using graph neural networks.
We use a simple and general GNN architecture and aim at obtaining crisp experimental results.
We exploit the relation established between the expressive power of GNNs and the $C_2$ fragment of first-order logic.
arXiv Detail & Related papers (2022-05-12T10:28:46Z)
- Generative multitask learning mitigates target-causing confounding
We propose a simple and scalable approach to causal representation learning for multitask learning.
The improvement comes from mitigating unobserved confounders that cause the targets, but not the input.
Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.
arXiv Detail & Related papers (2022-02-08T20:42:14Z)
- Learning General Policies from Small Examples Without Supervision
Generalized planning is concerned with the computation of general policies that solve multiple instances of a planning domain all at once.
It has been recently shown that these policies can be computed in two steps: first, a suitable abstraction in the form of a qualitative numerical planning problem (QNP) is learned from sample plans; second, the QNP is solved with a QNP planner to obtain the general policy.
In this work, we introduce an alternative approach for computing more expressive general policies which does not require sample plans or a QNP planner.
arXiv Detail & Related papers (2021-01-03T19:44:13Z)
- Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
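The divide-and-conquer idea can be sketched on a 1-D grid, with a midpoint heuristic standing in for DC-MCTS's learned subgoal proposals (a simplification, not the paper's algorithm):

```python
def dc_plan(start, goal):
    """Plan on a 1-D grid by recursive subgoal splitting:
    propose an intermediate subgoal, then solve the two halves."""
    if abs(goal - start) <= 1:
        return [start] if start == goal else [start, goal]
    mid = (start + goal) // 2      # stand-in for a proposed subgoal
    left = dc_plan(start, mid)
    right = dc_plan(mid, goal)
    return left + right[1:]        # join, dropping the duplicated subgoal

print(dc_plan(0, 8))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Unlike left-to-right search, the plan is refined at arbitrary positions, which is the flexibility in planning order that the summary above refers to.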
arXiv Detail & Related papers (2020-04-23T18:08:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.