First-Order Representation Languages for Goal-Conditioned RL
- URL: http://arxiv.org/abs/2512.19355v1
- Date: Mon, 22 Dec 2025 12:54:32 GMT
- Title: First-Order Representation Languages for Goal-Conditioned RL
- Authors: Simon Ståhlberg, Hector Geffner
- Abstract summary: We consider the use of first-order languages in goal-conditioned RL and generalized planning. The question is how to learn goal-conditioned and general policies when the training instances are large. We show that further performance gains can be achieved when states and goals are represented by sets of atoms.
- Score: 8.89493507314525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: First-order relational languages have been used in MDP planning and reinforcement learning (RL) for two main purposes: specifying MDPs in compact form, and representing and learning policies that are general and not tied to specific instances or state spaces. In this work, we instead consider the use of first-order languages in goal-conditioned RL and generalized planning. The question is how to learn goal-conditioned and general policies when the training instances are large and the goal cannot be reached by random exploration alone. The technique of Hindsight Experience Replay (HER) provides an answer to this question: it relabels unsuccessful trajectories as successful ones by replacing the original goal with one that was actually achieved. If the target policy must generalize across states and goals, trajectories that do not reach the original goal states can enable more data- and time-efficient learning. In this work, we show that further performance gains can be achieved when states and goals are represented by sets of atoms. We consider three versions: goals as full states, goals as subsets of the original goals, and goals as lifted versions of these subgoals. The result is that the latter two successfully learn general policies on large planning instances with sparse rewards by automatically creating a curriculum of easier goals of increasing complexity. The experiments illustrate the computational gains of these versions, their limitations, and opportunities for addressing them.
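The relabeling scheme at the heart of the paper is concrete enough to sketch. Below is a minimal, illustrative Python version of the three goal representations named in the abstract (goals as full states, as subsets of the original goal atoms, and as lifted subgoals), assuming states and goals are sets of ground atoms such as `("on", "a", "b")`; the function and naming choices here are ours, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of HER-style relabeling when states
# and goals are sets of ground atoms, e.g. ("on", "a", "b").

def relabel(trajectory, goal_atoms, mode="lifted"):
    """Replace an unreached goal with one the trajectory actually achieved.

    trajectory : list of states, each a frozenset of ground atoms
    goal_atoms : the original goal, a frozenset of ground atoms
    mode       : "full-state" | "subset" | "lifted"
    """
    achieved = trajectory[-1]              # atoms true in the final state
    if mode == "full-state":
        return achieved                    # version 1: the whole final state
    # Versions 2 and 3: keep only the original goal atoms that were achieved,
    # yielding easier subgoals (an automatic curriculum of increasing size).
    subgoal = goal_atoms & achieved
    if mode == "subset" or not subgoal:
        return subgoal
    # Version 3: lift the subgoal by renaming objects to variables so the
    # relabeled goal transfers across instances and object names.
    names, lifted = {}, set()
    for pred, *args in subgoal:
        lifted.add((pred, *(names.setdefault(o, f"?x{len(names)}") for o in args)))
    return frozenset(lifted)

# Example: a Blocksworld-style goal of which only one atom was achieved.
goal = frozenset({("on", "a", "b"), ("on", "b", "c")})
final_state = frozenset({("on", "a", "b"), ("clear", "a"), ("ontable", "c")})
print(relabel([final_state], goal, mode="lifted"))
# -> frozenset({('on', '?x0', '?x1')})
```

On large instances the subset and lifted variants hand the learner achievable intermediate goals even when random exploration never reaches the full goal, which is the automatic curriculum the abstract describes.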
Related papers
- Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter [59.69563889773648]
We study the task of language-conditioned pick and place in clutter, where a robot should grasp a target object in open clutter and move it to a specified place. Some approaches learn end-to-end policies with features from vision foundation models, requiring large datasets. We propose an action prior alignment method that aligns unconditioned action priors with 3D vision-language priors by learning one attention layer.
arXiv Detail & Related papers (2025-03-12T14:20:33Z) - Horizon Generalization in Reinforcement Learning [22.372738655730107]
We study goal-conditioned RL through the lens of generalization, but not in the traditional sense of random augmentations and domain randomization. We show that this notion of horizon generalization, i.e., generalizing from nearby goals to more distant ones, is closely linked with invariance to planning. A policy navigating towards a goal will select the same actions as if it were navigating to a waypoint en route to that goal; thus, such a policy trained to reach nearby goals should succeed at reaching arbitrarily distant goals.
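The planning-invariance property described here fits in a few lines. A hedged sketch in our notation, not the paper's: a horizon-generalizing deterministic policy should choose the same action for a distant goal as for a waypoint en route to it.

```python
# Hypothetical consistency check for waypoint invariance. `policy(state, goal)`
# is any deterministic goal-conditioned policy; `waypoint` is assumed to lie
# on a path from `state` to `goal`.
def horizon_consistent(policy, state, goal, waypoint):
    return policy(state, goal) == policy(state, waypoint)
```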
arXiv Detail & Related papers (2025-01-06T01:42:46Z) - GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models [31.628341050846768]
Goal-conditioned Offline Planning (GOPlan) is a novel model-based framework that contains two key phases.
GOPlan pretrains a prior policy capable of capturing the multi-modal action distribution within the multi-goal dataset.
The reanalysis method generates high-quality imaginary data by planning with learned models for both intra-trajectory and inter-trajectory goals.
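The two phases summarized above can be laid out as a rough skeleton; every callable below is a placeholder supplied by the caller, not GOPlan's actual API.

```python
# Hypothetical skeleton of the two GOPlan phases as summarized above.
def goplan(dataset, pretrain, sample_goal, plan, finetune, rounds=10):
    policy = pretrain(dataset)              # phase 1: multi-modal prior policy
    for _ in range(rounds):                 # phase 2: reanalysis
        goal = sample_goal(dataset)         # intra- or inter-trajectory goal
        dataset.append(plan(policy, goal))  # imaginary data via the learned model
        policy = finetune(policy, dataset)
    return policy
```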
arXiv Detail & Related papers (2023-10-30T21:19:52Z) - HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
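The hierarchical decomposition amounts to a two-level action selector; a hedged sketch in our notation (HIQL additionally extracts both policies from a single offline value function).

```python
# Illustrative two-level act(): the high-level policy emits a (latent) subgoal
# state, which the low-level policy then treats as its own goal.
def act(high_policy, low_policy, state, goal):
    subgoal = high_policy(state, goal)   # "latent state as action"
    return low_policy(state, subgoal)
```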
arXiv Detail & Related papers (2023-07-22T00:17:36Z) - Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
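The distillation step reads naturally as a supervised update; in this sketch the names and the `update` method are assumptions of ours, not the paper's API.

```python
# Hypothetical self-imitation step: the target-goal-conditioned policy imitates
# the action of the subgoal-conditioned policy along a planner-chosen subgoal.
def distill_step(target_policy, subgoal_policy, state, subgoal, final_goal):
    teacher_action = subgoal_policy(state, subgoal)
    target_policy.update(state, final_goal, teacher_action)  # supervised step
```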
arXiv Detail & Related papers (2023-03-20T14:51:10Z) - Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
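A discretizing bottleneck of this kind can be approximated with plain nearest-neighbor vector quantization; the NumPy toy below is our simplification, not the paper's architecture (a factorial version would use several independent codebooks).

```python
import numpy as np

# Toy vector-quantization bottleneck: snap a continuous goal embedding to the
# nearest entry of a learned codebook, yielding a discrete goal code.
def quantize(goal_embedding, codebook):
    """goal_embedding: (d,) array; codebook: (k, d) array of code vectors."""
    dists = np.linalg.norm(codebook - goal_embedding, axis=1)
    idx = int(np.argmin(dists))        # discrete code for this goal
    return idx, codebook[idx]          # (index, quantized embedding)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))     # 8 codes of dimension 4
idx, z_q = quantize(rng.normal(size=4), codebook)
```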
arXiv Detail & Related papers (2022-11-01T03:31:43Z) - Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To meet this need, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
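Composing goals for long horizons can be sketched as chaining the policy through generated subgoals; all names below are placeholders, not the paper's API.

```python
# Hypothetical long-horizon loop: a subgoal generator proposes intermediate
# goals (e.g., in a latent space) and the policy practices reaching each one.
def practice(subgoal_generator, reach, state, goal, k=3):
    for subgoal in subgoal_generator(state, goal, k):
        state = reach(state, subgoal)   # roll the goal-conditioned policy out
    return state
```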
arXiv Detail & Related papers (2022-05-17T06:58:17Z) - Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning [71.52722621691365]
Building generalizable goal-conditioned agents from rich observations is key to applying reinforcement learning (RL) to real-world problems.
We propose a new form of state abstraction called goal-conditioned bisimulation.
We learn this representation using a metric form of this abstraction, and show its ability to generalize to new goals in simulation manipulation tasks.
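The metric form of the abstraction can be illustrated with the classic bisimulation recursion conditioned on a goal; the tabular NumPy sketch below uses an expected-distance coupling as a simple stand-in for the Wasserstein term, and is our notation rather than the paper's objective.

```python
import numpy as np

# Tabular goal-conditioned bisimulation sketch: states are close if, for the
# given goal, they receive similar rewards and transition to similar
# distributions over next states.
def bisim_metric(R, P, gamma=0.9, iters=50):
    """R: (S,) goal-conditioned rewards; P: (S, S) transition matrix."""
    d = np.zeros((len(R), len(R)))
    for _ in range(iters):
        d = np.abs(R[:, None] - R[None, :]) + gamma * (P @ d @ P.T)
    return d
```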
arXiv Detail & Related papers (2022-04-27T17:00:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.