Goal-Conditioned Predictive Coding for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2307.03406v2
- Date: Sat, 28 Oct 2023 19:42:50 GMT
- Title: Goal-Conditioned Predictive Coding for Offline Reinforcement Learning
- Authors: Zilai Zeng, Ce Zhang, Shijie Wang, Chen Sun
- Abstract summary: We investigate whether sequence modeling has the ability to condense trajectories into useful representations that enhance policy learning.
We introduce Goal-Conditioned Predictive Coding, a sequence modeling objective that yields powerful trajectory representations and leads to performant policies.
- Score: 24.300131097275298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has demonstrated the effectiveness of formulating decision making
as supervised learning on offline-collected trajectories. Powerful sequence
models, such as GPT or BERT, are often employed to encode the trajectories.
However, the benefits of performing sequence modeling on trajectory data remain
unclear. In this work, we investigate whether sequence modeling has the ability
to condense trajectories into useful representations that enhance policy
learning. We adopt a two-stage framework that first leverages sequence models
to encode trajectory-level representations, and then learns a goal-conditioned
policy employing the encoded representations as its input. This formulation
allows us to consider many existing supervised offline RL methods as specific
instances of our framework. Within this framework, we introduce
Goal-Conditioned Predictive Coding (GCPC), a sequence modeling objective that
yields powerful trajectory representations and leads to performant policies.
Through extensive empirical evaluations on AntMaze, FrankaKitchen and
Locomotion environments, we observe that sequence modeling can have a
significant impact on challenging decision making tasks. Furthermore, we
demonstrate that GCPC learns a goal-conditioned latent representation encoding
the future trajectory, which enables competitive performance on all three
benchmarks.
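As a rough illustration of the two-stage framework described in the abstract (a sequence model condenses a trajectory into a representation, and a goal-conditioned policy then consumes that representation together with the current state and goal), here is a minimal PyTorch sketch. The module names, layer sizes, and mean pooling are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a two-stage setup: a transformer encoder summarizes a
# trajectory, and an MLP policy conditions on (state, goal, latent).
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class TrajectoryEncoder(nn.Module):
    """Stage 1: condense a trajectory of states into a compact latent."""

    def __init__(self, state_dim, latent_dim=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(state_dim, latent_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=latent_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, states):                  # states: (B, T, state_dim)
        tokens = self.encoder(self.embed(states))
        return tokens.mean(dim=1)               # (B, latent_dim) trajectory summary


class GoalConditionedPolicy(nn.Module):
    """Stage 2: map (state, goal, trajectory latent) to an action."""

    def __init__(self, state_dim, goal_dim, latent_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim + latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, state, goal, latent):
        return self.net(torch.cat([state, goal, latent], dim=-1))


if __name__ == "__main__":
    B, T, state_dim, goal_dim, action_dim = 8, 32, 17, 2, 6
    encoder = TrajectoryEncoder(state_dim)
    policy = GoalConditionedPolicy(state_dim, goal_dim, 64, action_dim)

    trajectory = torch.randn(B, T, state_dim)
    latent = encoder(trajectory)                           # trajectory representation
    action = policy(trajectory[:, -1], torch.randn(B, goal_dim), latent)
    print(action.shape)                                    # torch.Size([8, 6])
```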
Related papers
- Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation [47.29682938439268]
We propose a novel Counterfactual Fine-Tuning (CFT) method to improve user preference modeling.
We employ counterfactual reasoning to identify the causal effects of behavior sequences on model output.
Experiments on real-world datasets demonstrate that CFT effectively improves behavior sequence modeling.
arXiv Detail & Related papers (2024-10-30T08:41:13Z)
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
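One way to read the QT objective summarized above is as a conditional-sequence-modeling (behavior cloning) term plus a term that pushes predicted actions toward high learned Q-values. The sketch below is a hedged illustration; the squared-error form, the weight `alpha`, and the `q_network` interface are assumptions, not the paper's exact formulation.

```python
# Hypothetical combination of a behavior-cloning (CSM) loss with an
# action-value maximization term, in the spirit of the QT summary above.
import torch
import torch.nn.functional as F


def qt_style_loss(policy_action, dataset_action, q_network, state, alpha=1.0):
    bc_loss = F.mse_loss(policy_action, dataset_action)              # CSM / BC term
    q_values = q_network(torch.cat([state, policy_action], dim=-1))  # learned Q(s, a)
    return bc_loss - alpha * q_values.mean()                         # encourage high Q
```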
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
- Stitching Sub-Trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL [18.31263353823447]
We propose a model-based offline Goal-Conditioned Reinforcement Learning (Offline GCRL) method to acquire diverse goal-oriented skills.
In this paper, we use a diffusion model that generates future plans conditioned on the target goal and value, with the target value estimated from the goal-relabeled offline dataset.
We report state-of-the-art performance in the standard benchmark set of GCRL tasks, and demonstrate the capability to successfully stitch the segments of suboptimal trajectories in the offline data to generate high-quality plans.
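A hedged sketch of the kind of conditional denoising objective this summary describes is given below: a network predicts the noise added to a future plan, conditioned on the goal and an estimated value. The architecture, continuous timestep, and cosine noise schedule are simplified assumptions.

```python
# Simplified conditional denoising sketch for goal- and value-conditioned
# plan generation. Everything here is illustrative, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PlanDenoiser(nn.Module):
    def __init__(self, plan_dim, goal_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(plan_dim + goal_dim + 2, hidden),  # +1 for value, +1 for timestep
            nn.ReLU(),
            nn.Linear(hidden, plan_dim),
        )

    def forward(self, noisy_plan, goal, value, t):
        return self.net(torch.cat([noisy_plan, goal, value, t], dim=-1))


def denoising_loss(model, plan, goal, value):
    t = torch.rand(plan.shape[0], 1)                     # timestep in [0, 1)
    alpha_bar = torch.cos(0.5 * torch.pi * t) ** 2       # cumulative signal level
    noise = torch.randn_like(plan)
    noisy = alpha_bar.sqrt() * plan + (1 - alpha_bar).sqrt() * noise
    return F.mse_loss(model(noisy, goal, value, t), noise)
```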
arXiv Detail & Related papers (2024-02-11T15:23:13Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
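As a rough illustration of an entropy-augmented, token-level objective in the spirit of ETPO, the sketch below combines a per-token policy-gradient term with an entropy bonus; the advantage estimator and the coefficient `beta` are assumptions rather than the paper's exact formulation.

```python
# Illustrative token-level, entropy-augmented policy-gradient loss.
# Inputs are (batch, num_tokens) torch tensors, one entry per generated token.
def token_level_entropy_loss(log_probs, entropies, advantages, beta=0.01):
    pg_loss = -(log_probs * advantages.detach()).mean()  # per-token policy gradient
    return pg_loss - beta * entropies.mean()             # entropy bonus keeps policy stochastic
```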
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- A Tractable Inference Perspective of Offline RL [36.563229330549284]
A popular paradigm for offline Reinforcement Learning (RL) tasks is to first fit the offline trajectories to a sequence model, and then prompt the model for actions that lead to high expected return.
This paper highlights that tractability, the ability to exactly and efficiently answer various probabilistic queries, plays an important role in offline RL.
We propose Trifle, which bridges the gap between good sequence models and high expected returns at evaluation time.
arXiv Detail & Related papers (2023-10-31T19:16:07Z)
- Multi-Objective Decision Transformers for Offline Reinforcement Learning [7.386356540208436]
Offline RL is structured to derive policies from static trajectory data without requiring real-time environment interactions.
We reformulate offline RL as a multi-objective optimization problem, where prediction is extended to states and returns.
Our experiments on D4RL benchmark locomotion tasks reveal that our propositions allow for more effective utilization of the attention mechanism in the transformer model.
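The reformulation above can be read as training the transformer to predict states and returns alongside actions. A minimal sketch of such a combined objective follows; the squared-error heads and the weights `w_state` and `w_return` are assumptions.

```python
# Illustrative multi-objective loss: predict actions, next states, and
# returns-to-go from shared transformer features.
import torch.nn.functional as F


def multi_objective_loss(action_pred, action, state_pred, next_state,
                         return_pred, rtg, w_state=0.5, w_return=0.5):
    return (F.mse_loss(action_pred, action)
            + w_state * F.mse_loss(state_pred, next_state)
            + w_return * F.mse_loss(return_pred, rtg))
```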
arXiv Detail & Related papers (2023-08-31T00:47:58Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Variational Latent Branching Model for Off-Policy Evaluation [23.073461349048834]
We propose a variational latent branching model (VLBM) to learn the transition function of Markov decision processes (MDPs).
We introduce the branching architecture to improve the model's robustness against randomly initialized model weights.
We show that the VLBM outperforms existing state-of-the-art OPE methods in general.
arXiv Detail & Related papers (2023-01-28T02:20:03Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected returns on out-of-distribution goals, while still allowing for specifying goals with expressive structure.
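One concrete instance of a discretizing bottleneck is nearest-neighbor vector quantization with a straight-through gradient, sketched below; the codebook size, dimensions, and the VQ choice itself are assumptions rather than the paper's exact mechanism.

```python
# Minimal vector-quantization bottleneck with a straight-through estimator,
# as one illustrative form of a discretizing bottleneck.
import torch
import torch.nn as nn


class DiscreteBottleneck(nn.Module):
    def __init__(self, num_codes=64, code_dim=32):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, z):                                 # z: (B, code_dim)
        dists = torch.cdist(z, self.codebook.weight)      # (B, num_codes)
        codes = self.codebook(dists.argmin(dim=-1))       # nearest code vectors
        return z + (codes - z).detach()                   # straight-through gradient
```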
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Target-Embedding Autoencoders for Supervised Representation Learning [111.07204912245841]
This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional.
We motivate and formalize the general framework of target-embedding autoencoders (TEA) for supervised prediction, learning intermediate latent representations jointly optimized to be both predictable from features and predictive of targets.
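A compact sketch of the TEA idea described above: targets are encoded into a latent that is trained jointly to reconstruct the targets and to be regressed from the features. The linear layers and the weight `lam` are illustrative assumptions.

```python
# Sketch of a target-embedding autoencoder objective: encode targets y into a
# latent z, decode z back to y, and regress z from features x.
import torch.nn as nn
import torch.nn.functional as F


class TargetEmbeddingAutoencoder(nn.Module):
    def __init__(self, x_dim, y_dim, z_dim=32):
        super().__init__()
        self.encode = nn.Linear(y_dim, z_dim)    # target -> latent
        self.decode = nn.Linear(z_dim, y_dim)    # latent -> target
        self.predict = nn.Linear(x_dim, z_dim)   # features -> latent

    def loss(self, x, y, lam=1.0):
        z = self.encode(y)
        recon = F.mse_loss(self.decode(z), y)    # latent is predictive of targets
        pred = F.mse_loss(self.predict(x), z)    # latent is predictable from features
        return recon + lam * pred
```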
arXiv Detail & Related papers (2020-01-23T02:37:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.