Multi-Objective Decision Transformers for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2308.16379v1
- Date: Thu, 31 Aug 2023 00:47:58 GMT
- Title: Multi-Objective Decision Transformers for Offline Reinforcement Learning
- Authors: Abdelghani Ghanem, Philippe Ciblat, Mounir Ghogho
- Abstract summary: offline RL is structured to derive policies from static trajectory data without requiring real-time environment interactions.
We reformulate offline RL as a multi-objective optimization problem, where prediction is extended to states and returns.
Experiments on D4RL benchmark locomotion tasks show that these changes enable more effective use of the attention mechanism in the transformer model.
- Score: 7.386356540208436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Reinforcement Learning (RL) is structured to derive policies from
static trajectory data without requiring real-time environment interactions.
Recent studies have shown the feasibility of framing offline RL as a sequence
modeling task, where the sole aim is to predict actions based on prior context
using the transformer architecture. However, the limitation of this single-task
learning approach is its potential to undermine the transformer model's
attention mechanism, which should ideally allocate varying attention weights
across different tokens in the input context for optimal prediction. To address
this, we reformulate offline RL as a multi-objective optimization problem,
where the prediction is extended to states and returns. We also highlight a
potential flaw in the trajectory representation used for sequence modeling,
which could generate inaccuracies when modeling the state and return
distributions. This is due to the non-smoothness of the action distribution
within the trajectory dictated by the behavioral policy. To mitigate this
issue, we introduce action space regions to the trajectory representation. Our
experiments on D4RL benchmark locomotion tasks reveal that our proposed
modifications enable more effective use of the attention mechanism in the
transformer model, resulting in performance that matches or exceeds
current state-of-the-art methods.
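As a rough illustration of the multi-objective reformulation (and not the authors' released code), the sketch below attaches state and return prediction heads next to the usual action head and scalarizes the three losses with tunable weights; the "action space regions" are approximated here by bucketizing one continuous action dimension into coarse bins. All module names, dimensions, and weights are assumptions.

```python
# Minimal sketch of a multi-objective sequence-modeling loss for offline RL.
# Assumes a Decision-Transformer-style backbone; names/weights are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiObjectiveHeads(nn.Module):
    def __init__(self, hidden_dim, state_dim, act_dim, n_regions=16):
        super().__init__()
        self.action_head = nn.Linear(hidden_dim, act_dim)   # predict next action
        self.state_head = nn.Linear(hidden_dim, state_dim)  # predict next state
        self.return_head = nn.Linear(hidden_dim, 1)         # predict return-to-go
        # One embedding per coarse action-space region; region tokens would be
        # appended to the trajectory representation before the backbone.
        self.region_embed = nn.Embedding(n_regions, hidden_dim)

    def loss(self, hidden, actions, next_states, returns_to_go,
             w_action=1.0, w_state=0.5, w_return=0.5):
        # hidden: (B, T, hidden_dim) backbone outputs at the action positions.
        l_a = F.mse_loss(self.action_head(hidden), actions)
        l_s = F.mse_loss(self.state_head(hidden), next_states)
        l_r = F.mse_loss(self.return_head(hidden).squeeze(-1), returns_to_go)
        # Scalarized multi-objective loss; the fixed weights are placeholders.
        return w_action * l_a + w_state * l_s + w_return * l_r

def action_region(action, n_regions=16, low=-1.0, high=1.0):
    # Bucket the first action dimension into a coarse region index (0..n-1).
    edges = torch.linspace(low, high, n_regions + 1)[1:-1]
    return torch.bucketize(action[..., 0], edges)
```

The intent of the extra objectives is to give the attention mechanism supervision over state and return tokens as well, rather than letting a single action-prediction loss dominate.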
Related papers
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional dynamic programming (DP) and CSM methods.
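Read as a loss, the QT idea can be paraphrased roughly as behavior cloning plus a Q-maximization bonus; the snippet below is a hedged sketch with assumed interfaces (`q_net`, `alpha`), not the paper's implementation.

```python
# Hypothetical sketch of a Q-value-regularized sequence-modeling loss (QT-style).
import torch.nn.functional as F

def qt_loss(policy_actions, data_actions, q_net, states, alpha=0.5):
    # Conditional sequence modeling term: imitate the dataset actions.
    bc_loss = F.mse_loss(policy_actions, data_actions)
    # Regularizer: push the predicted actions toward high learned action-values.
    q_term = q_net(states, policy_actions).mean()
    return bc_loss - alpha * q_term
```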
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Learning Representative Trajectories of Dynamical Systems via Domain-Adaptive Imitation [0.0]
We propose DATI, a deep reinforcement learning agent designed for domain-adaptive trajectory imitation.
Our experiments show that DATI outperforms baseline methods for imitation learning and optimal control in this setting.
Its generalization to a real-world scenario is shown through the discovery of abnormal motion patterns in maritime traffic.
arXiv Detail & Related papers (2023-04-19T15:53:48Z)
- Graph Decision Transformer [83.76329715043205]
Graph Decision Transformer (GDT) is a novel offline reinforcement learning approach.
GDT models the input sequence as a causal graph to capture potential dependencies between fundamentally different concepts.
Our experiments show that GDT matches or surpasses the performance of state-of-the-art offline RL methods on image-based Atari and OpenAI Gym.
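One way to picture the causal-graph view is as a sparse attention mask that only connects tokens with a plausible dependency; the construction below is a hypothetical illustration, not GDT's actual graph.

```python
# Illustrative sparse dependency mask over (return, state, action) tokens,
# in the spirit of a causal-graph view of the trajectory; edges are assumptions.
import torch

def gdt_style_mask(T):
    # Token order per step t: [return_t, state_t, action_t]; 3*T tokens total.
    n = 3 * T
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask.fill_diagonal_(True)                 # every token sees itself
    for t in range(T):
        r, s, a = 3 * t, 3 * t + 1, 3 * t + 2
        mask[s, r] = True                     # state token conditioned on return
        mask[a, s] = True                     # action depends on current state
        if t > 0:
            mask[s, 3 * (t - 1) + 2] = True   # next state depends on last action
            mask[r, 3 * (t - 1)] = True       # return-to-go chains backwards
    return mask                               # True = attention edge allowed
```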
arXiv Detail & Related papers (2023-03-07T09:10:34Z)
- On Transforming Reinforcement Learning by Transformer: The Development Trajectory [97.79247023389445]
The transformer, originally devised for natural language processing, has also achieved significant success in computer vision.
We group existing developments of transformer-based RL (TRL) into two categories: architecture enhancement and trajectory optimization.
We examine the main applications of TRL in robotic manipulation, text-based games, navigation, and autonomous driving.
arXiv Detail & Related papers (2022-12-29T03:15:59Z)
- INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z)
- Transfer RL across Observation Feature Spaces via Model-Based Regularization [9.660642248872973]
In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations.
We propose a novel algorithm that extracts the latent-space dynamics of the source task and transfers the dynamics model to the target task.
Our algorithm works for drastic changes of observation space without any inter-task mapping or any prior knowledge of the target task.
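A minimal sketch of one reading of this recipe, under the assumption that "transferring the dynamics model" means freezing a shared latent dynamics network and fitting only a new observation encoder on the target task; all architectures and dimensions here are made up.

```python
# Hypothetical latent-dynamics transfer across observation spaces.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, act_dim = 32, 4
src_encoder = nn.Linear(17, latent_dim)   # source observations (e.g. 17-D)
tgt_encoder = nn.Linear(64, latent_dim)   # target observations (e.g. 64-D)
dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(),
                         nn.Linear(64, latent_dim))  # shared latent dynamics

def latent_dynamics_loss(encoder, obs, act, next_obs):
    # Fit the encoder so the shared latent dynamics explains the transition.
    z, z_next = encoder(obs), encoder(next_obs)
    return F.mse_loss(dynamics(torch.cat([z, act], dim=-1)), z_next)

# Source phase: train src_encoder and dynamics jointly with this loss.
# Target phase: freeze dynamics, then fit tgt_encoder against the same loss.
for p in dynamics.parameters():
    p.requires_grad_(False)
```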
arXiv Detail & Related papers (2022-01-01T22:41:19Z)
- Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning [58.66067369294337]
When the model is inaccurate or biased, imaginary trajectories may be detrimental to training the action-value and policy functions.
We adaptively reweight the imaginary transitions to reduce the negative effects of poorly generated trajectories.
Our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks.
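The summary does not say how transitions are scored, so the sketch below substitutes a common proxy, dynamics-ensemble disagreement, to derive the weights; treat it as one plausible instantiation rather than the paper's rule.

```python
# Hypothetical reweighting of imagined transitions: downweight rollouts where a
# dynamics-model ensemble disagrees (a proxy for "poorly generated").
import torch

def transition_weights(ensemble_preds):
    # ensemble_preds: (n_models, batch, state_dim) next-state predictions.
    disagreement = ensemble_preds.var(dim=0).mean(dim=-1)  # (batch,)
    w = 1.0 / (1.0 + disagreement)        # high variance -> low weight
    return w / w.mean()                   # renormalize to keep the loss scale

def weighted_bellman_loss(td_errors, weights):
    # Weighted squared TD error; weights are treated as constants.
    return (weights.detach() * td_errors.pow(2)).mean()
```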
arXiv Detail & Related papers (2021-04-09T03:13:35Z)
- PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators [19.026312915461553]
We propose a model-based offline reinforcement learning (RL) approach called PerSim.
We first learn a personalized simulator for each agent by collectively using the historical trajectories across all agents prior to learning a policy.
The resulting representation suggests a simple, regularized neural network architecture to learn the transition dynamics of each agent effectively, even with scarce offline data.
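One guess at what a "personalized, regularized" simulator could look like is a shared dynamics network conditioned on a low-dimensional learned factor per agent; the sketch below is illustrative, not PerSim's architecture.

```python
# Hypothetical per-agent simulator: shared transition model plus agent factors.
import torch
import torch.nn as nn

class PersonalizedDynamics(nn.Module):
    def __init__(self, n_agents, state_dim, act_dim, agent_dim=8, hidden=64):
        super().__init__()
        self.agent_factor = nn.Embedding(n_agents, agent_dim)  # one per agent
        self.net = nn.Sequential(
            nn.Linear(state_dim + act_dim + agent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim))

    def forward(self, agent_id, state, action):
        # Condition the shared transition model on the agent's latent factor.
        z = self.agent_factor(agent_id)
        return self.net(torch.cat([state, action, z], dim=-1))
```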
arXiv Detail & Related papers (2021-02-13T17:16:41Z)
- Learning Off-Policy with Online Planning (LOOP) [18.63424441772675]
We investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function.
We demonstrate LOOP's flexibility by incorporating safety constraints during deployment on a set of navigation environments.
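The H-step lookahead itself has a standard form: roll a learned model forward H steps, accumulate discounted rewards, and bootstrap with the terminal value function. A minimal sketch, with assumed `model`, `reward_fn`, and `value_fn` interfaces:

```python
# Sketch of H-step lookahead with a learned model and a terminal value function.
def h_step_lookahead(state, action_seq, model, reward_fn, value_fn, gamma=0.99):
    # Roll the learned model forward over the candidate action sequence,
    # summing discounted rewards, then close the horizon with V(s_H).
    total, discount, s = 0.0, 1.0, state
    for a in action_seq:                      # H candidate actions
        total += discount * reward_fn(s, a)
        s = model(s, a)                       # learned one-step dynamics
        discount *= gamma
    return total + discount * value_fn(s)     # terminal value bootstrap
```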
arXiv Detail & Related papers (2020-08-23T16:18:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.