Multi-Objective Decision Transformers for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2308.16379v1
- Date: Thu, 31 Aug 2023 00:47:58 GMT
- Title: Multi-Objective Decision Transformers for Offline Reinforcement Learning
- Authors: Abdelghani Ghanem, Philippe Ciblat, Mounir Ghogho
- Abstract summary: offline RL is structured to derive policies from static trajectory data without requiring real-time environment interactions.
We reformulate offline RL as a multi-objective optimization problem, where prediction is extended to states and returns.
Experiments on D4RL benchmark locomotion tasks show that these changes enable more effective use of the attention mechanism in the transformer model.
- Score: 7.386356540208436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Reinforcement Learning (RL) is structured to derive policies from
static trajectory data without requiring real-time environment interactions.
Recent studies have shown the feasibility of framing offline RL as a sequence
modeling task, where the sole aim is to predict actions based on prior context
using the transformer architecture. However, the limitation of this single-task
learning approach is its potential to undermine the transformer model's
attention mechanism, which should ideally allocate varying attention weights
across different tokens in the input context for optimal prediction. To address
this, we reformulate offline RL as a multi-objective optimization problem,
where the prediction is extended to states and returns. We also highlight a
potential flaw in the trajectory representation used for sequence modeling,
which could generate inaccuracies when modeling the state and return
distributions. This is due to the non-smoothness of the action distribution
within the trajectory dictated by the behavioral policy. To mitigate this
issue, we introduce action space regions to the trajectory representation. Our
experiments on D4RL benchmark locomotion tasks reveal that our proposed
modifications enable more effective use of the attention mechanism in the
transformer model, resulting in performance that matches or exceeds
current state-of-the-art methods.
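As a rough illustration of the multi-objective reformulation (and not the authors' released code), the sketch below attaches state and return prediction heads next to the usual action head and scalarizes the three losses with tunable weights; the "action space regions" are approximated here by bucketizing one continuous action dimension into coarse bins. All module names, dimensions, and weights are assumptions.

```python
# Minimal sketch of a multi-objective sequence-modeling loss for offline RL.
# Assumes a Decision-Transformer-style backbone; names/weights are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiObjectiveHeads(nn.Module):
    def __init__(self, hidden_dim, state_dim, act_dim, n_regions=16):
        super().__init__()
        self.action_head = nn.Linear(hidden_dim, act_dim)   # predict next action
        self.state_head = nn.Linear(hidden_dim, state_dim)  # predict next state
        self.return_head = nn.Linear(hidden_dim, 1)         # predict return-to-go
        # One embedding per coarse action-space region; region tokens would be
        # appended to the trajectory representation before the backbone.
        self.region_embed = nn.Embedding(n_regions, hidden_dim)

    def loss(self, hidden, actions, next_states, returns_to_go,
             w_action=1.0, w_state=0.5, w_return=0.5):
        # hidden: (B, T, hidden_dim) backbone outputs at the action positions.
        l_a = F.mse_loss(self.action_head(hidden), actions)
        l_s = F.mse_loss(self.state_head(hidden), next_states)
        l_r = F.mse_loss(self.return_head(hidden).squeeze(-1), returns_to_go)
        # Scalarized multi-objective loss; the fixed weights are placeholders.
        return w_action * l_a + w_state * l_s + w_return * l_r

def action_region(action, n_regions=16, low=-1.0, high=1.0):
    # Bucket the first action dimension into a coarse region index (0..n-1).
    edges = torch.linspace(low, high, n_regions + 1)[1:-1]
    return torch.bucketize(action[..., 0], edges)
```

The intent of the extra objectives is to give the attention mechanism supervision over state and return tokens as well, rather than letting a single action-prediction loss dominate.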
Related papers
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional dynamic programming (DP) and CSM methods.
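Read as a loss, the QT idea can be paraphrased roughly as behavior cloning plus a Q-maximization bonus; the snippet below is a hedged sketch with assumed interfaces (`q_net`, `alpha`), not the paper's implementation.

```python
# Hypothetical sketch of a Q-value-regularized sequence-modeling loss (QT-style).
import torch.nn.functional as F

def qt_loss(policy_actions, data_actions, q_net, states, alpha=0.5):
    # Conditional sequence modeling term: imitate the dataset actions.
    bc_loss = F.mse_loss(policy_actions, data_actions)
    # Regularizer: push the predicted actions toward high learned action-values.
    q_term = q_net(states, policy_actions).mean()
    return bc_loss - alpha * q_term
```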
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Learning Representative Trajectories of Dynamical Systems via Domain-Adaptive Imitation [0.0]
We propose DATI, a deep reinforcement learning agent designed for domain-adaptive trajectory imitation.
Our experiments show that DATI outperforms baseline methods for imitation learning and optimal control in this setting.
Its generalization to a real-world scenario is shown through the discovery of abnormal motion patterns in maritime traffic.
arXiv Detail & Related papers (2023-04-19T15:53:48Z)
- Graph Decision Transformer [83.76329715043205]
Graph Decision Transformer (GDT) is a novel offline reinforcement learning approach.
GDT models the input sequence as a causal graph to capture potential dependencies between fundamentally different concepts.
Our experiments show that GDT matches or surpasses the performance of state-of-the-art offline RL methods on image-based Atari and OpenAI Gym.
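One way to picture the causal-graph view is as a sparse attention mask that only connects tokens with a plausible dependency; the construction below is a hypothetical illustration, not GDT's actual graph.

```python
# Illustrative sparse dependency mask over (return, state, action) tokens,
# in the spirit of a causal-graph view of the trajectory; edges are assumptions.
import torch

def gdt_style_mask(T):
    # Token order per step t: [return_t, state_t, action_t]; 3*T tokens total.
    n = 3 * T
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask.fill_diagonal_(True)                 # every token sees itself
    for t in range(T):
        r, s, a = 3 * t, 3 * t + 1, 3 * t + 2
        mask[s, r] = True                     # state token conditioned on return
        mask[a, s] = True                     # action depends on current state
        if t > 0:
            mask[s, 3 * (t - 1) + 2] = True   # next state depends on last action
            mask[r, 3 * (t - 1)] = True       # return-to-go chains backwards
    return mask                               # True = attention edge allowed
```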
arXiv Detail & Related papers (2023-03-07T09:10:34Z)
- On Transforming Reinforcement Learning by Transformer: The Development Trajectory [97.79247023389445]
The transformer, originally devised for natural language processing, has also achieved significant success in computer vision.
We group existing developments of transformer-based RL (TRL) into two categories: architecture enhancement and trajectory optimization.
We examine the main applications of TRL in robotic manipulation, text-based games, navigation, and autonomous driving.
arXiv Detail & Related papers (2022-12-29T03:15:59Z)
- INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z)
- Transfer RL across Observation Feature Spaces via Model-Based Regularization [9.660642248872973]
In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations.
We propose a novel algorithm that extracts the latent-space dynamics of the source task and transfers the dynamics model to the target task.
Our algorithm works for drastic changes of observation space without any inter-task mapping or any prior knowledge of the target task.
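A minimal sketch of one reading of this recipe, under the assumption that "transferring the dynamics model" means freezing a shared latent dynamics network and fitting only a new observation encoder on the target task; all architectures and dimensions here are made up.

```python
# Hypothetical latent-dynamics transfer across observation spaces.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, act_dim = 32, 4
src_encoder = nn.Linear(17, latent_dim)   # source observations (e.g. 17-D)
tgt_encoder = nn.Linear(64, latent_dim)   # target observations (e.g. 64-D)
dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(),
                         nn.Linear(64, latent_dim))  # shared latent dynamics

def latent_dynamics_loss(encoder, obs, act, next_obs):
    # Fit the encoder so the shared latent dynamics explains the transition.
    z, z_next = encoder(obs), encoder(next_obs)
    return F.mse_loss(dynamics(torch.cat([z, act], dim=-1)), z_next)

# Source phase: train src_encoder and dynamics jointly with this loss.
# Target phase: freeze dynamics, then fit tgt_encoder against the same loss.
for p in dynamics.parameters():
    p.requires_grad_(False)
```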
arXiv Detail & Related papers (2022-01-01T22:41:19Z)
- Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning [58.66067369294337]
When the model is inaccurate or biased, imaginary trajectories may be detrimental to training the action-value and policy functions.
We adaptively reweight the imaginary transitions to reduce the negative effects of poorly generated trajectories.
Our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks.
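The summary does not say how transitions are scored, so the sketch below substitutes a common proxy, dynamics-ensemble disagreement, to derive the weights; treat it as one plausible instantiation rather than the paper's rule.

```python
# Hypothetical reweighting of imagined transitions: downweight rollouts where a
# dynamics-model ensemble disagrees (a proxy for "poorly generated").
import torch

def transition_weights(ensemble_preds):
    # ensemble_preds: (n_models, batch, state_dim) next-state predictions.
    disagreement = ensemble_preds.var(dim=0).mean(dim=-1)  # (batch,)
    w = 1.0 / (1.0 + disagreement)        # high variance -> low weight
    return w / w.mean()                   # renormalize to keep the loss scale

def weighted_bellman_loss(td_errors, weights):
    # Weighted squared TD error; weights are treated as constants.
    return (weights.detach() * td_errors.pow(2)).mean()
```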
arXiv Detail & Related papers (2021-04-09T03:13:35Z)
- PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators [19.026312915461553]
We propose a model-based offline reinforcement learning (RL) approach called PerSim.
We first learn a personalized simulator for each agent by collectively using the historical trajectories across all agents prior to learning a policy.
The resulting representation suggests a simple, regularized neural network architecture to learn the transition dynamics of each agent effectively, even with scarce offline data.
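One guess at what a "personalized, regularized" simulator could look like is a shared dynamics network conditioned on a low-dimensional learned factor per agent; the sketch below is illustrative, not PerSim's architecture.

```python
# Hypothetical per-agent simulator: shared transition model plus agent factors.
import torch
import torch.nn as nn

class PersonalizedDynamics(nn.Module):
    def __init__(self, n_agents, state_dim, act_dim, agent_dim=8, hidden=64):
        super().__init__()
        self.agent_factor = nn.Embedding(n_agents, agent_dim)  # one per agent
        self.net = nn.Sequential(
            nn.Linear(state_dim + act_dim + agent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim))

    def forward(self, agent_id, state, action):
        # Condition the shared transition model on the agent's latent factor.
        z = self.agent_factor(agent_id)
        return self.net(torch.cat([state, action, z], dim=-1))
```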
arXiv Detail & Related papers (2021-02-13T17:16:41Z)
- Learning Off-Policy with Online Planning (LOOP) [18.63424441772675]
We investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function.
We demonstrate LOOP's flexibility by incorporating safety constraints during deployment on a set of navigation environments.
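The H-step lookahead itself has a standard form: roll a learned model forward H steps, accumulate discounted rewards, and bootstrap with the terminal value function. A minimal sketch, with assumed `model`, `reward_fn`, and `value_fn` interfaces:

```python
# Sketch of H-step lookahead with a learned model and a terminal value function.
def h_step_lookahead(state, action_seq, model, reward_fn, value_fn, gamma=0.99):
    # Roll the learned model forward over the candidate action sequence,
    # summing discounted rewards, then close the horizon with V(s_H).
    total, discount, s = 0.0, 1.0, state
    for a in action_seq:                      # H candidate actions
        total += discount * reward_fn(s, a)
        s = model(s, a)                       # learned one-step dynamics
        discount *= gamma
    return total + discount * value_fn(s)     # terminal value bootstrap
```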
arXiv Detail & Related papers (2020-08-23T16:18:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.