AdaCred: Adaptive Causal Decision Transformers with Feature Crediting
- URL: http://arxiv.org/abs/2412.15427v1
- Date: Thu, 19 Dec 2024 22:22:37 GMT
- Title: AdaCred: Adaptive Causal Decision Transformers with Feature Crediting
- Authors: Hemant Kumawat, Saibal Mukhopadhyay
- Abstract summary: We introduce AdaCred, a novel approach that represents trajectories as causal graphs built from short-term action-reward-state sequences.
Our experiments demonstrate that AdaCred-based policies require shorter trajectory sequences and consistently outperform conventional methods in both offline reinforcement learning and imitation learning environments.
- Score: 11.54181863246064
- License:
- Abstract: Reinforcement learning (RL) can be formulated as a sequence modeling problem, where models predict future actions based on historical state-action-reward sequences. Current approaches typically require long trajectory sequences to model the environment in offline RL settings. However, these models tend to over-rely on memorizing long-term representations, which impairs their ability to effectively attribute importance to trajectories and learned representations based on task-specific relevance. In this work, we introduce AdaCred, a novel approach that represents trajectories as causal graphs built from short-term action-reward-state sequences. Our model adaptively learns a control policy by crediting and pruning low-importance representations, retaining only those most relevant for the downstream task. Our experiments demonstrate that AdaCred-based policies require shorter trajectory sequences and consistently outperform conventional methods in both offline reinforcement learning and imitation learning environments.
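The abstract describes crediting and pruning low-importance representations built from short state-action-reward segments, without giving architectural details. Below is a minimal sketch of that idea under stated assumptions: a small transformer encoder over short segments, a hypothetical learned per-step credit score, and top-k pruning. Names such as `CreditPrunePolicy` and the keep ratio are illustrative, not the authors' implementation.

```python
# Minimal sketch of a credit-and-prune policy head (not the authors' code).
# Assumes continuous states/actions and a fixed short context of K steps.
import torch
import torch.nn as nn


class CreditPrunePolicy(nn.Module):
    def __init__(self, state_dim, action_dim, d_model=64, keep_ratio=0.5):
        super().__init__()
        self.embed = nn.Linear(state_dim + action_dim + 1, d_model)  # (s, a, r) per step
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.credit = nn.Linear(d_model, 1)        # hypothetical importance score per step
        self.policy_head = nn.Linear(d_model, action_dim)
        self.keep_ratio = keep_ratio

    def forward(self, states, actions, rewards):
        # states: (B, K, state_dim), actions: (B, K, action_dim), rewards: (B, K, 1)
        tokens = self.embed(torch.cat([states, actions, rewards], dim=-1))
        h = self.encoder(tokens)                   # (B, K, d_model)
        scores = self.credit(h).squeeze(-1)        # (B, K) credit for each representation
        k = max(1, int(self.keep_ratio * h.shape[1]))
        topk = scores.topk(k, dim=1).indices       # prune low-credit representations
        kept = torch.gather(h, 1, topk.unsqueeze(-1).expand(-1, -1, h.shape[-1]))
        return self.policy_head(kept.mean(dim=1))  # action prediction from kept features


# Usage: predict the next action from a short 8-step segment.
policy = CreditPrunePolicy(state_dim=17, action_dim=6)
a = policy(torch.randn(2, 8, 17), torch.randn(2, 8, 6), torch.randn(2, 8, 1))
print(a.shape)  # torch.Size([2, 6])
```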
Related papers
- Are Expressive Models Truly Necessary for Offline RL? [18.425797519857113]
Sequential modeling requires capturing accurate dynamics across long horizons in trajectory data to ensure reasonable policy performance.
We show that lightweight models as simple as shallow two-layer networks can achieve accurate dynamics consistency and significantly reduced sequential modeling errors.
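To make "shallow two-layer" concrete, here is a minimal two-layer MLP dynamics model that predicts the next state and reward from the current state and action; the dimensions and the single training step are illustrative and not taken from the paper.

```python
# Minimal sketch: a shallow 2-layer dynamics model (illustrative, not the paper's code).
import torch
import torch.nn as nn

class TwoLayerDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),  # predicts next state and reward
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        return out[..., :-1], out[..., -1]     # (next_state, reward)

# One supervised step on a batch of offline transitions (s, a, s', r).
model = TwoLayerDynamics(state_dim=17, action_dim=6)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
s, a = torch.randn(32, 17), torch.randn(32, 6)
s_next, r = torch.randn(32, 17), torch.randn(32)
pred_s, pred_r = model(s, a)
loss = nn.functional.mse_loss(pred_s, s_next) + nn.functional.mse_loss(pred_r, r)
opt.zero_grad(); loss.backward(); opt.step()
```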
arXiv Detail & Related papers (2024-12-15T17:33:56Z)
- Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network.
Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
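The summary only states that previous policies are composed via a meta-policy network. The sketch below shows one generic way such a composition could look, with softmax weights over frozen earlier policies produced by a small meta-network; this is an assumption about the idea, not CompoFormer's actual architecture.

```python
# Generic sketch of composing frozen previous policies with learned weights
# (an assumed illustration, not CompoFormer's implementation).
import torch
import torch.nn as nn

class ComposedPolicy(nn.Module):
    def __init__(self, prev_policies, state_dim):
        super().__init__()
        self.prev = nn.ModuleList(prev_policies)
        for p in self.prev:                                    # earlier policies stay frozen
            p.requires_grad_(False)
        self.meta = nn.Linear(state_dim, len(prev_policies))   # meta-network -> weights

    def forward(self, state):
        w = torch.softmax(self.meta(state), dim=-1)                 # (B, n_prev)
        acts = torch.stack([p(state) for p in self.prev], dim=1)    # (B, n_prev, act_dim)
        return (w.unsqueeze(-1) * acts).sum(dim=1)                  # weighted composition

# Usage with two toy frozen policies on a new task.
prev = [nn.Linear(17, 6), nn.Linear(17, 6)]
policy = ComposedPolicy(prev, state_dim=17)
print(policy(torch.randn(4, 17)).shape)  # torch.Size([4, 6])
```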
arXiv Detail & Related papers (2024-11-18T08:20:21Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
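The paper's quantization scheme is learned and adaptive; as a simpler, hedged illustration of action discretization in general, the sketch below clusters dataset actions with k-means and maps continuous actions to the nearest centroid. The clustering choice is mine, not the paper's.

```python
# Illustrative action discretization via k-means over dataset actions
# (a generic stand-in, not the paper's learned adaptive scheme).
import numpy as np
from sklearn.cluster import KMeans

def build_action_codebook(actions, n_bins=32, seed=0):
    """actions: (N, action_dim) array of continuous dataset actions."""
    km = KMeans(n_clusters=n_bins, n_init=10, random_state=seed).fit(actions)
    return km.cluster_centers_                      # (n_bins, action_dim)

def quantize(action, codebook):
    """Map a continuous action to the index of its nearest codebook entry."""
    return int(np.argmin(np.linalg.norm(codebook - action, axis=1)))

# Usage: discretize a 6-D action space from 10k logged actions.
dataset_actions = np.random.randn(10_000, 6)
codebook = build_action_codebook(dataset_actions, n_bins=32)
idx = quantize(dataset_actions[0], codebook)
reconstructed = codebook[idx]                       # decoded action fed to the policy/critic
```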
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce BAdam, a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
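The summary does not spell out BAdam's Bayesian update; as a hedged illustration of the general prior-based idea (penalizing drift of parameters away from values learned on earlier tasks), here is a simple quadratic-prior regularizer. It is a generic stand-in, not BAdam itself.

```python
# Generic prior-based regularizer for continual learning: penalize drift from
# parameters learned on previous tasks (a stand-in illustration, not BAdam).
import torch
import torch.nn as nn

def prior_penalty(model, prior_means, precision=1.0):
    """Quadratic penalty pulling parameters toward their values after the last task."""
    loss = 0.0
    for name, p in model.named_parameters():
        loss = loss + 0.5 * precision * ((p - prior_means[name]) ** 2).sum()
    return loss

model = nn.Linear(10, 2)
prior_means = {n: p.detach().clone() for n, p in model.named_parameters()}  # snapshot after task t
x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))
task_loss = nn.functional.cross_entropy(model(x), y)
total = task_loss + prior_penalty(model, prior_means, precision=10.0)
total.backward()  # gradients trade off new-task fit against forgetting
```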
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Multi-Objective Decision Transformers for Offline Reinforcement Learning [7.386356540208436]
Offline RL is structured to derive policies from static trajectory data without requiring real-time environment interactions.
We reformulate offline RL as a multi-objective optimization problem, where prediction is extended to states and returns.
Our experiments on D4RL benchmark locomotion tasks reveal that our propositions allow for more effective utilization of the attention mechanism in the transformer model.
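To make "prediction is extended to states and returns" concrete, here is a hedged sketch of three output heads on top of any trajectory encoder, trained with a summed loss over actions, next states, and returns. The weighting and head design are assumptions, not the paper's.

```python
# Sketch of multi-objective prediction heads on a trajectory encoder:
# actions, next states, and returns are all predicted (weights are illustrative).
import torch
import torch.nn as nn

class MultiObjectiveHeads(nn.Module):
    def __init__(self, d_model, state_dim, action_dim):
        super().__init__()
        self.action_head = nn.Linear(d_model, action_dim)
        self.state_head = nn.Linear(d_model, state_dim)
        self.return_head = nn.Linear(d_model, 1)

    def loss(self, h, actions, next_states, returns, w_state=0.5, w_return=0.5):
        # h: (B, T, d_model) hidden states from any trajectory encoder
        mse = nn.functional.mse_loss
        return (mse(self.action_head(h), actions)
                + w_state * mse(self.state_head(h), next_states)
                + w_return * mse(self.return_head(h).squeeze(-1), returns))

heads = MultiObjectiveHeads(d_model=64, state_dim=17, action_dim=6)
h = torch.randn(2, 20, 64)  # encoder output for a 20-step trajectory
loss = heads.loss(h, torch.randn(2, 20, 6), torch.randn(2, 20, 17), torch.randn(2, 20))
loss.backward()
```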
arXiv Detail & Related papers (2023-08-31T00:47:58Z)
- Goal-Conditioned Predictive Coding for Offline Reinforcement Learning [24.300131097275298]
We investigate whether sequence modeling has the ability to condense trajectories into useful representations that enhance policy learning.
We introduce Goal-Conditioned Predictive Coding, a sequence modeling objective that yields powerful trajectory representations and leads to performant policies.
arXiv Detail & Related papers (2023-07-07T06:12:14Z)
- Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
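As a rough illustration of acting in a latent action space through an invertible generative model, here is a single conditional affine-coupling transform over actions: the latent-space policy outputs z, and the flow decodes it into an action. This one-layer sketch is my simplification, not the paper's "special form" of normalizing flow or its conservative training procedure.

```python
# Hedged sketch: a one-coupling-layer conditional flow over actions, so a policy
# can act in the latent space and decode through the flow (illustrative only).
import torch
import torch.nn as nn

class AffineCouplingFlow(nn.Module):
    """Maps latent z <-> action a, conditioned on state; invertible by construction."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.half = action_dim // 2
        self.net = nn.Sequential(
            nn.Linear(state_dim + self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (action_dim - self.half)),
        )

    def inverse(self, z, state):
        # latent -> action (what the latent-space policy uses at decision time)
        z1, z2 = z[..., :self.half], z[..., self.half:]
        scale, shift = self.net(torch.cat([state, z1], dim=-1)).chunk(2, dim=-1)
        return torch.cat([z1, z2 * torch.exp(scale) + shift], dim=-1)

    def forward(self, a, state):
        # action -> latent (used when fitting the flow to dataset actions)
        a1, a2 = a[..., :self.half], a[..., self.half:]
        scale, shift = self.net(torch.cat([state, a1], dim=-1)).chunk(2, dim=-1)
        return torch.cat([a1, (a2 - shift) * torch.exp(-scale)], dim=-1)

flow = AffineCouplingFlow(state_dim=17, action_dim=6)
state, z = torch.randn(4, 17), torch.randn(4, 6)
action = flow.inverse(z, state)   # decode a latent policy output into an action
z_back = flow(action, state)      # invertibility: z_back ≈ z
```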
arXiv Detail & Related papers (2022-11-20T21:57:10Z)
- Contrastive Value Learning: Implicit Models for Simple Offline RL [40.95632543012637]
We propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics.
CVL can be learned without access to reward functions, but nonetheless can be used to directly estimate the value of each action.
Our experiments demonstrate that CVL outperforms prior offline RL methods on complex continuous control benchmarks.
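The summary leaves the contrastive objective unspecified; the sketch below shows a generic InfoNCE-style loss that scores each (state, action) pair against its own future state versus futures from other trajectories in the batch. This is one plausible instantiation of an implicit multi-step model, not necessarily CVL's exact loss.

```python
# Generic InfoNCE-style objective for an implicit multi-step model:
# score (s, a) against its own future state vs. futures from other trajectories.
import torch
import torch.nn as nn

class ImplicitModel(nn.Module):
    def __init__(self, state_dim, action_dim, d=64):
        super().__init__()
        self.sa_enc = nn.Sequential(nn.Linear(state_dim + action_dim, d), nn.ReLU(), nn.Linear(d, d))
        self.future_enc = nn.Sequential(nn.Linear(state_dim, d), nn.ReLU(), nn.Linear(d, d))

    def contrastive_loss(self, states, actions, future_states):
        # Positives lie on the diagonal of the (B, B) similarity matrix.
        q = self.sa_enc(torch.cat([states, actions], dim=-1))   # (B, d)
        k = self.future_enc(future_states)                      # (B, d)
        logits = q @ k.t()                                      # (B, B) pairwise scores
        labels = torch.arange(q.shape[0])
        return nn.functional.cross_entropy(logits, labels)

model = ImplicitModel(state_dim=17, action_dim=6)
loss = model.contrastive_loss(torch.randn(32, 17), torch.randn(32, 6), torch.randn(32, 17))
loss.backward()
```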
arXiv Detail & Related papers (2022-11-03T19:10:05Z)
- Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
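As a minimal illustration of treating RL as sequence modeling, the sketch below flattens each timestep's (state, action, reward) into one token and trains a causally masked transformer to predict the next token. Continuous embeddings are used here for brevity; the paper itself discretizes each dimension, so this is a simplified stand-in.

```python
# Minimal sketch of RL as autoregressive sequence modeling over (s, a, r) tokens.
import torch
import torch.nn as nn

class TrajectorySequenceModel(nn.Module):
    def __init__(self, state_dim, action_dim, d_model=64):
        super().__init__()
        self.token_dim = state_dim + action_dim + 1
        self.embed = nn.Linear(self.token_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, self.token_dim)

    def forward(self, tokens):
        # tokens: (B, T, state_dim + action_dim + 1); causal mask keeps it autoregressive
        T = tokens.shape[1]
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.encoder(self.embed(tokens), mask=mask)
        return self.out(h)                      # prediction of the next token at each position

model = TrajectorySequenceModel(state_dim=17, action_dim=6)
traj = torch.randn(2, 20, 17 + 6 + 1)
pred = model(traj)
loss = nn.functional.mse_loss(pred[:, :-1], traj[:, 1:])  # next-token objective
loss.backward()
```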
arXiv Detail & Related papers (2021-06-03T17:58:51Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
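The key idea in the entry above, using the learned model's differentiability rather than treating it as a black-box simulator, can be illustrated with a short imagined rollout through a differentiable dynamics model, so that reward gradients flow back into the policy parameters. The horizon, networks, and objective below are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of backpropagating through a differentiable learned model:
# gradients of imagined rewards flow through the model into the policy.
import torch
import torch.nn as nn

state_dim, action_dim, H = 17, 6, 5
dynamics = nn.Linear(state_dim + action_dim, state_dim)   # learned, differentiable model
reward_fn = nn.Linear(state_dim + action_dim, 1)          # learned reward model
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

s = torch.randn(32, state_dim)          # batch of start states from the dataset
total_reward = 0.0
for _ in range(H):                      # short imagined rollout ("path")
    a = torch.tanh(policy(s))
    sa = torch.cat([s, a], dim=-1)
    total_reward = total_reward + reward_fn(sa).mean()
    s = dynamics(sa)                    # next state stays on the computation graph

opt.zero_grad()
(-total_reward).backward()              # ascend predicted return via pathwise gradients
opt.step()
```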