Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer
- URL: http://arxiv.org/abs/2505.09114v1
- Date: Wed, 14 May 2025 03:45:16 GMT
- Title: Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer
- Authors: Minh Hoang Nguyen, Linh Le Pham Van, Thommen George Karimpanal, Sunil Gupta, Hung Le
- Abstract summary: Decision Transformers (DT) play a crucial role in modern reinforcement learning, leveraging offline datasets to achieve impressive results across various domains. To address the scarcity of high-quality training data, we propose the Counterfactual Reasoning Decision Transformer (CRDT), a novel framework inspired by counterfactual reasoning.
- Score: 29.029659384955206
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Decision Transformers (DT) play a crucial role in modern reinforcement learning, leveraging offline datasets to achieve impressive results across various domains. However, DT requires high-quality, comprehensive data to perform optimally. In real-world applications, the lack of training data and the scarcity of optimal behaviours make training on offline datasets challenging, as suboptimal data can hinder performance. To address this, we propose the Counterfactual Reasoning Decision Transformer (CRDT), a novel framework inspired by counterfactual reasoning. CRDT enhances the DT's ability to reason beyond known data by generating and utilizing counterfactual experiences, enabling improved decision-making in unseen scenarios. Experiments across Atari and D4RL benchmarks, including scenarios with limited data and altered dynamics, demonstrate that CRDT outperforms conventional DT approaches. Additionally, reasoning counterfactually allows the DT agent to acquire stitching ability, combining suboptimal trajectories, without architectural modifications. These results highlight the potential of counterfactual reasoning to enhance reinforcement learning agents' performance and generalization capabilities.
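The abstract describes counterfactual experience generation only at a high level. As a rough illustration of how such augmentation could work, the sketch below branches off logged trajectories with alternative actions and rolls a learned model forward; the dynamics model, reward model, action sampler, and uncertainty filter are all hypothetical stand-ins, not the paper's actual components.

```python
# A minimal sketch of counterfactual data augmentation for a Decision
# Transformer. All model interfaces below are assumptions for illustration.

def generate_counterfactuals(trajectory, dynamics_model, reward_model,
                             action_sampler, horizon=5, uncertainty_cap=0.5):
    """Branch off a logged trajectory with alternative actions and roll a
    learned model forward, keeping only low-uncertainty imagined rollouts.

    trajectory: list of (state, action, reward) tuples, in time order.
    dynamics_model(s, a) -> (next_state, uncertainty)   # hypothetical
    reward_model(s, a) -> reward                        # hypothetical
    action_sampler(s) -> an action not taken in the data  # hypothetical
    """
    counterfactuals = []
    for t, (state, _, _) in enumerate(trajectory):
        alt_action = action_sampler(state)  # the counterfactual choice
        branch, s, ok = [], state, True
        for _ in range(horizon):
            next_s, sigma = dynamics_model(s, alt_action)
            if sigma > uncertainty_cap:     # discard unreliable imagination
                ok = False
                break
            branch.append((s, alt_action, reward_model(s, alt_action)))
            s = next_s
            alt_action = action_sampler(s)
        if ok and branch:
            # Real prefix up to time t, followed by the imagined suffix.
            counterfactuals.append(list(trajectory[:t]) + branch)
    return counterfactuals
```

In a pipeline like CRDT's, these imagined trajectories would then be relabeled with returns-to-go and mixed into the DT training batches alongside the original data.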
Related papers
- Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement [8.221810937147755]
We introduce the R* Decision Transformer (R* DT) to tackle the difficulties inherent in automated bidding. R* DT stores actions based on state and return-to-go (RTG) value, and memorizes the RTG for a given state using the training set. Comprehensive tests on a publicly available bidding dataset validate R* DT's efficacy and highlight its superiority when dealing with mixed-quality trajectories.
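As a loose sketch of the "memorizes the RTG for a given state" idea above (the discretized state key, the keep-the-best rule, and the fallback are assumptions, not details from the paper, which is only summarized here):

```python
# A hypothetical state -> best-RTG memory in the spirit of the R* DT summary.
from collections import defaultdict

class RTGMemory:
    def __init__(self):
        self.best_rtg = defaultdict(lambda: float("-inf"))
        self.best_action = {}

    def update(self, state_key, action, rtg):
        # Keep the highest return-to-go observed for each discretized state.
        if rtg > self.best_rtg[state_key]:
            self.best_rtg[state_key] = rtg
            self.best_action[state_key] = action

    def condition(self, state_key, default_rtg):
        # At inference, condition the DT on the memorized RTG if one exists.
        rtg = self.best_rtg[state_key]
        return rtg if rtg != float("-inf") else default_rtg
```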
arXiv Detail & Related papers (2025-06-27T06:56:54Z)
- Diffusion Models for Smarter UAVs: Decision-Making and Modeling [15.093742222365156]
Unmanned Aerial Vehicles (UAVs) are increasingly adopted in modern communication networks. However, challenges in decision-making and digital modeling continue to impede their rapid advancement. This paper explores the integration of diffusion models (DMs) with reinforcement learning (RL) and digital twins (DT) to effectively address these challenges.
arXiv Detail & Related papers (2025-01-10T09:59:16Z)
- Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers [111.78179839856293]
Decision Transformers have emerged as a compelling paradigm for offline Reinforcement Learning (RL).
Online finetuning of decision transformers has been surprisingly under-explored.
We find that simply adding TD3 gradients to the finetuning process of ODT effectively improves the online finetuning performance of ODT.
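Taken at face value, the finding above amounts to adding a TD3-style actor term to the ODT finetuning objective. A minimal sketch, assuming hypothetical `odt` and `critic` networks and an unspecified mixing weight `alpha`:

```python
# A hypothetical combined objective: ODT's supervised action loss plus a
# TD3-style deterministic policy-gradient term from a learned critic.
import torch
import torch.nn.functional as F

def finetune_loss(odt, critic, batch, alpha=0.1):
    states, actions, rtgs, timesteps = batch      # (batch, seq, ...) tensors
    pred_actions = odt(states, actions, rtgs, timesteps)
    bc_loss = F.mse_loss(pred_actions, actions)   # standard ODT objective
    # TD3-style actor term: push the latest predicted action up the critic's
    # Q-landscape (the critic itself would be trained with TD3's twin
    # Q-learning updates, omitted here).
    q_values = critic(states[:, -1], pred_actions[:, -1])
    return bc_loss + alpha * (-q_values.mean())
```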
arXiv Detail & Related papers (2024-10-31T16:38:51Z)
- Predictive Coding for Decision Transformer [21.28952990360392]
The Decision Transformer (DT) architecture has shown promise across various domains. Despite its initial success, DTs have underperformed on several challenging datasets in goal-conditioned RL. We propose the Predictive Coding for Decision Transformer (PCDT) framework, which leverages generalized future conditioning to enhance DT methods.
arXiv Detail & Related papers (2024-10-04T13:17:34Z)
- Making Large Language Models Better Planners with Reasoning-Decision Alignment [70.5381163219608]
We motivate an end-to-end decision-making model based on a multimodality-augmented LLM.
We propose a reasoning-decision alignment constraint between the paired CoTs and planning results.
We dub our proposed large language planners with reasoning-decision alignment as RDA-Driver.
arXiv Detail & Related papers (2024-08-25T16:43:47Z)
- Sustainable Diffusion-based Incentive Mechanism for Generative AI-driven Digital Twins in Industrial Cyber-Physical Systems [65.22300383287904]
Industrial Cyber-Physical Systems (ICPSs) are an integral component of modern manufacturing and industries. By digitizing data throughout product life cycles, Digital Twins (DTs) in ICPSs enable a shift from current industrial infrastructures to intelligent and adaptive infrastructures. GenAI can drive the construction and update of DTs to improve predictive accuracy and prepare for diverse smart manufacturing scenarios.
arXiv Detail & Related papers (2024-08-02T10:47:10Z)
- Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling [35.2859997591196]
Offline reinforcement learning holds promise for scaling data-driven decision-making. However, real-world data collected from sensors or humans often contains noise and errors. Our study reveals that prior research falls short under data corruption when the dataset is limited.
arXiv Detail & Related papers (2024-07-05T06:34:32Z)
- Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continual offline reinforcement learning (CORL) combines continual learning and offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
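Low-rank adaptation itself is a generic mechanism, so the wrapper below only illustrates what "LoRA-DT" plausibly wraps around frozen DT weights; the rank, scaling, and choice of layers are assumptions, not details from the paper.

```python
# A generic LoRA wrapper: frozen pretrained weight plus a trainable
# low-rank update, as could be applied to a Decision Transformer's layers.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():     # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus the trainable low-rank update (B @ A) x.
        # B starts at zero, so training begins from the pretrained behavior.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```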
arXiv Detail & Related papers (2024-01-16T16:28:32Z)
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning [54.3596066989024]
Decision Transformer (DT) is an innovative algorithm leveraging recent advances in transformer architectures for reinforcement learning (RL).
We introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL.
We show that DT emerges as a special case of this framework under certain choices of high-level and low-level policies, and discuss the potential failure of these choices.
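The hierarchical reading above can be put schematically: a high-level policy emits a prompt and a low-level policy acts on it, with DT recovered when the prompt is a target return-to-go. Both callables below are assumptions for illustration, not the paper's formalism.

```python
# A schematic of the hierarchical decomposition described above.
def hierarchical_act(state, history, high_policy, low_policy):
    prompt = high_policy(state, history)   # DT special case: a target RTG
    return low_policy(prompt, state, history)
```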
arXiv Detail & Related papers (2023-11-01T03:32:13Z)
- Improving GANs with A Dynamic Discriminator [106.54552336711997]
We argue that a discriminator with on-the-fly capacity adjustment can better accommodate the time-varying nature of GAN training.
A comprehensive empirical study confirms that the proposed training strategy, termed DynamicD, improves synthesis performance without incurring any additional cost or training objectives.
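As a rough sketch of on-the-fly capacity adjustment in the spirit of DynamicD (the channel-masking scheme and the schedule are assumptions, not the paper's method):

```python
# A hypothetical layer whose effective width can be changed during training.
import torch
import torch.nn as nn

class DynamicWidthLinear(nn.Module):
    def __init__(self, in_dim: int, max_out: int):
        super().__init__()
        self.fc = nn.Linear(in_dim, max_out)
        self.active = max_out            # current effective output width

    def set_capacity(self, fraction: float):
        # e.g. grow capacity as training progresses, or shrink to regularize.
        self.active = max(1, int(self.fc.out_features * fraction))

    def forward(self, x):
        out = self.fc(x)
        mask = torch.zeros_like(out)
        mask[..., : self.active] = 1.0   # zero out the inactive channels
        return out * mask
```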
arXiv Detail & Related papers (2022-09-20T17:57:33Z)
- Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL [0.0]
Decision Transformer (DT) combines a conditional policy approach with the transformer architecture.
However, DT lacks stitching ability -- one of the critical abilities for offline RL to learn an optimal policy from suboptimal trajectories.
We propose the Q-learning Decision Transformer (QDT) to address the shortcomings of DT.
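Stitching of the kind QDT targets is commonly obtained by relabeling returns-to-go with a learned value function; the sketch below follows that common recipe as an assumption, and the paper's exact relabeling rule may differ.

```python
# A minimal sketch: lift observed returns-to-go with a learned Q estimate so
# the conditioning signal reflects the best outcome reachable from each step.
def relabel_rtg(trajectory, q_function):
    """trajectory: list of (state, action, reward), in time order.
    q_function(s, a) -> scalar value estimate (hypothetical interface).
    Returns a list of relabeled returns-to-go."""
    rtgs = [0.0] * len(trajectory)
    rtg = 0.0
    for t in reversed(range(len(trajectory))):
        state, action, reward = trajectory[t]
        rtg = reward + rtg                         # observed return-to-go,
                                                   # carrying lifted futures
        rtg = max(rtg, q_function(state, action))  # lift with the Q estimate
        rtgs[t] = rtg
    return rtgs
```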
arXiv Detail & Related papers (2022-09-08T18:26:39Z)
- Augmentation-Aware Self-Supervision for Data-Efficient GAN Training [68.81471633374393]
Training generative adversarial networks (GANs) with limited data is challenging because the discriminator is prone to overfitting.
We propose a novel augmentation-aware self-supervised discriminator that predicts the augmentation parameter of the augmented data.
We compare our method with state-of-the-art (SOTA) methods using the class-conditional BigGAN and unconditional StyleGAN2 architectures.
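A minimal sketch of the augmentation-aware self-supervised head described above, assuming a single scalar augmentation parameter and a toy convolutional backbone (the real method's architecture and parameterization will differ):

```python
# A toy discriminator with an auxiliary head that regresses the parameter of
# the augmentation applied to its input, alongside the usual real/fake logit.
import torch
import torch.nn as nn

class AugAwareDiscriminator(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.adv_head = nn.Linear(feat_dim, 1)   # real/fake logit
        self.aug_head = nn.Linear(feat_dim, 1)   # predicted augmentation param

    def forward(self, x):
        h = self.features(x)
        return self.adv_head(h), self.aug_head(h)
```

A regression loss between `aug_head`'s output and the true augmentation parameter would then be added to the adversarial loss during discriminator training.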
arXiv Detail & Related papers (2022-05-31T10:35:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.