Related papers: Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning

Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning

URL: http://arxiv.org/abs/2410.23450v1
Date: Wed, 30 Oct 2024 20:46:26 GMT
Title: Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning
Authors: Ruhan Wang, Yu Yang, Zhishuai Liu, Dongruo Zhou, Pan Xu,
Abstract summary: We study offline off-dynamics reinforcement learning (RL) to enhance policy learning in a target domain with limited data. Our approach centers on return-conditioned supervised learning (RCSL), particularly focusing on the decision transformer (DT) We propose the Return Augmented Decision Transformer (RADT) method, where we augment the return in the source domain by aligning its distribution with that in the target domain.
Score: 26.915055027485465
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study offline off-dynamics reinforcement learning (RL) to utilize data from an easily accessible source domain to enhance policy learning in a target domain with limited data. Our approach centers on return-conditioned supervised learning (RCSL), particularly focusing on the decision transformer (DT), which can predict actions conditioned on desired return guidance and complete trajectory history. Previous works tackle the dynamics shift problem by augmenting the reward in the trajectory from the source domain to match the optimal trajectory in the target domain. However, this strategy can not be directly applicable in RCSL owing to (1) the unique form of the RCSL policy class, which explicitly depends on the return, and (2) the absence of a straightforward representation of the optimal trajectory distribution. We propose the Return Augmented Decision Transformer (RADT) method, where we augment the return in the source domain by aligning its distribution with that in the target domain. We provide the theoretical analysis demonstrating that the RCSL policy learned from RADT achieves the same level of suboptimality as would be obtained without a dynamics shift. We introduce two practical implementations RADT-DARA and RADT-MV respectively. Extensive experiments conducted on D4RL datasets reveal that our methods generally outperform dynamic programming based methods in off-dynamics RL scenarios.

Related papers

ODRL: A Benchmark for Off-Dynamics Reinforcement Learning [59.72217833812439]
We introduce ODRL, the first benchmark tailored for evaluating off-dynamics RL methods. ODRL contains four experimental settings where the source and target domains can be either online or offline. We conduct extensive benchmarking experiments, which show that no method has universal advantages across varied dynamics shifts.
arXiv Detail & Related papers (2024-10-28T05:29:38Z)
Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning [5.398202201395825]
Decision Transformer (DT) has demonstrated exceptional capabilities in offline reinforcement learning. Decision ConvFormer (DC) is easier to understand in the context of modeling RL trajectories within a Markov Decision Process. We propose the Q-value Regularized Decision ConvFormer (QDC), which combines the understanding of RL trajectories by DC and incorporates a term that maximizes action values.
arXiv Detail & Related papers (2024-09-12T14:10:22Z)
Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL) QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM) Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
High-Precision Geosteering via Reinforcement Learning and Particle Filters [0.0]
Geosteering is a key component of drilling operations and traditionally involves manual interpretation of various data sources such as well-log data. Academic attempts to solve geosteering decision optimization with greedy optimization and Approximate Dynamic Programming (ADP) showed promise but lacked adaptivity to realistic diverse scenarios. We propose Reinforcement learning (RL) to facilitate optimal decision-making through reward-based iterative learning.
arXiv Detail & Related papers (2024-02-09T12:54:34Z)
DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning [19.84386060857712]
This paper introduces DiffTORI, which utilizes Differentiable Trajectory optimization as the policy representation to generate actions for deep Reinforcement and Imitation learning. Across 15 model-based RL tasks and 35 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTORI outperforms prior state-of-the-art methods in both domains.
arXiv Detail & Related papers (2024-02-08T05:26:40Z)
Critic-Guided Decision Transformer for Offline Reinforcement Learning [28.211835303617118]
Critic-Guided Decision Transformer (CGDT) Uses predictability of long-term returns from value-based methods with the trajectory modeling capability of the Decision Transformer. Builds upon these insights, we propose a novel approach, which combines the predictability of long-term returns from value-based methods with the trajectory modeling capability of the Decision Transformer.
arXiv Detail & Related papers (2023-12-21T10:29:17Z)
Rethinking Decision Transformer via Hierarchical Reinforcement Learning [54.3596066989024]
Decision Transformer (DT) is an innovative algorithm leveraging recent advances of the transformer architecture in reinforcement learning (RL) We introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL. We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices.
arXiv Detail & Related papers (2023-11-01T03:32:13Z)
PGrad: Learning Principal Gradients For Domain Generalization [14.134043376245165]
We develop a novel DG training strategy, we call PGrad, to learn a robust gradient direction, improving models' generalization ability on unseen domains. PGrad's gradient design forces the DG training to ignore domain-dependent noise signals and updates all training domains with a robust direction. PGrad achieves competitive results across seven datasets, demonstrating its efficacy across both synthetic and real-world distributional shifts.
arXiv Detail & Related papers (2023-05-02T00:48:24Z)
Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets. We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged. We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED) TRED disentangles the relevant knowledge with respect to the target task from the original source model and used as a regularizer during fine-tuning the target model. Experiments on various real world datasets show that our method stably improves the standard fine-tuning by more than 2% in average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications [54.610318402371185]
Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver. We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming. We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
arXiv Detail & Related papers (2020-05-25T01:42:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.