Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches
- URL: http://arxiv.org/abs/2502.10473v1
- Date: Thu, 13 Feb 2025 15:51:46 GMT
- Title: Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches
- Authors: Dan Elbaz, Oren Salzman
- Abstract summary: Portfolio Beam Search (PBS) is a simple-yet-effective alternative to Beam Search (BS).
We develop an uncertainty-aware diversification mechanism, which we integrate into a sequential decoding algorithm at inference time.
We empirically demonstrate the effectiveness of PBS on the D4RL benchmark, where it achieves higher returns and significantly reduces outcome variability.
- Score: 4.364595470673757
- License:
- Abstract: Offline Reinforcement Learning (RL) algorithms learn a policy using a fixed training dataset, which is then deployed online to interact with the environment and make decisions. Transformers, a standard choice for modeling time-series data, are gaining popularity in offline RL. In this context, Beam Search (BS), an approximate inference algorithm, is the go-to decoding method. Offline RL eliminates the need for costly or risky online data collection. However, the restricted dataset induces uncertainty as the agent may encounter unfamiliar sequences of states and actions during execution that were not covered in the training data. In this context, BS lacks two important properties essential for offline RL: It does not account for the aforementioned uncertainty, and its greedy left-right search approach often results in sequences with minimal variations, failing to explore potentially better alternatives. To address these limitations, we propose Portfolio Beam Search (PBS), a simple-yet-effective alternative to BS that balances exploration and exploitation within a Transformer model during decoding. We draw inspiration from financial economics and apply these principles to develop an uncertainty-aware diversification mechanism, which we integrate into a sequential decoding algorithm at inference time. We empirically demonstrate the effectiveness of PBS on the D4RL locomotion benchmark, where it achieves higher returns and significantly reduces outcome variability.
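The abstract describes PBS only at a high level: candidate continuations are selected with an uncertainty-aware, diversity-encouraging criterion inspired by portfolio theory rather than purely greedily. As a rough illustration only, and not the authors' actual PBS algorithm, the hypothetical Python sketch below contrasts standard top-k beam pruning with a mean-variance, correlation-penalized selection rule. The ensemble of return estimates, the risk_aversion and diversity_weight coefficients, and the correlation-based similarity measure are all assumptions made for this sketch, not details taken from the paper.

```python
"""Minimal, hypothetical sketch of "portfolio-style" beam selection at decoding
time. This is NOT the paper's PBS implementation; it only illustrates the idea
of scoring candidate continuations by expected return minus a risk/redundancy
penalty, in the spirit of mean-variance portfolio selection."""
import numpy as np


def standard_beam_select(scores: np.ndarray, beam_width: int) -> np.ndarray:
    """Vanilla beam-search pruning: keep the top-`beam_width` candidates by score."""
    return np.argsort(scores)[::-1][:beam_width]


def portfolio_beam_select(
    ensemble_returns: np.ndarray,  # shape: (n_ensemble, n_candidates), assumed return estimates
    beam_width: int,
    risk_aversion: float = 1.0,
    diversity_weight: float = 1.0,
) -> list[int]:
    """Greedily assemble a 'portfolio' of candidate continuations.

    Each candidate is scored by its mean predicted return minus a risk term
    (ensemble standard deviation); candidates whose ensemble predictions
    correlate strongly with already-selected ones are penalized, spreading
    the beam over diverse continuations.
    """
    mean = ensemble_returns.mean(axis=0)
    std = ensemble_returns.std(axis=0)
    corr = np.corrcoef(ensemble_returns.T)  # candidate-by-candidate correlation

    selected: list[int] = []
    remaining = list(range(ensemble_returns.shape[1]))
    while remaining and len(selected) < beam_width:
        best_c, best_score = remaining[0], -np.inf
        for c in remaining:
            score = mean[c] - risk_aversion * std[c]
            if selected:  # redundancy penalty w.r.t. the current portfolio
                score -= diversity_weight * max(corr[c, s] for s in selected)
            if score > best_score:
                best_c, best_score = c, score
        selected.append(best_c)
        remaining.remove(best_c)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy example: 5 ensemble members score 12 candidate action sequences.
    true_quality = rng.uniform(0.0, 10.0, size=12)
    returns = rng.normal(loc=true_quality, scale=1.0, size=(5, 12))
    print("standard :", standard_beam_select(returns.mean(axis=0), beam_width=4))
    print("portfolio:", portfolio_beam_select(returns, beam_width=4))
```

Greedy selection under a mean-minus-risk objective with a redundancy penalty is just one concrete way to encode the idea that decoding should not concentrate every beam on highly correlated, uncertain continuations; the paper itself integrates its uncertainty-aware diversification mechanism directly into sequential Transformer decoding at inference time.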
Related papers
- Robust Offline Reinforcement Learning for Non-Markovian Decision Processes [48.9399496805422]
We study the learning problem of robust offline non-Markovian RL.
We introduce a novel dataset distillation and a lower confidence bound (LCB) design for robust values under different types of uncertainty sets.
By further introducing a novel type-I concentrability coefficient tailored for offline low-rank non-Markovian decision processes, we prove that our algorithm can find an $\epsilon$-optimal robust policy.
arXiv Detail & Related papers (2024-11-12T03:22:56Z)
- Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage [32.578787778183546]
Offline reinforcement learning (RL) algorithms learn optimal policies using historical (offline) data.
One of the main challenges in offline RL is the distribution shift.
We propose two offline RL algorithms using the distributionally robust learning (DRL) framework.
arXiv Detail & Related papers (2023-10-27T19:19:30Z)
- Offline Policy Optimization in RL with Variance Regularizaton [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
arXiv Detail & Related papers (2022-12-29T18:25:01Z)
- Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning [8.089234432461804]
Offline reinforcement-learning (RL) algorithms learn to make decisions using a given, fixed training dataset without the possibility of additional online data collection.
This problem setting is captivating because it holds the promise of utilizing previously collected datasets without any costly or risky interaction with the environment.
We present a simple-yet-highly-effective risk-aware planning algorithm for offline RL.
arXiv Detail & Related papers (2022-11-06T07:42:24Z)
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning [125.8224674893018]
Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment.
Applying off-policy algorithms to offline RL usually fails due to the extrapolation error caused by the out-of-distribution (OOD) actions.
We propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints.
arXiv Detail & Related papers (2022-02-23T15:27:16Z)
- The Least Restriction for Offline Reinforcement Learning [0.0]
We propose a creative offline reinforcement learning framework, the Least Restriction (LR).
The LR regards selecting an action as taking a sample from the probability distribution.
It is able to learn robustly from different offline datasets, including random and suboptimal demonstrations.
arXiv Detail & Related papers (2021-07-05T01:50:40Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics (see the sketch after this list).
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
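The MOPO entry above states its central mechanism in one line: rollouts in a learned model use rewards penalized by the dynamics model's uncertainty, a pessimism principle related in spirit to PBRL's ensemble-based penalties. The short sketch below illustrates that principle only; the disagreement-based uncertainty measure and the penalty coefficient are assumptions made for illustration and are not taken from the MOPO paper.

```python
"""Illustrative only: an uncertainty-penalized reward in the style summarized
for MOPO above. During a rollout in a learned dynamics model, the reward is
reduced by a penalty proportional to the model's uncertainty; here the
uncertainty is approximated by ensemble disagreement, one common choice and an
assumption of this sketch rather than the authors' exact estimator."""
import numpy as np


def penalized_reward(
    reward: float,
    ensemble_next_state_preds: np.ndarray,  # shape: (n_models, state_dim)
    penalty_coef: float = 1.0,
) -> float:
    # Disagreement proxy: largest deviation of any ensemble member from the mean prediction.
    mean_pred = ensemble_next_state_preds.mean(axis=0)
    disagreement = np.max(np.linalg.norm(ensemble_next_state_preds - mean_pred, axis=1))
    return reward - penalty_coef * disagreement


if __name__ == "__main__":
    preds = np.array([[1.0, 2.0], [1.1, 2.1], [0.9, 1.8]])  # 3 models, 2-D next state
    print(penalized_reward(reward=5.0, ensemble_next_state_preds=preds, penalty_coef=0.5))
```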
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.