Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement
Learning
- URL: http://arxiv.org/abs/2211.04583v1
- Date: Sun, 6 Nov 2022 07:42:24 GMT
- Title: Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement
Learning
- Authors: Dan Elbaz and Gal Novik and Oren Salzman
- Abstract summary: Offline reinforcement-learning (RL) algorithms learn to make decisions using a given, fixed training dataset without the possibility of additional online data collection.
This problem setting is captivating because it holds the promise of utilizing previously collected datasets without any costly or risky interaction with the environment.
We present a simple-yet-highly-effective risk-aware planning algorithm for offline RL.
- Score: 8.089234432461804
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Offline reinforcement-learning (RL) algorithms learn to make decisions using
a given, fixed training dataset without the possibility of additional online
data collection. This problem setting is captivating because it holds the
promise of utilizing previously collected datasets without any costly or risky
interaction with the environment. However, this promise comes with a drawback:
the restricted dataset induces subjective uncertainty because
the agent can encounter unfamiliar sequences of states and actions that the
training data did not cover. Moreover, inherent system stochasticity further
increases uncertainty and aggravates the offline RL problem, preventing the
agent from learning an optimal policy. To mitigate the destructive effects of
uncertainty, we need to balance the aspiration to take reward-maximizing actions
with the incurred risk due to incorrect ones. In financial economics, modern
portfolio theory (MPT) is a method that risk-averse investors can use to
construct diversified portfolios that maximize their returns without
unacceptable levels of risk. We integrate MPT into the agent's decision-making
process to present a simple-yet-highly-effective risk-aware planning algorithm
for offline RL. Our algorithm allows us to systematically account for the
\emph{estimated quality} of specific actions and their \emph{estimated risk}
due to the uncertainty. We show that our approach can be coupled with the
Transformer architecture to yield a state-of-the-art planner for offline RL
tasks, maximizing the return while significantly reducing the variance.
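The abstract does not spell out the scoring rule, but the described trade-off between an action's estimated quality and its estimated risk can be illustrated with a small mean-variance sketch in the spirit of modern portfolio theory. Everything below (the ensemble of return estimates per candidate, the risk_aversion weight, the fixed candidate set) is an illustrative assumption, not the paper's actual machinery:

```python
import numpy as np

def mpt_action_score(return_samples: np.ndarray, risk_aversion: float = 1.0) -> float:
    """Mean-variance score for one candidate action.

    return_samples holds ensemble/rollout estimates of the return for this action
    (an illustrative stand-in for the paper's estimated quality and estimated risk).
    Higher is better: expected return minus a penalty on its variance.
    """
    expected_return = float(return_samples.mean())
    risk = float(return_samples.var())
    return expected_return - risk_aversion * risk

def select_action(candidate_returns: dict, risk_aversion: float = 1.0):
    """Pick the candidate whose risk-adjusted score is highest."""
    return max(candidate_returns,
               key=lambda a: mpt_action_score(candidate_returns[a], risk_aversion))

# Toy usage: action 0 has a slightly lower mean return but far lower variance than action 1.
rng = np.random.default_rng(0)
candidates = {0: rng.normal(1.0, 0.1, size=32), 1: rng.normal(1.1, 1.0, size=32)}
print(select_action(candidates))  # a risk-averse scorer typically prefers action 0
```

In the paper's setting, a score of this kind would presumably be applied to candidates generated during Transformer-based tree search rather than to a fixed candidate set.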
Related papers
- OffRIPP: Offline RL-based Informative Path Planning [12.705099730591671]
IPP is a crucial task in robotics, where agents must design paths to gather valuable information about a target environment.
We propose an offline RL-based IPP framework that optimizes information gain without requiring real-time interaction during training.
We validate the framework through extensive simulations and real-world experiments.
arXiv Detail & Related papers (2024-09-25T11:30:59Z)
- Sparsity-based Safety Conservatism for Constrained Offline Reinforcement Learning [4.0847743592744905]
Reinforcement Learning (RL) has achieved notable success in decision-making fields like autonomous driving and robotic manipulation.
RL's training approach, centered on "on-policy" sampling, doesn't fully capitalize on data.
Offline RL has emerged as a compelling alternative, particularly when conducting additional experiments is impractical.
arXiv Detail & Related papers (2024-07-17T20:57:05Z)
- Distributional Reinforcement Learning with Online Risk-awareness Adaption [5.363478475460403]
We introduce a novel framework, Distributional RL with Online Risk Adaption (DRL-ORA).
DRL-ORA dynamically selects epistemic risk levels by solving a total variation minimization problem online.
We show multiple classes of tasks where DRL-ORA outperforms existing methods that rely on either a fixed risk level or a manually predetermined risk level.
arXiv Detail & Related papers (2023-10-08T14:32:23Z)
- Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk Measures [10.221369785560785]
In this paper, we consider the problem of maximizing a dynamic risk measure of a sequence of rewards in Markov Decision Processes (MDPs).
Using a convex combination of expectation and conditional value-at-risk (CVaR) as a special one-step conditional risk measure, we reformulate the risk-averse MDP as a risk-neutral counterpart with an augmented action space and adjusted immediate rewards (a brief sketch of this one-step measure follows this entry).
Our numerical studies show that the risk-averse setting can reduce the variance and enhance robustness of the results.
arXiv Detail & Related papers (2023-01-14T21:43:18Z)
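As a generic illustration of the one-step risk measure mentioned in the entry above (the notation, empirical CVaR estimator, and parameter values are assumptions, not the paper's), a convex combination of the expectation and the conditional value-at-risk of sampled returns can be computed as follows:

```python
import numpy as np

def cvar(returns: np.ndarray, alpha: float) -> float:
    """Empirical CVaR at level alpha: the mean of the worst alpha-fraction of returns.

    For rewards (higher is better), the "worst" outcomes are the lowest returns.
    """
    cutoff = np.quantile(returns, alpha)
    return float(returns[returns <= cutoff].mean())

def one_step_risk(returns: np.ndarray, lam: float, alpha: float) -> float:
    """Convex combination of expectation and CVaR, as described in the entry above.

    lam = 0 recovers the risk-neutral expectation; lam = 1 is the fully
    risk-averse CVaR objective. lam and alpha are illustrative parameters.
    """
    return (1.0 - lam) * float(returns.mean()) + lam * cvar(returns, alpha)

# Toy usage on simulated returns with a heavy lower tail.
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(1.0, 0.2, 950), rng.normal(-3.0, 0.5, 50)])
print(one_step_risk(z, lam=0.5, alpha=0.05))  # noticeably below the plain mean
```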
- FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraint violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
- Conservative Offline Distributional Reinforcement Learning [34.95001490294207]
We propose Conservative Offline Distributional Actor Critic (CODAC) for both risk-neutral and risk-averse domains.
CODAC adapts distributional RL to the offline setting by penalizing the predicted quantiles of the return for out-of-distribution actions.
In experiments, CODAC successfully learns risk-averse policies using offline data collected purely from risk-neutral agents.
arXiv Detail & Related papers (2021-07-12T15:38:06Z)
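The CODAC entry above describes penalizing predicted return quantiles for out-of-distribution actions. The following is only a rough, hypothetical sketch of that idea, not CODAC's actual objective; predict_quantiles, the action arguments, and beta are assumed placeholders:

```python
import numpy as np

def codac_style_penalty(
    predict_quantiles,   # callable: (state, action) -> np.ndarray of predicted return quantiles (assumed)
    state,
    dataset_action,      # the action actually observed at this state in the offline dataset
    policy_actions,      # actions proposed by the learned policy, i.e. potentially out-of-distribution
    beta: float = 1.0,
) -> float:
    """Rough conservatism term in the spirit of the entry above (not the paper's exact loss).

    When minimized alongside the usual quantile-regression loss, this term pushes
    predicted quantiles at policy-proposed (possibly OOD) actions down relative to
    those at dataset actions.
    """
    ood_value = float(np.mean([predict_quantiles(state, a).mean() for a in policy_actions]))
    data_value = float(predict_quantiles(state, dataset_action).mean())
    return beta * (ood_value - data_value)
```

A complete method would combine such a term with a distributional (quantile-regression) TD loss; the weighting and sampling here are illustrative only.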
- Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy-constraint that reduces divergence from the behavior policy and a value-constraint that discourages overly optimistic estimates.
arXiv Detail & Related papers (2021-02-18T08:54:14Z)
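As a rough sketch of the two penalties named in the entry above, not the paper's actual losses (the critic ensemble, the log-ratio divergence term, and the coefficients are assumptions), an actor loss for batch RL might combine them like this:

```python
def doubly_constrained_actor_loss(
    q_ensemble,           # list of critics, each mapping (state, action) -> a value estimate (assumed)
    log_pi: float,        # log-probability of `action` under the learned policy
    log_behavior: float,  # log-probability of `action` under an estimate of the behavior policy
    state,
    action,
    policy_coef: float = 1.0,
    value_coef: float = 1.0,
) -> float:
    """Illustrative combination of a policy-constraint and a value-constraint.

    - Policy-constraint: a log-ratio term (a simple stand-in for a KL-style penalty)
      keeps the learned policy close to the data-collecting behavior policy.
    - Value-constraint: acting on the minimum over a critic ensemble gives a
      pessimistic value estimate that discourages overly optimistic extrapolation.
    """
    pessimistic_q = min(q(state, action) for q in q_ensemble)
    divergence = log_pi - log_behavior
    # Loss to minimize: trade off pessimistic value against divergence from the data.
    return -value_coef * pessimistic_q + policy_coef * divergence
```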
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
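Critic-regularized regression is commonly summarized as behavior cloning weighted by a critic's advantage estimate. The sketch below follows that common summary; the exponential weighting, temperature, and weight cap are illustrative assumptions rather than the paper's exact configuration:

```python
import numpy as np

def crr_weighted_nll(
    logits: np.ndarray,   # policy logits over discrete actions for one state
    action: int,          # action taken in the offline dataset
    advantage: float,     # critic's advantage estimate, e.g. Q(s, a) - V(s)
    temperature: float = 1.0,
    max_weight: float = 20.0,
) -> float:
    """Per-sample CRR-style loss: advantage-weighted negative log-likelihood.

    Dataset actions the critic likes (positive advantage) are imitated strongly;
    actions it dislikes are down-weighted rather than copied.
    """
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())   # log-softmax
    weight = min(float(np.exp(advantage / temperature)), max_weight)
    return -weight * float(log_probs[action])

# Toy usage: three discrete actions; the dataset action has a positive advantage.
print(crr_weighted_nll(np.array([0.1, 0.2, 0.5]), action=2, advantage=0.8))
```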
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
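The MOPO entry above describes penalizing model-predicted rewards by the uncertainty of the learned dynamics. A minimal sketch of that reward shaping, with an ensemble-disagreement proxy standing in for the paper's uncertainty estimator (the proxy and the penalty coefficient are assumptions):

```python
import numpy as np

def ensemble_disagreement(next_state_predictions: np.ndarray) -> float:
    """A simple uncertainty proxy (an illustrative assumption, not the paper's exact estimator):
    the norm of the standard deviation of next-state predictions across a dynamics ensemble.

    next_state_predictions: array of shape (n_models, state_dim).
    """
    return float(np.linalg.norm(next_state_predictions.std(axis=0)))

def penalized_reward(predicted_reward: float, uncertainty: float, penalty_coef: float = 1.0) -> float:
    """Uncertainty-penalized reward: r_tilde(s, a) = r_hat(s, a) - lambda * u(s, a)."""
    return predicted_reward - penalty_coef * uncertainty

# Toy usage: high ensemble disagreement shrinks the reward used for policy optimization.
preds = np.array([[0.0, 1.0], [0.2, 0.8], [0.6, 1.4]])   # 3 models, 2-d next state
print(penalized_reward(1.0, ensemble_disagreement(preds), penalty_coef=0.5))
```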