Offline Stochastic Shortest Path: Learning, Evaluation and Towards
Optimality
- URL: http://arxiv.org/abs/2206.04921v1
- Date: Fri, 10 Jun 2022 07:44:56 GMT
- Title: Offline Stochastic Shortest Path: Learning, Evaluation and Towards
Optimality
- Authors: Ming Yin, Wenjing Chen, Mengdi Wang and Yu-Xiang Wang
- Abstract summary: In this paper, we consider the offline stochastic shortest path problem when the state space and the action space are finite.
We design simple value-based algorithms for tackling both offline policy evaluation (OPE) and offline policy learning tasks.
Our analysis of these simple algorithms yields strong instance-dependent bounds which can imply worst-case bounds that are near-minimax optimal.
- Score: 57.91411772725183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal-oriented Reinforcement Learning, where the agent needs to reach the goal
state while simultaneously minimizing the cost, has received significant
attention in real-world applications. Its theoretical formulation, stochastic
shortest path (SSP), has been intensively researched in the online setting.
Nevertheless, it remains understudied when such an online interaction is
prohibited and only historical data is provided. In this paper, we consider the
offline stochastic shortest path problem when the state space and the action
space are finite. We design simple value iteration-based algorithms for
tackling both offline policy evaluation (OPE) and offline policy learning
tasks. Notably, our analysis of these simple algorithms yields strong
instance-dependent bounds which can imply worst-case bounds that are
near-minimax optimal. We hope our study could help illuminate the fundamental
statistical limits of the offline SSP problem and motivate further studies
beyond the scope of current consideration.
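The paper's algorithms are value iteration-based. As a rough illustration of the underlying Bellman backup for a stochastic shortest path problem (the known-model, tabular case — not the paper's offline method), here is a minimal sketch on a made-up 3-state instance; all states, costs, and transition probabilities are hypothetical:

```python
import numpy as np

# Toy 3-state SSP: states 0 and 1 are non-goal, state 2 is the absorbing goal.
# P[s, a, s'] gives transition probabilities; c[s, a] gives per-step costs.
goal = 2
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]],  # transitions from state 0
    [[0.3, 0.5, 0.2], [0.0, 0.4, 0.6]],  # transitions from state 1
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # goal absorbs at zero cost
])
c = np.array([[1.0, 2.0], [1.5, 1.0], [0.0, 0.0]])

V = np.zeros(3)
for _ in range(10_000):
    # SSP Bellman backup: V(s) = min_a [ c(s, a) + sum_s' P(s'|s, a) V(s') ]
    Q = c + P @ V          # shape (3, 2): expected cost-to-go per (s, a)
    V_new = Q.min(axis=1)
    V_new[goal] = 0.0      # reaching the goal incurs no further cost
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = (c + P @ V).argmin(axis=1)  # greedy policy w.r.t. converged V
```

Because every action here reaches the goal with positive probability, the iteration converges to the minimum expected cost-to-go; the offline setting studied in the paper must instead estimate these quantities from historical data.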
Related papers
- Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
We find that regret grows sub-linearly at a rate $\Theta\left(m^{\frac{1}{2}\cdot\frac{1}{1-2^{-T}}}\right)$, thus converging exponentially fast to $\Theta(\sqrt{m})$.
These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.
arXiv Detail & Related papers (2024-06-20T23:00:25Z) - Bayesian Design Principles for Offline-to-Online Reinforcement Learning [50.97583504192167]
Offline-to-online fine-tuning is crucial for real-world applications where exploration can be costly or unsafe.
In this paper, we tackle the dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop.
We show that Bayesian design principles are crucial in solving such a dilemma.
arXiv Detail & Related papers (2024-05-31T16:31:07Z) - State-Constrained Offline Reinforcement Learning [9.38848713730931]
We introduce a novel framework named state-constrained offline reinforcement learning.
Our framework significantly enhances learning potential and reduces previous limitations.
We also introduce StaCQ, a deep learning algorithm that is both performance-driven on the D4RL benchmark datasets and closely aligned with our theoretical propositions.
arXiv Detail & Related papers (2024-05-23T09:50:04Z) - Trajectory-Oriented Policy Optimization with Sparse Rewards [2.9602904918952695]
We introduce an approach leveraging offline demonstration trajectories for swifter and more efficient online RL in environments with sparse rewards.
Our pivotal insight involves treating offline demonstration trajectories as guidance, rather than mere imitation.
We then illustrate that this optimization problem can be streamlined into a policy-gradient algorithm, integrating rewards shaped by insights from offline demonstrations.
arXiv Detail & Related papers (2024-01-04T12:21:01Z) - Goal-conditioned Offline Reinforcement Learning through State Space Partitioning [9.38848713730931]
Offline reinforcement learning (RL) aims to infer sequential decision policies using only offline datasets.
We argue that, despite its benefits, this approach is still insufficient to fully address the distribution shift and multi-modality problems.
We propose a complementary advantage-based weighting scheme that introduces an additional source of inductive bias.
arXiv Detail & Related papers (2023-03-16T14:52:53Z) - Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes [99.26864533035454]
We study offline reinforcement learning (RL) in partially observable Markov decision processes.
We propose the Proxy variable Pessimistic Policy Optimization (P3O) algorithm.
P3O is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
arXiv Detail & Related papers (2022-05-26T19:13:55Z) - A Sharp Characterization of Linear Estimators for Offline Policy
Evaluation [33.37672297925897]
Offline policy evaluation is a fundamental statistical problem in reinforcement learning.
We identify simple control-theoretic and linear-algebraic conditions that are necessary and sufficient for classical methods.
Our results provide a complete picture of the behavior of linear estimators for offline policy evaluation.
arXiv Detail & Related papers (2022-03-08T17:52:57Z) - Reinforcement Learning with Sparse Rewards using Guidance from Offline
Demonstration [9.017416068706579]
A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback.
We develop an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy.
We demonstrate the superior performance of our algorithm over state-of-the-art approaches.
arXiv Detail & Related papers (2022-02-09T18:45:40Z) - Policy Gradient for Continuing Tasks in Non-stationary Markov Decision
Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
We compute unbiased navigation gradients of the value function which we use as ascent directions to update the policy.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
arXiv Detail & Related papers (2020-10-16T15:15:42Z) - EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline
and Online RL [48.552287941528]
Off-policy reinforcement learning holds the promise of sample-efficient learning of decision-making policies.
In the offline RL setting, standard off-policy RL methods can significantly underperform.
We introduce Expected-Max Q-Learning (EMaQ), which is more closely related to the resulting practical algorithm.
arXiv Detail & Related papers (2020-07-21T21:13:02Z)
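The expected-max idea behind EMaQ — maximizing the Q-target over actions sampled from the behavior policy rather than over all actions — can be sketched in a few lines. Everything below (the function name, the toy Q-table, the uniform behavior policy) is a hypothetical illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def emaq_target(Q, s_next, reward, sample_behavior_action, n=5, gamma=0.99):
    """Expected-max backup target: maximize Q over n actions drawn from the
    behavior policy, keeping the bootstrap target in-distribution."""
    candidates = [sample_behavior_action(s_next) for _ in range(n)]
    return reward + gamma * max(Q(s_next, a) for a in candidates)

# Tiny usage example: one next state with two actions, uniform behavior policy.
Q_table = {(0, 0): 1.0, (0, 1): 2.0}
target = emaq_target(
    Q=lambda s, a: Q_table[(s, a)],
    s_next=0,
    reward=0.5,
    sample_behavior_action=lambda s: int(rng.integers(2)),
)
```

Restricting the max to sampled behavior actions is what makes the operator usable offline: it never bootstraps from actions the dataset's policy would not take.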
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.