A maximum-entropy approach to off-policy evaluation in average-reward
MDPs
- URL: http://arxiv.org/abs/2006.12620v1
- Date: Wed, 17 Jun 2020 18:13:37 GMT
- Title: A maximum-entropy approach to off-policy evaluation in average-reward
MDPs
- Authors: Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Gorur,
Chris Harris, Dale Schuurmans
- Abstract summary: This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs).
We provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases.
We show that our maximum-entropy formulation of stationary-distribution estimation yields an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
- Score: 54.967872716145656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work focuses on off-policy evaluation (OPE) with function approximation
in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs
that are ergodic and linear (i.e. where rewards and dynamics are linear in some
known features), we provide the first finite-sample OPE error bound, extending
existing results beyond the episodic and discounted cases. In a more general
setting, when the feature dynamics are approximately linear and for arbitrary
rewards, we propose a new approach for estimating stationary distributions with
function approximation. We formulate this problem as finding the
maximum-entropy distribution subject to matching feature expectations under
empirical dynamics. We show that this results in an exponential-family
distribution whose sufficient statistics are the features, paralleling
maximum-entropy approaches in supervised learning. We demonstrate the
effectiveness of the proposed OPE approaches in multiple environments.
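The abstract casts stationary-distribution estimation as finding the maximum-entropy distribution over logged state-action pairs whose feature expectations match those induced by the empirical dynamics and the target policy, which yields an exponential-family distribution with the features as sufficient statistics. The sketch below is a minimal illustration of that idea, assuming precomputed features and expected next-step features; the names (fit_maxent_weights, phi, next_phi) and the plain gradient loop on the stationarity gap are illustrative assumptions, not the authors' algorithm.

```python
# Maximum-entropy stationary-distribution sketch: fit weights of the form
#     d_theta(s, a) ∝ exp(theta^T phi(s, a))
# so that feature expectations are (approximately) stationary under the
# empirical dynamics and the target policy. Illustrative only.
import numpy as np


def fit_maxent_weights(phi, next_phi, n_steps=5000, lr=0.5):
    """phi:      (N, d) features of the logged state-action pairs.
    next_phi: (N, d) expected next features E_{s'~P_hat, a'~pi}[phi(s', a')]
              for each logged transition, estimated from the data beforehand.
    Returns (theta, d_theta), where d_theta[i] ∝ exp(theta @ phi[i]) and theta
    is chosen to shrink the feature-stationarity gap E_d[phi - next_phi]."""
    n, d = phi.shape
    theta = np.zeros(d)
    f = phi - next_phi                        # per-sample stationarity residual
    for _ in range(n_steps):
        logits = phi @ theta
        w = np.exp(logits - logits.max())
        d_theta = w / w.sum()                 # exponential-family weights
        gap = d_theta @ f                     # E_d[phi - next_phi]; want ~ 0
        phi_bar = d_theta @ phi
        # Jacobian of the gap w.r.t. theta, then a step on 0.5 * ||gap||^2.
        jac = (f * d_theta[:, None]).T @ (phi - phi_bar)
        theta -= lr * (jac.T @ gap)
    return theta, d_theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 500, 4
    phi = rng.normal(size=(n, d))             # stand-in logged features
    next_phi = 0.9 * phi + 0.1 * rng.normal(size=(n, d))
    rewards = rng.normal(size=n)
    theta, d_theta = fit_maxent_weights(phi, next_phi)
    # Average-reward OPE estimate: d_theta-weighted mean of the logged rewards.
    print("estimated average reward:", float(d_theta @ rewards))
```

The final line shows the intended use for average-reward OPE: once the weights are fitted, the value estimate is simply the d_theta-weighted mean of the logged rewards.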
Related papers
- Maximum a Posteriori Estimation for Linear Structural Dynamics Models Using Bayesian Optimization with Rational Polynomial Chaos Expansions [0.01578888899297715]
We propose an extension to an existing sparse Bayesian learning approach for MAP estimation.
We introduce a Bayesian optimization approach that adaptively enriches the experimental design.
By combining the sparsity-inducing learning procedure with the experimental design, we effectively reduce the number of model evaluations.
arXiv Detail & Related papers (2024-08-07T06:11:37Z) - Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return [1.3162012586770577]
We study multi-objective reinforcement learning with nonlinear preferences over trajectories.
We derive an extended form of Bellman optimality for nonlinear optimization.
We show that there can be a substantial gap between the optimal policy computed by our algorithm and alternative baselines.
arXiv Detail & Related papers (2023-11-05T02:11:07Z) - Efficient expectation propagation for posterior approximation in
high-dimensional probit models [1.433758865948252]
We focus on the expectation propagation (EP) approximation of the posterior distribution in Bayesian probit regression.
We show how to leverage results on the extended multivariate skew-normal distribution to derive an efficient implementation of the EP routine.
This makes EP computationally feasible even in challenging high-dimensional settings, as shown in a detailed simulation study.
arXiv Detail & Related papers (2023-09-04T14:07:19Z) - Truncating Trajectories in Monte Carlo Reinforcement Learning [48.97155920826079]
In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal.
We propose an a-priori budget allocation strategy that leads to the collection of trajectories of different lengths.
We show that an appropriate truncation of the trajectories can succeed in improving performance.
arXiv Detail & Related papers (2023-05-07T19:41:57Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are needed, using plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Under-Approximating Expected Total Rewards in POMDPs [68.8204255655161]
We consider the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP).
We use mixed-integer linear programming (MILP) to find minimal probability shifts and experimentally show that our techniques scale quite well.
arXiv Detail & Related papers (2022-01-21T16:43:03Z) - Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z) - Finite Sample Analysis of Minimax Offline Reinforcement Learning:
Completeness, Fast Rates and First-Order Efficiency [83.02999769628593]
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning.
We show that the minimax approach enables us to achieve a fast rate of convergence for weights and quality functions.
We present the first finite-sample result with first-order efficiency in non-tabular environments.
arXiv Detail & Related papers (2021-02-05T03:20:39Z) - Distributed Stochastic Nonconvex Optimization and Learning based on
Successive Convex Approximation [26.11677569331688]
We introduce a novel algorithmic framework for the distributed minimization of the sum of the agents' local cost functions over a network.
We show that the proposed method can be applied to distributed neural networks.
arXiv Detail & Related papers (2020-04-30T15:36:46Z) - Communication-Efficient Distributed Estimator for Generalized Linear
Models with a Diverging Number of Covariates [7.427903819459701]
A novel method is proposed to obtain an asymptotically efficient estimator for large-scale distributed data with two rounds of communication.
The assumption on the number of servers is weaker than in existing methods, making the approach practical for real-world applications.
arXiv Detail & Related papers (2020-01-17T08:51:11Z)