Exponential Family Model-Based Reinforcement Learning via Score Matching
- URL: http://arxiv.org/abs/2112.14195v1
- Date: Tue, 28 Dec 2021 15:51:07 GMT
- Title: Exponential Family Model-Based Reinforcement Learning via Score Matching
- Authors: Gene Li, Junbo Li, Nathan Srebro, Zhaoran Wang, Zhuoran Yang
- Abstract summary: We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL).
SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression.
- Score: 97.31477125728844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an optimistic model-based algorithm, dubbed SMRL, for
finite-horizon episodic reinforcement learning (RL) when the transition model
is specified by exponential family distributions with $d$ parameters and the
reward is bounded and known. SMRL uses score matching, an unnormalized density
estimation technique that enables efficient estimation of the model parameter
by ridge regression. Under standard regularity assumptions, SMRL achieves
$\tilde O(d\sqrt{H^3T})$ online regret, where $H$ is the length of each episode
and $T$ is the total number of interactions (ignoring polynomial dependence on
structural scale parameters).
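SMRL estimates the natural parameters of a conditional exponential family transition model, and the key point is that the score matching objective reduces to a ridge-regression-style linear solve. The following minimal sketch illustrates that reduction on an unconditional 1-D Gaussian written in natural-parameter form, not the paper's conditional SMRL estimator; the function names and the choice of sufficient statistics are illustrative assumptions.

```python
import numpy as np

# Toy exponential family: log p(x; theta) = theta[0]*x - theta[1]*x**2/2 - A(theta),
# i.e. psi(x) = [x, -x**2/2] and the true natural parameters are
# theta = [mu / sigma**2, 1 / sigma**2].

def psi_jacobian(x):
    # d psi_j / dx for the two sufficient statistics above
    return np.array([1.0, -x])

def psi_laplacian(x):
    # d^2 psi_j / dx^2 for the two sufficient statistics above
    return np.array([0.0, -1.0])

def score_matching_ridge(samples, lam=1e-3):
    """Closed-form, ridge-regularized score matching estimate of theta.

    For an exponential family, the empirical Hyvarinen objective reduces to
        (1/2) theta^T A theta + b^T theta,
    with A = mean of J psi J psi^T and b = mean of the Laplacian of psi, so
    adding (lam/2)||theta||^2 yields a d x d ridge-regression-style solve.
    """
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for x in samples:
        J = psi_jacobian(x)
        A += np.outer(J, J)
        b += psi_laplacian(x)
    A /= len(samples)
    b /= len(samples)
    return np.linalg.solve(A + lam * np.eye(2), -b)

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7
theta_hat = score_matching_ridge(rng.normal(mu, sigma, size=20_000))
print("estimated:", theta_hat)
print("true:     ", [mu / sigma**2, 1.0 / sigma**2])
```

Because the regularized objective is a quadratic in $\theta$, its minimizer is the single $d \times d$ linear solve above, which is the sense in which score matching enables estimation "by ridge regression".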
Related papers
- Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis [16.288866201806382]
We develop a model-free RLHF best policy identification algorithm, called $\mathsf{BSAD}$, without explicit reward model inference.
The algorithm identifies the optimal policy directly from human preference information in a backward manner.
arXiv Detail & Related papers (2024-06-11T17:01:41Z)
- Prior-dependent analysis of posterior sampling reinforcement learning with function approximation [19.505117288012148]
This work advances randomized exploration in reinforcement learning (RL) with function approximation modeled by linear mixture MDPs.
We establish the first prior-dependent Bayesian regret bound for RL with function approximation and refine the Bayesian regret analysis for posterior sampling reinforcement learning (PSRL).
We present an upper bound of $\mathcal{O}(d\sqrt{H^3 T \log T})$, where $d$ represents the dimensionality of the transition kernel, $H$ the planning horizon, and $T$ the total number of interactions.
arXiv Detail & Related papers (2024-03-17T11:23:51Z)
- Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL).
We instead directly sample the Q function from its posterior distribution using Langevin Monte Carlo (see the sketch after this list).
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
arXiv Detail & Related papers (2023-05-29T17:11:28Z)
- Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation [12.36108042107798]
We study model-based reinforcement learning for episodic Markov decision processes.
We establish a provably efficient RL algorithm for the MDP whose state transition is given by a multinomial logistic model.
We show that our proposed algorithm consistently outperforms the existing methods.
arXiv Detail & Related papers (2022-12-27T16:25:09Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward [66.81579829897392]
We propose a novel offline reinforcement learning algorithm called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED).
PARTED decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy rewards.
To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDP with trajectory-wise reward.
arXiv Detail & Related papers (2022-06-13T19:11:22Z)
- Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for preference-based RL (PbRL) with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z)
- Model-based Reinforcement Learning for Continuous Control with Posterior Sampling [10.91557009257615]
We study model-based posterior sampling for reinforcement learning (PSRL) in continuous state-action spaces.
We present MPC-PSRL, a model-based posterior sampling algorithm with model predictive control for action selection.
arXiv Detail & Related papers (2020-11-20T21:00:31Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of states.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
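The Langevin Monte Carlo entry above replaces an explicit posterior over value functions with direct sampling of the Q function's parameters. Here is a minimal sketch of that idea, assuming a linear Q-function and a ridge-regularized squared-error loss standing in for the negative log-posterior; the names langevin_q_sample and make_grad_loss, and the loss itself, are illustrative assumptions rather than the algorithm from that paper.

```python
import numpy as np

def langevin_q_sample(w, grad_loss, step=1e-3, beta=1.0, n_steps=200, rng=None):
    """Approximate posterior sample of Q-function parameters via unadjusted
    Langevin dynamics: noisy gradient descent on a loss that plays the role
    of a negative log-posterior."""
    rng = rng if rng is not None else np.random.default_rng()
    for _ in range(n_steps):
        noise = rng.normal(size=w.shape)
        w = w - step * grad_loss(w) + np.sqrt(2.0 * step / beta) * noise
    return w

def make_grad_loss(phis, actions, targets, reg=1e-2):
    """Gradient of a ridge-regularized squared error between a linear
    Q-function Q(s, a; w) = w[a] @ phi(s) and bootstrapped targets."""
    def grad_loss(w):
        g = reg * w                              # Gaussian-prior / ridge term
        for phi, a, y in zip(phis, actions, targets):
            g[a] += (w[a] @ phi - y) * phi       # squared-error term
        return g
    return grad_loss

# Thompson-sampling-style action selection with the sampled parameters:
#   w_tilde = langevin_q_sample(w_init, make_grad_loss(phis, actions, targets))
#   a = int(np.argmax(w_tilde @ phi_s))   # act greedily w.r.t. the sampled Q
```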