Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality
- URL: http://arxiv.org/abs/2410.16013v1
- Date: Mon, 21 Oct 2024 13:45:02 GMT
- Title: Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality
- Authors: Raghav Bongole, Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund
- Abstract summary: We study agents acting in an unknown environment whose goal is to find a robust policy.
We consider agents minimizing the maximum regret over different environment parameters, leading to the study of minimax regret.
This research focuses on deriving information-theoretic bounds for minimax regret in Markov Decision Processes.
- Score: 33.907054045921306
- Abstract: We study agents acting in an unknown environment where the agent's goal is to find a robust policy. We consider robust policies as policies that achieve high cumulative rewards for all possible environments. To this end, we consider agents minimizing the maximum regret over different environment parameters, leading to the study of minimax regret. This research focuses on deriving information-theoretic bounds for minimax regret in Markov Decision Processes (MDPs) with a finite time horizon. Building on concepts from supervised learning, such as minimum excess risk (MER) and minimax excess risk, we use recent bounds on the Bayesian regret to derive minimax regret bounds. Specifically, we establish minimax theorems and use bounds on the Bayesian regret to perform minimax regret analysis using these minimax theorems. Our contributions include defining a suitable minimax regret in the context of MDPs, finding information-theoretic bounds for it, and applying these bounds in various scenarios.
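For concreteness, the central objects in the abstract can be written as follows. The notation ($\theta$ for the unknown environment parameter, $\pi$ for a policy, $V_\theta^{\pi}$ for the expected cumulative reward of $\pi$ in environment $\theta$, $Q$ for a prior over $\Theta$) is our paraphrase of the setup, not necessarily the paper's exact symbols.

```latex
% Regret of policy \pi against environment \theta over horizon T:
R_T(\theta, \pi) = \sup_{\pi'} V_\theta^{\pi'} - V_\theta^{\pi}
% Minimax regret: best worst case over environments.
\mathcal{R}_T^{\mathrm{minimax}} = \inf_{\pi} \, \sup_{\theta \in \Theta} R_T(\theta, \pi)
% Bayesian regret under a prior Q on \Theta:
\mathcal{R}_T^{\mathrm{Bayes}}(Q) = \inf_{\pi} \, \mathbb{E}_{\theta \sim Q}\!\left[ R_T(\theta, \pi) \right]
```

A minimax theorem of the kind the abstract invokes gives, under suitable conditions, $\mathcal{R}_T^{\mathrm{minimax}} = \sup_{Q} \mathcal{R}_T^{\mathrm{Bayes}}(Q)$; this duality is what lets existing upper bounds on the Bayesian regret be converted into minimax regret bounds.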
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
The problem elegantly amalgamates regret minimization and best arm identification (BAI).
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
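As context for the entry above: Double KL-UCB builds on KL-UCB-style confidence indices. Below is a minimal sketch of the standard KL-UCB index for Bernoulli rewards (our illustration of the underlying index, not the paper's double variant); function names are ours.

```python
import math

def kl_bernoulli(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean: float, pulls: int, t: int) -> float:
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), found by bisection.
    (The classical index adds a c*log(log(t)) term; omitted for brevity.)"""
    budget = math.log(max(t, 2))
    lo, hi = mean, 1.0
    for _ in range(50):  # bisection; KL(mean, q) is increasing in q >= mean
        mid = (lo + hi) / 2
        if pulls * kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```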
- Minimax Linear Regression under the Quantile Risk [31.277788690403522]
We study the problem of designing minimax procedures in linear regression under the quantile risk.
We prove a matching upper bound on the worst-case quantile risk of a variant of the recently proposed min-max regression procedure.
arXiv Detail & Related papers (2024-06-17T23:24:14Z)
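For the quantile-risk entry above: a common way to formalize the objective (our notation, with loss $\ell$ and confidence level $\delta$) is to replace the expected loss with a quantile of the loss distribution.

```latex
% (1-\delta)-quantile of the loss of estimator \hat\beta at instance \beta:
Q_{1-\delta}(\hat\beta, \beta) = \inf\left\{ t \ge 0 : \Pr\!\left[ \ell(\hat\beta, \beta) > t \right] \le \delta \right\}
% Minimax quantile risk:
\inf_{\hat\beta} \, \sup_{\beta} \, Q_{1-\delta}(\hat\beta, \beta)
```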
- Refining Minimax Regret for Unsupervised Environment Design [15.281908507614512]
We introduce level-perfect minimax regret (MMR), a refinement of the minimax regret objective.
We show that Bayesian level-perfect (BLP) policies act consistently with a Perfect Bayesian policy over all levels.
We also introduce an algorithm, ReMiDi, that results in a BLP policy at convergence.
arXiv Detail & Related papers (2024-02-19T16:51:29Z)
- Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning [50.44564503645015]
We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes.
We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs.
arXiv Detail & Related papers (2021-07-02T20:36:05Z)
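To fix intuition for the gap-dependent entry above: such bounds typically replace a worst-case $\sqrt{T}$ rate with a sum over positive value gaps, schematically as follows (this shape is illustrative, not the paper's exact theorem).

```latex
% \Delta(s,a): value gap of action a in state s; H: episode horizon.
\mathrm{Regret}(T) \;\lesssim \sum_{(s,a) \,:\, \Delta(s,a) > 0} \frac{\mathrm{poly}(H)\,\log T}{\Delta(s,a)}
```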
- A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs [117.82903457289584]
We derive a novel problem-dependent lower bound for regret in finite-horizon Markov Decision Processes (MDPs).
We show that our lower bound is considerably smaller than in the general case and that it does not scale with the minimum action gap at all.
We show that this last result is attainable (up to $\mathrm{poly}(H)$ terms, where $H$ is the horizon) by providing a regret upper bound based on policy gaps for an optimistic algorithm.
arXiv Detail & Related papers (2021-06-24T13:46:09Z)
- Rate-Distortion Analysis of Minimum Excess Risk in Bayesian Learning [15.544041797200045]
Minimum Excess Risk (MER) in Bayesian learning is defined as the difference between the minimum expected loss achievable when learning from data and the minimum expected loss that could be achieved if the underlying parameter $W$ were observed.
We derive information-theoretic bounds on the difference between these upper and lower bounds and show that they can provide order-wise tight rates for MER.
arXiv Detail & Related papers (2021-05-10T08:14:10Z)
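The MER above is the quantity the main paper adapts from supervised learning to MDPs. In symbols (our notation: training data $D$, fresh input $X$, target $Y$, parameter $W$), it compares the best predictor learnable from data with the best predictor that knows $W$.

```latex
% First term: best expected loss using only the data D.
% Second term: best expected loss with the parameter W revealed.
\mathrm{MER} = \inf_{\psi} \mathbb{E}\!\left[ \ell\!\left( \psi(X, D), Y \right) \right]
             - \inf_{\phi} \mathbb{E}\!\left[ \ell\!\left( \phi(X, W), Y \right) \right]
```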
- Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
arXiv Detail & Related papers (2021-01-19T18:55:29Z)
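For the off-policy evaluation entry above, the basic importance-sampling estimator is the natural baseline that minimax-optimal procedures refine; here is a minimal runnable sketch (our names and toy data, not the paper's procedure).

```python
from typing import Sequence

def importance_sampling_ope(actions: Sequence[int],
                            rewards: Sequence[float],
                            logging_probs: Sequence[float],
                            target_policy: Sequence[float]) -> float:
    """Estimate the target policy's value from logged bandit data.

    actions[i]       -- arm pulled at round i by the logging policy
    rewards[i]       -- observed reward at round i
    logging_probs[i] -- probability the logging policy assigned to actions[i]
    target_policy[a] -- probability the target policy assigns to arm a
    """
    total = 0.0
    for a, r, p_log in zip(actions, rewards, logging_probs):
        w = target_policy[a] / p_log  # importance weight
        total += w * r
    return total / len(actions)

# Example: two arms, uniform logging policy, target plays arm 1 w.p. 0.9.
value = importance_sampling_ope(actions=[0, 1, 1],
                                rewards=[0.0, 1.0, 0.5],
                                logging_probs=[0.5, 0.5, 0.5],
                                target_policy=[0.1, 0.9])
```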
- Off-Policy Evaluation of Slate Policies under Bayes Risk [70.10677881866047]
We study the problem of off-policy evaluation for slate bandits, for the typical case in which the logging policy factorizes over the slots of the slate.
We show that the risk improvement over the pseudoinverse (PI) estimator grows linearly with the number of slots, and linearly with the gap between the arithmetic and the harmonic mean of a set of slot-level divergences.
arXiv Detail & Related papers (2021-01-05T20:07:56Z)
- On Uninformative Optimal Policies in Adaptive LQR with Unknown B-Matrix [7.507288369705302]
We prove local minimax regret lower bounds for adaptive Linear Quadratic Regulators (LQR).
We consider affinely parametrized $B$-matrices and known $A$-matrices.
We show that if the parametrization induces an uninformative optimal policy, logarithmic regret is impossible.
arXiv Detail & Related papers (2020-11-18T13:50:31Z)
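The setting in the entry above can be written as a standard LQR whose input matrix depends affinely on the unknown parameter (our notation).

```latex
% Dynamics with known A and affinely parametrized, unknown B:
x_{t+1} = A x_t + B(\theta) u_t + w_t,
\qquad
B(\theta) = B_0 + \sum_{i=1}^{k} \theta_i B_i
% The adaptive controller must trade off regulating the state against
% exciting the system enough to identify \theta.
```

Roughly, an optimal policy is "uninformative" when playing it reveals nothing about $\theta$, which is what rules out logarithmic regret in the result above.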
- Constrained regret minimization for multi-criterion multi-armed bandits [5.349852254138086]
We study the problem of regret minimization over a given time horizon, subject to a risk constraint.
We propose a Risk-Constrained Lower Confidence Bound algorithm that guarantees logarithmic regret.
We prove lower bounds on the performance of any risk-constrained regret minimization algorithm.
arXiv Detail & Related papers (2020-06-17T04:23:18Z)
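For the constrained-bandit entry above, a plausible sketch of the lower-confidence-bound idea (our illustration, not the paper's exact algorithm): keep the arms whose risk LCB does not exceed the threshold, i.e., those not yet provably infeasible, then pick the most promising by reward UCB.

```python
import math
from typing import List

def select_arm(reward_means: List[float],
               risk_means: List[float],
               pulls: List[int],
               t: int,
               risk_threshold: float) -> int:
    """Among arms whose risk lower confidence bound is below the threshold
    (plausibly feasible), choose the one with the highest reward UCB."""
    def radius(n: int) -> float:
        # Hoeffding-style confidence radius.
        return math.sqrt(2.0 * math.log(max(t, 2)) / max(n, 1))

    feasible = [a for a in range(len(pulls))
                if risk_means[a] - radius(pulls[a]) <= risk_threshold]
    if not feasible:  # no arm looks feasible: explore the least pulled arm
        return min(range(len(pulls)), key=lambda a: pulls[a])
    return max(feasible, key=lambda a: reward_means[a] + radius(pulls[a]))
```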