Sequential Probability Assignment with Contexts: Minimax Regret, Contextual Shtarkov Sums, and Contextual Normalized Maximum Likelihood
- URL: http://arxiv.org/abs/2410.03849v1
- Date: Fri, 4 Oct 2024 18:31:12 GMT
- Title: Sequential Probability Assignment with Contexts: Minimax Regret, Contextual Shtarkov Sums, and Contextual Normalized Maximum Likelihood
- Authors: Ziyi Liu, Idan Attias, Daniel M. Roy,
- Abstract summary: We introduce a novel complexity measure, the emphcontextual Shtarkov sum, corresponding to the Shtarkov sum after projection onto a multiary context tree.
We derive the minimax optimal strategy, dubbed emph Normalized Maximum Likelihood (cNML)
Our results hold for sequential experts, beyond binary labels, which are settings rarely considered in prior work.
- Score: 28.94461817548213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the fundamental problem of sequential probability assignment, also known as online learning with logarithmic loss, with respect to an arbitrary, possibly nonparametric hypothesis class. Our goal is to obtain a complexity measure for the hypothesis class that characterizes the minimax regret and to determine a general, minimax optimal algorithm. Notably, the sequential $\ell_{\infty}$ entropy, extensively studied in the literature (Rakhlin and Sridharan, 2015, Bilodeau et al., 2020, Wu et al., 2023), was shown to not characterize minimax risk in general. Inspired by the seminal work of Shtarkov (1987) and Rakhlin, Sridharan, and Tewari (2010), we introduce a novel complexity measure, the \emph{contextual Shtarkov sum}, corresponding to the Shtarkov sum after projection onto a multiary context tree, and show that the worst case log contextual Shtarkov sum equals the minimax regret. Using the contextual Shtarkov sum, we derive the minimax optimal strategy, dubbed \emph{contextual Normalized Maximum Likelihood} (cNML). Our results hold for sequential experts, beyond binary labels, which are settings rarely considered in prior work. To illustrate the utility of this characterization, we provide a short proof of a new regret upper bound in terms of sequential $\ell_{\infty}$ entropy, unifying and sharpening state-of-the-art bounds by Bilodeau et al. (2020) and Wu et al. (2023).
Related papers
- Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability [49.96531901205305]
We propose the emphfirst algorithm with $tildeO(epsilon-1)$ sample complexity under single-policy concentrability for offline contextual bandits.
Our proof leverages the strong convexity of the KL regularization, and the conditional non-negativity of the gap between the true reward and its pessimistic estimator.
We extend our algorithm to contextual dueling bandits and achieve a similar nearly optimal sample complexity.
arXiv Detail & Related papers (2025-02-09T22:14:45Z) - Concurrent Learning with Aggregated States via Randomized Least Squares Value Iteration [40.73142019282658]
We study whether injecting randomization can help a society of agents it concurently explore an environment.
We demonstrate worst-case regret bounds in both finite- and infinite-horizon environments.
We reduce the space complexity by a factor of $K$ while incurring only a $sqrtK$ increase in the worst-case regret bound.
arXiv Detail & Related papers (2025-01-23T05:37:33Z) - High Probability Bound for Cross-Learning Contextual Bandits with Unknown Context Distributions [17.43748116766233]
We study the problem of contextual bandits with cross learning, where the learner observes the loss associated with the action across all possible contexts.
Our focus is on a setting where losses are chosen adversarially, and contexts are sampled i.i.d from a specific distribution.
Schneider and Zimmert (2023) recently proposed an algorithm that achieves nearly optimal expected regret.
We present a novel, in-depth analysis of their algorithm and demonstrate that it actually achieves near-optimal regret with high probability.
arXiv Detail & Related papers (2024-10-05T08:19:54Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z) - Expected Worst Case Regret via Stochastic Sequential Covering [14.834625066344582]
We introduce a notion of expected worst case minimax regret that generalizes and encompasses prior known minimax regrets.
For such minimax regrets we establish tight bounds via a novel concept of upper global sequential covering.
We demonstrate the effectiveness of our approach by establishing tight bounds on the expected worst case minimax regrets for logarithmic loss and general mixable losses.
arXiv Detail & Related papers (2022-09-09T17:31:46Z) - Precise Regret Bounds for Log-loss via a Truncated Bayesian Algorithm [14.834625066344582]
We study the sequential general online regression, known also as the sequential probability assignments, under logarithmic loss.
We focus on obtaining tight, often matching, lower and upper bounds for the sequential minimax regret that are defined as the excess loss it incurs over a class of experts.
arXiv Detail & Related papers (2022-05-07T22:03:00Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Tight Bounds on Minimax Regret under Logarithmic Loss via
Self-Concordance [37.0414602993676]
We show that for any expert class with (sequential) metric entropy $mathcalO(gamma-p)$ at scale $gamma$, the minimax regret is $mathcalO(np/(p+1))$.
As an application of our techniques, we resolve the minimax regret for nonparametric Lipschitz classes of experts.
arXiv Detail & Related papers (2020-07-02T14:47:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.