Related papers: Sequential Probability Assignment with Contexts: Minimax Regret, Contextual Shtarkov Sums, and Contextual Normalized Maximum Likelihood

Sequential Probability Assignment with Contexts: Minimax Regret, Contextual Shtarkov Sums, and Contextual Normalized Maximum Likelihood

URL: http://arxiv.org/abs/2410.03849v1
Date: Fri, 4 Oct 2024 18:31:12 GMT
Title: Sequential Probability Assignment with Contexts: Minimax Regret, Contextual Shtarkov Sums, and Contextual Normalized Maximum Likelihood
Authors: Ziyi Liu, Idan Attias, Daniel M. Roy,
Abstract summary: We introduce a novel complexity measure, the emphcontextual Shtarkov sum, corresponding to the Shtarkov sum after projection onto a multiary context tree. We derive the minimax optimal strategy, dubbed emph Normalized Maximum Likelihood (cNML) Our results hold for sequential experts, beyond binary labels, which are settings rarely considered in prior work.
Score: 28.94461817548213
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the fundamental problem of sequential probability assignment, also known as online learning with logarithmic loss, with respect to an arbitrary, possibly nonparametric hypothesis class. Our goal is to obtain a complexity measure for the hypothesis class that characterizes the minimax regret and to determine a general, minimax optimal algorithm. Notably, the sequential $\ell_{\infty}$ entropy, extensively studied in the literature (Rakhlin and Sridharan, 2015, Bilodeau et al., 2020, Wu et al., 2023), was shown to not characterize minimax risk in general. Inspired by the seminal work of Shtarkov (1987) and Rakhlin, Sridharan, and Tewari (2010), we introduce a novel complexity measure, the \emph{contextual Shtarkov sum}, corresponding to the Shtarkov sum after projection onto a multiary context tree, and show that the worst case log contextual Shtarkov sum equals the minimax regret. Using the contextual Shtarkov sum, we derive the minimax optimal strategy, dubbed \emph{contextual Normalized Maximum Likelihood} (cNML). Our results hold for sequential experts, beyond binary labels, which are settings rarely considered in prior work. To illustrate the utility of this characterization, we provide a short proof of a new regret upper bound in terms of sequential $\ell_{\infty}$ entropy, unifying and sharpening state-of-the-art bounds by Bilodeau et al. (2020) and Wu et al. (2023).

Related papers

On the Minimax Regret of Sequential Probability Assignment via Square-Root Entropy [70.10668953625247]
We show that the minimax regret for the case of no side information can be upper bounded in terms of sequential square-root entropy. For the problem of sequential probability assignment with side information, we develop both upper and lower bounds based on the aforementioned entropy.
arXiv Detail & Related papers (2025-03-22T17:26:34Z)
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability [49.96531901205305]
We propose the emphfirst algorithm with $tildeO(epsilon-1)$ sample complexity under single-policy concentrability for offline contextual bandits. Our proof leverages the strong convexity of the KL regularization, and the conditional non-negativity of the gap between the true reward and its pessimistic estimator. We extend our algorithm to contextual dueling bandits and achieve a similar nearly optimal sample complexity.
arXiv Detail & Related papers (2025-02-09T22:14:45Z)
High Probability Bound for Cross-Learning Contextual Bandits with Unknown Context Distributions [17.43748116766233]
We study the problem of contextual bandits with cross learning, where the learner observes the loss associated with the action across all possible contexts. Our focus is on a setting where losses are chosen adversarially, and contexts are sampled i.i.d from a specific distribution. Schneider and Zimmert (2023) recently proposed an algorithm that achieves nearly optimal expected regret. We present a novel, in-depth analysis of their algorithm and demonstrate that it actually achieves near-optimal regret with high probability.
arXiv Detail & Related papers (2024-10-05T08:19:54Z)
A conversion theorem and minimax optimality for continuum contextual bandits [70.71582850199871]
We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set. The goal is to minimize all the underlying functions for the received contexts, leading to the contextual notion of regret. We show that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear contextual regret.
arXiv Detail & Related papers (2024-06-09T10:12:08Z)
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel. An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions. We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost. We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z)
Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization. We provide novel excess risk bounds, both in expectation and with high probability. Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z)
Expected Worst Case Regret via Stochastic Sequential Covering [14.834625066344582]
We introduce a notion of expected worst case minimax regret that generalizes and encompasses prior known minimax regrets. For such minimax regrets we establish tight bounds via a novel concept of upper global sequential covering. We demonstrate the effectiveness of our approach by establishing tight bounds on the expected worst case minimax regrets for logarithmic loss and general mixable losses.
arXiv Detail & Related papers (2022-09-09T17:31:46Z)
Precise Regret Bounds for Log-loss via a Truncated Bayesian Algorithm [14.834625066344582]
We study the sequential general online regression, known also as the sequential probability assignments, under logarithmic loss. We focus on obtaining tight, often matching, lower and upper bounds for the sequential minimax regret that are defined as the excess loss it incurs over a class of experts.
arXiv Detail & Related papers (2022-05-07T22:03:00Z)
Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes. Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model. This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine. We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
Tight Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance [37.0414602993676]
We show that for any expert class with (sequential) metric entropy $mathcalO(gamma-p)$ at scale $gamma$, the minimax regret is $mathcalO(np/(p+1))$. As an application of our techniques, we resolve the minimax regret for nonparametric Lipschitz classes of experts.
arXiv Detail & Related papers (2020-07-02T14:47:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.