Oracle-Efficient Hybrid Online Learning with Unknown Distribution
- URL: http://arxiv.org/abs/2401.15520v1
- Date: Sat, 27 Jan 2024 22:45:02 GMT
- Title: Oracle-Efficient Hybrid Online Learning with Unknown Distribution
- Authors: Changlong Wu, Jin Sima, Wojciech Szpankowski
- Abstract summary: We show that there exists a computationally efficient online predictor that achieves a regret upper bounded by $\tilde{O}(T^{\frac{3}{4}})$ for a finite-VC class, and upper bounded by $\tilde{O}(T^{\frac{p+1}{p+2}})$ for a class with $\alpha$ fat-shattering dimension $\alpha^{-p}$.
We then extend our result to the scenario of shifting distributions with $K$ changes, yielding a regret of order $\tilde{O}(T^{\frac{4}{5}}K^{\frac{1}{5}})$.
- Score: 16.73555330791045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of oracle-efficient hybrid online learning when the
features are generated by an unknown i.i.d. process and the labels are
generated adversarially. Assuming access to an (offline) ERM oracle, we show
that there exists a computationally efficient online predictor that achieves a
regret upper bounded by $\tilde{O}(T^{\frac{3}{4}})$ for a finite-VC class, and
upper bounded by $\tilde{O}(T^{\frac{p+1}{p+2}})$ for a class with $\alpha$
fat-shattering dimension $\alpha^{-p}$. This provides the first known
oracle-efficient sublinear regret bounds for hybrid online learning with an
unknown feature generation process. In particular, it confirms a conjecture of
Lazaric and Munos (JCSS 2012). We then extend our result to the scenario of
shifting distributions with $K$ changes, yielding a regret of order
$\tilde{O}(T^{\frac{4}{5}}K^{\frac{1}{5}})$. Finally, we establish a regret of
$\tilde{O}((K^{\frac{2}{3}}(\log|\mathcal{H}|)^{\frac{1}{3}}+K)\cdot
T^{\frac{4}{5}})$ for the contextual $K$-armed bandits with a finite policy set
$\mathcal{H}$, i.i.d. generated contexts from an unknown distribution, and
adversarially generated costs.
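To make the setting concrete, below is a minimal, illustrative sketch of the general oracle-efficient template (a follow-the-perturbed-leader style loop that delegates all optimization to an offline ERM oracle). This is not the paper's actual construction or its perturbation scheme; the toy hypothesis class, squared loss, and all helper names are hypothetical.

```python
# Illustrative sketch only: an online loop that calls an offline ERM oracle
# each round, in the hybrid setting (i.i.d. features, adversarial labels).
# The perturbation scheme below is a generic FTPL-style placeholder, not the
# paper's algorithm.
import random

def erm_oracle(hypotheses, examples):
    """Offline ERM: return the hypothesis with smallest cumulative squared loss."""
    return min(hypotheses,
               key=lambda h: sum((h(x) - y) ** 2 for x, y in examples))

def hybrid_online_learner(hypotheses, feature_stream, label_stream, T, noise=1.0):
    history, total_loss = [], 0.0
    for t in range(T):
        x_t = next(feature_stream)  # i.i.d. feature from an unknown distribution
        # Stabilize the oracle's output by adding a few randomly labelled copies
        # of recent features (generic perturbation, purely for illustration).
        perturbed = history + [(x, random.gauss(0, noise)) for x, _ in history[-5:]]
        h_t = erm_oracle(hypotheses, perturbed) if perturbed else random.choice(hypotheses)
        y_t = next(label_stream)    # adversarially generated label
        total_loss += (h_t(x_t) - y_t) ** 2
        history.append((x_t, y_t))
    return total_loss

if __name__ == "__main__":
    # Toy finite class of threshold predictors (hypothetical example).
    hypotheses = [lambda x, b=b: float(x > b) for b in [0.2, 0.5, 0.8]]
    features = iter(random.random() for _ in range(1000))
    labels = iter(float(random.random() > 0.5) for _ in range(1000))
    print(hybrid_online_learner(hypotheses, features, labels, T=200))
```

The point of the sketch is the interface: the learner only ever touches the hypothesis class through offline ERM calls, which is what "oracle-efficient" refers to in the abstract.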
Related papers
- Online Learning of Halfspaces with Massart Noise [47.71073318490341]
We study the task of online learning in the presence of Massart noise.
We present a computationally efficient algorithm that achieves a mistake bound of $\eta T + o(T)$.
We use our Massart online learner to design an efficient bandit algorithm that obtains expected reward at least $(1-1/k)\Delta T - o(T)$ bigger than choosing a random action at every round.
arXiv Detail & Related papers (2024-05-21T17:31:10Z) - Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning [54.806166861456035]
We study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches.
We design a computationally efficient algorithm that achieves a near-optimal regret of $\tilde{O}(\sqrt{SAH^3K\ln(1/\delta)})$ in $K$ episodes, where $\tilde{O}(\cdot)$ hides logarithmic factors in $(S,A,H,K)$.
Our technical contributions are two-fold: 1) a near-optimal design scheme to explore
arXiv Detail & Related papers (2022-10-15T09:22:22Z) - Optimism in Face of a Context: Regret Guarantees for Stochastic
Contextual MDP [46.86114958340962]
We present regret-minimization algorithms for contextual MDPs under a minimum reachability assumption.
Our approach is the first optimistic approach applied to contextual MDPs with general function approximation.
arXiv Detail & Related papers (2022-07-22T15:00:15Z) - Oracle-Efficient Online Learning for Beyond Worst-Case Adversaries [29.598518028635716]
We study oracle-efficient algorithms for beyond worst-case analysis of online learning.
For the smoothed analysis setting, our results give the first oracle-efficient algorithm for online learning with smoothed adversaries.
arXiv Detail & Related papers (2022-02-17T09:49:40Z) - Learning Stochastic Shortest Path with Linear Function Approximation [74.08819218747341]
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models.
We propose a novel algorithm for learning the linear mixture SSP, which can attain a $\tilde{O}(d B_\star^{1.5}\sqrt{K}/c_{\min})$ regret.
arXiv Detail & Related papers (2021-10-25T08:34:00Z) - Model Selection with Near Optimal Rates for Reinforcement Learning with
General Model Classes [27.361399036211694]
We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem.
In the model selection framework, instead of $\mathcal{P}^*$, we are given $M$ nested families of transition kernels.
We show that \texttt{ARL-GEN} obtains a regret of $\tilde{\mathcal{O}}(d_{\mathcal{E}^*}H^2 + \sqrt{d_{\mathcal{E}^*}\mathbb{M}^* H^2 T})$.
arXiv Detail & Related papers (2021-07-13T05:00:38Z) - Nearly Horizon-Free Offline Reinforcement Learning [97.36751930393245]
We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes with $S$ states, $A$ actions and planning horizon $H$.
We obtain the first set of nearly $H$-free sample complexity bounds for evaluation and planning using the empirical MDPs.
arXiv Detail & Related papers (2021-03-25T18:52:17Z) - Variance-Aware Confidence Set: Variance-Dependent Bound for Linear
Bandits and Horizon-Free Bound for Linear Mixture MDP [76.94328400919836]
We show how to construct variance-aware confidence sets for linear bandits and linear mixture Markov Decision Processes (MDPs).
For linear bandits, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{1 + \sum_{i=1}^{K}\sigma_i^2})$ regret bound, where $d$ is the feature dimension.
For linear mixture MDPs, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{K})$ regret bound, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-29T18:57:52Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T},\, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)