Online Learning with Imperfect Hints
- URL: http://arxiv.org/abs/2002.04726v2
- Date: Fri, 2 Oct 2020 17:28:35 GMT
- Title: Online Learning with Imperfect Hints
- Authors: Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar and Manish Purohit
- Abstract summary: We develop algorithms and nearly matching lower bounds for online learning with imperfect directional hints.
Our algorithms are oblivious to the quality of the hints, and the regret bounds interpolate between the always-correlated hints case and the no-hints case.
- Score: 72.4277628722419
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a variant of the classical online linear optimization problem in
which at every step, the online player receives a "hint" vector before choosing
the action for that round. Rather surprisingly, it was shown that if the hint
vector is guaranteed to have a positive correlation with the cost vector, then
the online player can achieve a regret of $O(\log T)$, thus significantly
improving over the $O(\sqrt{T})$ regret in the general setting. However, the
result and analysis require the correlation property at \emph{all} time steps,
thus raising the natural question: can we design online learning algorithms
that are resilient to bad hints?
In this paper we develop algorithms and nearly matching lower bounds for
online learning with imperfect directional hints. Our algorithms are oblivious
to the quality of the hints, and the regret bounds interpolate between the
always-correlated hints case and the no-hints case. Our results also
generalize, simplify, and improve upon previous results on optimistic regret
bounds, which can be viewed as an additive version of hints.
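To make the setting concrete, here is a minimal Python sketch of the hint protocol and the regret it measures, played over the unit ball. The hint-following strategy is a hypothetical baseline, not the paper's algorithm: it excels when every hint is positively correlated with the cost vector and collapses when the hints turn bad, which is exactly the failure mode that motivates the robustness question above.

```python
import numpy as np

def regret(costs, actions):
    """Regret vs. the best fixed action in the unit ball: for linear
    losses <c_t, x_t>, the best comparator is -C/||C|| with
    C = sum_t c_t, whose total loss is -||C||."""
    total = sum(float(np.dot(c, x)) for c, x in zip(costs, actions))
    return total + np.linalg.norm(np.sum(costs, axis=0))

def follow_hints(hints):
    """Hypothetical baseline: play x_t = -h_t, trusting every hint."""
    return [-h for h in hints]

rng = np.random.default_rng(0)
T, d = 1000, 5
costs = [c / np.linalg.norm(c) for c in rng.normal(size=(T, d))]

good = [c.copy() for c in costs]   # perfectly correlated hints
bad = [-c for c in costs]          # anti-correlated (adversarial) hints
print("regret, good hints:", regret(costs, follow_hints(good)))  # ~ -T
print("regret, bad hints :", regret(costs, follow_hints(bad)))   # ~ +T
```

With good hints the baseline's per-round loss is negative; with bad hints the same strategy suffers linear regret, which is why an algorithm oblivious to hint quality must interpolate between the two regimes.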
Related papers
- Non-stationary Projection-free Online Learning with Dynamic and Adaptive Regret Guarantees [36.746745619968024]
We investigate non-stationary projection-free online learning, adopting dynamic regret and adaptive regret as performance measures.
Our results are the first general-case dynamic regret bounds for projection-free online learning, and they recover the existing $\mathcal{O}(T^{3/4})$ static regret.
We propose a projection-free method that attains an $\tilde{\mathcal{O}}(\tau^{3/4})$ adaptive regret bound for any interval of length $\tau$.
arXiv Detail & Related papers (2023-05-19T15:02:10Z)
- Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability [59.81339109121384]
We study the $K$-armed contextual dueling bandit problem, a sequential decision-making setting in which the learner uses contextual information to make two decisions, but only observes \emph{preference-based} feedback suggesting that one decision was better than the other.
We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
arXiv Detail & Related papers (2021-11-24T07:14:57Z)
- Fast Rates for Nonparametric Online Learning: From Realizability to Learning in Games [36.969021834291745]
We propose a proper learning algorithm that achieves a near-optimal mistake bound in terms of the sequential fat-shattering dimension of the hypothesis class.
This result answers a question as to whether proper learners could achieve near-optimal mistake bounds.
For the real-valued (regression) setting, the optimal mistake bound was not even known for improper learners.
arXiv Detail & Related papers (2021-11-17T05:24:21Z)
- Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z)
- Revisiting Smoothed Online Learning [70.09792747315323]
We investigate the problem of smoothed online learning, in which the online learner suffers both a hitting cost and a switching cost.
To bound the competitive ratio, we assume the hitting cost is known to the learner in each round, and investigate the greedy algorithm that simply minimizes the weighted sum of the hitting cost and the switching cost.
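As a concrete illustration of this greedy rule, the following is a minimal sketch; the quadratic hitting costs and Euclidean switching cost are illustrative choices, not the paper's exact setting.

```python
import numpy as np
from scipy.optimize import minimize

def greedy_step(hitting_cost, x_prev, weight):
    """One round of the greedy rule: seeing the hitting cost f_t up
    front, play x_t = argmin_x f_t(x) + weight * ||x - x_prev||."""
    obj = lambda x: hitting_cost(x) + weight * np.linalg.norm(x - x_prev)
    # Nelder-Mead is derivative-free; the switching cost is
    # non-smooth at x = x_prev, so gradient methods can stall there.
    return minimize(obj, x_prev, method="Nelder-Mead").x

# Toy run: quadratic hitting costs whose minimizers drift over time.
x = np.zeros(2)
for t in range(5):
    target = np.array([np.cos(t), np.sin(t)])
    x = greedy_step(lambda z, m=target: np.sum((z - m) ** 2), x, weight=0.5)
    print(t, np.round(x, 3))
```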
arXiv Detail & Related papers (2021-02-13T14:15:55Z)
- Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)
- Optimistic and Adaptive Lagrangian Hedging [11.698607733213226]
In online learning, an algorithm plays against an environment whose losses may be picked by an adversary at each round.
We introduce optimism and adaptive stepsizes to Lagrangian hedging, a class of online algorithms that includes regret-matching and Hedge.
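For orientation, here is a minimal sketch of plain Hedge (exponential weights), the simplest member of that family; the optimistic and adaptive-stepsize machinery the paper adds is not shown.

```python
import numpy as np

def hedge_regret(loss_matrix, eta=0.1):
    """Run plain Hedge on a (T, n) matrix of expert losses and return
    the regret against the best fixed expert in hindsight."""
    cum = np.zeros(loss_matrix.shape[1])  # cumulative loss per expert
    total = 0.0
    for losses in loss_matrix:
        p = np.exp(-eta * cum)
        p /= p.sum()                 # exponential-weights distribution
        total += p @ losses          # expected loss of the random play
        cum += losses
    return total - cum.min()

rng = np.random.default_rng(1)
print("Hedge regret:", hedge_regret(rng.random((1000, 10))))
```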
arXiv Detail & Related papers (2021-01-23T23:32:40Z)
- Online Linear Optimization with Many Hints [72.4277628722419]
We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision.
In this setting, we devise an algorithm that obtains logarithmic regret whenever there exists a convex combination of the $K$ hints that has positive correlation with the cost vectors.
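To make the condition concrete, here is a sketch that tests it by linear programming: it looks for a convex combination of the $K$ hints whose correlation with every round's cost is strictly positive. This checks the condition on a given sequence; it is not the paper's online algorithm.

```python
import numpy as np
from scipy.optimize import linprog

def correlated_combination(hints, costs):
    """hints: (T, K, d), costs: (T, d).  Returns simplex weights w with
    <sum_k w_k h_{t,k}, c_t> > 0 at every round t, or None if no such
    convex combination exists."""
    A = np.einsum("tkd,td->tk", np.asarray(hints), np.asarray(costs))
    T, K = A.shape
    # Variables (w_1..w_K, s): maximize the worst-round margin s,
    # subject to A w >= s at every round and w on the simplex.
    res = linprog(np.r_[np.zeros(K), -1.0],
                  A_ub=np.hstack([-A, np.ones((T, 1))]), b_ub=np.zeros(T),
                  A_eq=[np.r_[np.ones(K), 0.0]], b_eq=[1.0],
                  bounds=[(0, None)] * K + [(None, None)])
    return res.x[:K] if res.success and res.x[-1] > 0 else None

rng = np.random.default_rng(2)
costs = rng.normal(size=(50, 3))
noisy = costs + 0.1 * rng.normal(size=costs.shape)   # correlated hint
junk = rng.normal(size=costs.shape)                  # uninformative hint
print(correlated_combination(np.stack([noisy, junk], axis=1), costs))
```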
arXiv Detail & Related papers (2020-10-06T23:25:05Z)