Beyond Bandit Feedback in Online Multiclass Classification
- URL: http://arxiv.org/abs/2106.03596v1
- Date: Mon, 7 Jun 2021 13:22:30 GMT
- Title: Beyond Bandit Feedback in Online Multiclass Classification
- Authors: Dirk van der Hoeven and Federico Fusco and Nicolò Cesa-Bianchi
- Abstract summary: We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph.
We introduce Gappletron, the first online multiclass algorithm that works with arbitrary feedback graphs.
- Score: 17.07011090727996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of online multiclass classification in a setting where
the learner's feedback is determined by an arbitrary directed graph. While
including bandit feedback as a special case, feedback graphs allow a much
richer set of applications, including filtering and label efficient
classification. We introduce Gappletron, the first online multiclass algorithm
that works with arbitrary feedback graphs. For this new algorithm, we prove
surrogate regret bounds that hold, both in expectation and with high
probability, for a large class of surrogate losses. Our bounds are of order
$B\sqrt{\rho KT}$, where $B$ is the diameter of the prediction space, $K$ is
the number of classes, $T$ is the time horizon, and $\rho$ is the domination
number (a graph-theoretic parameter affecting the amount of exploration). In
the full information case, we show that Gappletron achieves a constant
surrogate regret of order $B^2K$. We also prove a general lower bound of order
$\max\big\{B^2K,\sqrt{T}\big\}$ showing that our upper bounds are not
significantly improvable. Experiments on synthetic data show that for various
feedback graphs, our algorithm is competitive against known baselines.
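The domination number $\rho$ in the bound above measures how few actions suffice to observe every action's loss. Computing it exactly is NP-hard, but a greedy set-cover pass gives a standard $\ln K$-approximation. A minimal sketch (names and graph encoding are illustrative, not from the paper):

```python
# Illustrative sketch: greedy approximation of the domination number rho of a
# directed feedback graph. By convention each action observes itself, so a
# dominating set always exists for strongly observable graphs.

def greedy_dominating_set(observes):
    """observes[u] = set of actions whose loss is revealed by playing u."""
    nodes = set(observes)
    uncovered = set(nodes)
    dominating = []
    while uncovered:
        # pick the action that reveals the most still-uncovered actions
        best = max(nodes, key=lambda u: len(observes[u] & uncovered))
        dominating.append(best)
        uncovered -= observes[best]
    return dominating

# Bandit feedback: each action observes only itself, so rho = K.
bandit = {u: {u} for u in range(5)}
# Full information: any single action observes everything, so rho = 1.
full_info = {u: set(range(5)) for u in range(5)}
```

With $\rho = 1$ (full information) the $B\sqrt{\rho KT}$ bound loses its dependence on the domination number, consistent with the constant $B^2K$ regret stated above.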
Related papers
- The Real Price of Bandit Information in Multiclass Classification [73.17969992976501]
We revisit the classical problem of multiclass classification with bandit feedback.
We present a new bandit classification algorithm that guarantees regret $\widetilde{O}(|H| + \sqrt{T})$.
arXiv Detail & Related papers (2024-05-16T12:11:09Z) - Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define an algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z) - Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
We show that our algorithm achieves an $\widetilde{O}\big((d + \log(|\mathcal{S}|^2 |\mathcal{A}|))\sqrt{K}\big)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, and $|\mathcal{S}|$ and $|\mathcal{A}|$ are the cardinalities of the state and action spaces.
arXiv Detail & Related papers (2023-05-15T05:37:32Z) - Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs [62.52390282012508]
We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds.
We develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}\big((\sum_{t=1}^{T} \alpha_t)^{1/2} + \max_{t\in[T]} \alpha_t\big)$ with high probability.
We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
arXiv Detail & Related papers (2022-10-04T04:36:15Z) - Multi-armed Bandit Learning on a Graph [0.0]
We study an extension of MAB called the graph bandit, where an agent travels over a graph to maximize the reward collected from different nodes.
We design a learning algorithm, G-UCB, that balances long-term exploration-exploitation using the principle of optimism.
Our proposed algorithm achieves $O(\sqrt{|S|T\log(T)} + D|S|\log T)$ learning regret, where $|S|$ is the number of nodes and $D$ is the diameter of the graph.
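G-UCB's "principle of optimism" refers to playing indices that add an exploration bonus to empirical means; the graph-bandit algorithm additionally plans paths over the graph, which is omitted here. A minimal sketch of the underlying UCB1-style index (names and the constant `c` are illustrative assumptions, not the paper's exact choices):

```python
import math

# Illustrative sketch of an optimism-in-the-face-of-uncertainty index of the
# kind G-UCB builds on: empirical mean plus a bonus that shrinks with pulls.

def ucb_index(mean_reward, pulls, t, c=2.0):
    """UCB1-style index for one node at round t."""
    if pulls == 0:
        return float("inf")  # force at least one visit to every node
    return mean_reward + math.sqrt(c * math.log(t) / pulls)
```

The bonus decays as a node is visited more often, so exploration concentrates on under-sampled nodes, which is what yields the $\sqrt{|S|T\log(T)}$ leading term.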
arXiv Detail & Related papers (2022-09-20T02:31:42Z) - A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs [21.563733343861713]
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set.
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments.
arXiv Detail & Related papers (2022-06-01T15:14:32Z) - Littlestone Classes are Privately Online Learnable [28.04975353867202]
We consider the problem of online classification under a privacy constraint.
In this setting a learner sequentially observes a stream of labelled examples $(x_t, y_t)$, for $1 \leq t \leq T$, and at each iteration returns a hypothesis $h_t$ which is used to predict the label of the new example $x_t$.
The learner's performance is measured by her regret against a known hypothesis class $\mathcal{H}$.
arXiv Detail & Related papers (2021-06-25T09:08:33Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\widetilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Adversarial Linear Contextual Bandits with Graph-Structured Side Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on EXP3.
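EXP3 itself is the standard exponential-weights algorithm for adversarial bandits; the paper extends it with context vectors and graph side-observations, which are omitted in this minimal single-round sketch (the step size `eta` and function names are illustrative):

```python
import math
import random

# Illustrative sketch of one EXP3 round: sample an arm from the exponential
# weights, observe only that arm's loss, and update with an importance-weighted
# (unbiased) loss estimate.

def exp3_step(weights, loss_fn, eta):
    total = sum(weights)
    probs = [w / total for w in weights]
    arm = random.choices(range(len(weights)), weights=probs)[0]
    loss = loss_fn(arm)            # only the played arm's loss is revealed
    est = loss / probs[arm]        # importance weighting keeps the estimate unbiased
    weights[arm] *= math.exp(-eta * est)
    return arm, loss, weights
```

Dividing by `probs[arm]` is what makes the loss estimate unbiased under bandit feedback; graph side-observations let such algorithms update several arms per round instead of one.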
arXiv Detail & Related papers (2020-12-10T15:40:07Z) - Exploiting the Surrogate Gap in Online Multiclass Classification [13.452510519858995]
Gaptron is a randomized first-order algorithm for online multiclass classification.
We show expected mistake bounds, with constant regret, with respect to the logistic loss, the hinge loss, and the smooth hinge loss.
We present a new proof technique that exploits the gap between the zero-one loss and surrogate losses.
arXiv Detail & Related papers (2020-07-24T16:19:02Z)
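The "surrogate gap" exploited by Gaptron (and by Gappletron above) is the slack between a convex surrogate loss and the zero-one loss it upper-bounds. A minimal binary-margin sketch of the quantities involved (function names are illustrative; the base-2 logistic loss is one common normalization that dominates the zero-one loss):

```python
import math

# Illustrative sketch: surrogate losses upper-bound the zero-one loss on the
# prediction margin, and the difference between them is the "surrogate gap".

def zero_one(margin):
    return 1.0 if margin <= 0 else 0.0

def hinge(margin):
    return max(0.0, 1.0 - margin)

def logistic(margin):
    # base-2 version, so logistic(0) = 1 = zero_one(0)
    return math.log2(1.0 + 2.0 ** (-margin))

def surrogate_gap(surrogate, margin):
    return surrogate(margin) - zero_one(margin)
```

For confidently correct predictions (large positive margin) the gap is small; for uncertain predictions the surrogate charges strictly more than the zero-one loss, and that excess is what the proof technique converts into regret savings.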
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.