Related papers: Nearest Neighbour with Bandit Feedback

URL: http://arxiv.org/abs/2306.13773v3
Date: Thu, 7 Mar 2024 21:07:35 GMT
Title: Nearest Neighbour with Bandit Feedback
Authors: Stephen Pasteris, Chris Hicks, Vasilios Mavroudis
Abstract summary: Our algorithm handles the fully adversarial setting in which no assumptions at all are made about the data-generation process. We give generic regret bounds for our algorithm and further analyse them when applied to the stochastic bandit problem in Euclidean space.
Score: 4.9094025705644695
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper we adapt the nearest neighbour rule to the contextual bandit problem. Our algorithm handles the fully adversarial setting in which no assumptions at all are made about the data-generation process. When combined with a sufficiently fast data structure for (perhaps approximate) adaptive nearest neighbour search, such as a navigating net, our algorithm is extremely efficient, having a per-trial running time polylogarithmic in both the number of trials and actions, and taking only quasi-linear space. We give generic regret bounds for our algorithm and further analyse them when applied to the stochastic bandit problem in Euclidean space. We note that our algorithm can also be applied to the online classification problem.
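Illustration: to make the problem setting in the abstract concrete, the following is a minimal sketch of a naive nearest-neighbour contextual bandit. It is not the paper's algorithm and carries none of its regret or running-time guarantees. It substitutes a simple epsilon-greedy rule and a linear scan for the nearest neighbour search (the abstract suggests a structure such as a navigating net for that step). The class name `NaiveNNBandit`, its methods, and the `epsilon` parameter are hypothetical names introduced here for illustration.

```python
import math
import random


class NaiveNNBandit:
    """Toy epsilon-greedy nearest-neighbour contextual bandit (illustrative only)."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon  # assumed exploration rate, not from the paper
        self.rng = random.Random(seed)
        # One memory per action: a list of (context, observed_reward) pairs.
        self.memory = [[] for _ in range(n_actions)]

    def _nearest_reward(self, action, context):
        # Linear scan over past contexts for this action; returns None if untried.
        best_reward, best_dist = None, math.inf
        for past_context, reward in self.memory[action]:
            dist = math.dist(past_context, context)  # Euclidean distance
            if dist < best_dist:
                best_dist, best_reward = dist, reward
        return best_reward

    def select(self, context):
        # Explore uniformly at random with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        # Otherwise estimate each action by the reward at its nearest past context.
        estimates = []
        for action in range(self.n_actions):
            reward = self._nearest_reward(action, context)
            # Untried actions get an optimistic estimate so they are tried first.
            estimates.append(math.inf if reward is None else reward)
        best = max(estimates)
        candidates = [a for a, e in enumerate(estimates) if e == best]
        return self.rng.choice(candidates)

    def update(self, context, action, reward):
        # Bandit feedback: only the chosen action's reward is observed and stored.
        self.memory[action].append((tuple(context), reward))
```

In use, each trial calls `select(context)` to choose an action and then `update(context, action, reward)` with the single reward observed for that action. The linear scan makes each trial cost time proportional to the number of past trials, which is exactly the cost that a fast adaptive nearest neighbour structure is meant to avoid.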

Related papers

Causal Bandits without Graph Learning [28.021500949026766]
We develop an efficient algorithm for finding the parent node of the reward node using atomic interventions. We extend our algorithm to the case when the reward node has multiple parents.
arXiv Detail & Related papers (2023-01-26T20:27:14Z)
Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms. For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime. In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem. We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
Parallelizing Contextual Linear Bandits [82.65675585004448]
We present a family of (parallel) contextual linear bandit algorithms, whose regret is nearly identical to their perfectly sequential counterparts. We also present an empirical evaluation of these parallel algorithms in several domains, including materials discovery and biological sequence design problems.
arXiv Detail & Related papers (2021-05-21T22:22:02Z)
Leveraging Reinforcement Learning for evaluating Robustness of KNN Search Algorithms [0.0]
The problem of finding the K nearest neighbors of a given query point in a given dataset has been studied for several years. In this paper, we survey some novel K-Nearest Neighbor Search approaches that tackle the search problem from a computational perspective. To evaluate the robustness of a KNNS approach against adversarial points, we propose a generic reinforcement-learning-based framework.
arXiv Detail & Related papers (2021-02-10T16:10:58Z)
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions. We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward, and that it depends on the gap between the highest reward and other rewards.
arXiv Detail & Related papers (2020-06-16T15:33:12Z)
Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied to economic markets, labor markets, etc. Existing exact and approximate algorithms usually fail in such settings because they require either intolerable running time or too many computational resources. We propose NeuSearcher, which leverages the knowledge learned from previous instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)
Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take. We propose a novel approach based on Lagrangian Decomposition. We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.