Nearest Neighbour with Bandit Feedback
- URL: http://arxiv.org/abs/2306.13773v3
- Date: Thu, 7 Mar 2024 21:07:35 GMT
- Title: Nearest Neighbour with Bandit Feedback
- Authors: Stephen Pasteris, Chris Hicks, Vasilios Mavroudis
- Abstract summary: Our algorithm handles the fully adversarial setting in which no assumptions at all are made about the data-generation process.
We give generic regret for our algorithm and further analyse them when applied to the bandit problem in euclidean space.
- Score: 4.9094025705644695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we adapt the nearest neighbour rule to the contextual bandit
problem. Our algorithm handles the fully adversarial setting in which no
assumptions at all are made about the data-generation process. When combined
with a sufficiently fast data-structure for (perhaps approximate) adaptive
nearest neighbour search, such as a navigating net, our algorithm is extremely
efficient - having a per trial running time polylogarithmic in both the number
of trials and actions, and taking only quasi-linear space. We give generic
regret bounds for our algorithm and further analyse them when applied to the
stochastic bandit problem in euclidean space. We note that our algorithm can
also be applied to the online classification problem.
Related papers
- Causal Bandits without Graph Learning [28.021500949026766]
We develop an efficient algorithm for finding the parent node of the reward node using atomic interventions.
We extend our algorithm to the case when the reward node has multiple parents.
arXiv Detail & Related papers (2023-01-26T20:27:14Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Parallelizing Contextual Linear Bandits [82.65675585004448]
We present a family of (parallel) contextual linear bandit algorithms, whose regret is nearly identical to their perfectly sequential counterparts.
We also present an empirical evaluation of these parallel algorithms in several domains, including materials discovery and biological sequence design problems.
arXiv Detail & Related papers (2021-05-21T22:22:02Z) - Leveraging Reinforcement Learning for evaluating Robustness of KNN
Search Algorithms [0.0]
The problem of finding K-nearest neighbors in the given dataset for a given query point has been worked upon since several years.
In this paper, we survey some novel K-Nearest Neighbor Search approaches that tackles the problem of Search from the perspectives of computations.
In order to evaluate the robustness of a KNNS approach against adversarial points, we propose a generic Reinforcement Learning based framework for the same.
arXiv Detail & Related papers (2021-02-10T16:10:58Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward.
We show that the gap between the highest reward and other rewards depends on the gap between the highest reward and other rewards.
arXiv Detail & Related papers (2020-06-16T15:33:12Z) - Learning to Accelerate Heuristic Searching for Large-Scale Maximum
Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied into economic markets, labor markets, etc.
Existing exact and approximate algorithms usually fail in such settings due to either requiring intolerable running time or too much computation resource.
We propose textttNeuSearcher which leverages the knowledge learned from previously instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z) - Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.