Batched Nonparametric Bandits via k-Nearest Neighbor UCB
- URL: http://arxiv.org/abs/2505.10498v2
- Date: Fri, 01 Aug 2025 17:02:02 GMT
- Title: Batched Nonparametric Bandits via k-Nearest Neighbor UCB
- Authors: Sakshi Arya
- Abstract summary: We study sequential decision-making in batched nonparametric contextual bandits. We propose a nonparametric algorithm that combines adaptive k-nearest neighbor (k-NN) regression with the upper confidence bound (UCB) principle. Our method, BaNk-UCB, is fully nonparametric, adapts to the context dimension, and is simple to implement.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study sequential decision-making in batched nonparametric contextual bandits, where actions are selected over a finite horizon divided into a small number of batches. Motivated by constraints in domains such as medicine and marketing -- where online feedback is limited -- we propose a nonparametric algorithm that combines adaptive k-nearest neighbor (k-NN) regression with the upper confidence bound (UCB) principle. Our method, BaNk-UCB, is fully nonparametric, adapts to the context dimension, and is simple to implement. Unlike prior work relying on parametric or binning-based estimators, BaNk-UCB uses local geometry to estimate rewards and adaptively balances exploration and exploitation. We provide near-optimal regret guarantees under standard Lipschitz smoothness and margin assumptions, using a theoretically motivated batch schedule that balances regret across batches and achieves minimax-optimal rates. Empirical evaluations on synthetic and real-world datasets demonstrate that BaNk-UCB consistently outperforms binning-based baselines.
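The abstract's core idea (a k-NN regression estimate per arm plus a UCB bonus driven by local geometry, with policy updates only at batch boundaries) can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm: the exact confidence bonus, the geometric batch grid, the fixed k, and all function names here are assumptions; the paper's batch schedule and neighborhood choice are theoretically tuned.

```python
import numpy as np

def knn_ucb_choose(contexts, actions, rewards, x, n_arms, k, conf_scale=1.0):
    """Pick the arm maximizing a k-NN mean-reward estimate at context x
    plus a UCB bonus: a noise term shrinking in k and a bias term given
    by the k-NN radius (the local geometry around x)."""
    ucbs = np.full(n_arms, np.inf)          # unplayed arms get infinite UCB
    n = len(actions)
    for a in range(n_arms):
        idx = np.where(actions == a)[0]
        if idx.size == 0:
            continue
        dists = np.linalg.norm(contexts[idx] - x, axis=1)
        kk = min(k, idx.size)
        order = np.argsort(dists)[:kk]
        radius = dists[order[-1]]           # distance to the kk-th neighbor
        mean = rewards[idx[order]].mean()
        bonus = conf_scale * (np.sqrt(np.log(n + 1) / kk) + radius)
        ucbs[a] = mean + bonus
    return int(np.argmax(ucbs))

def run_bank_ucb(T, n_batches, n_arms=2, k=5, seed=0):
    """Toy batched loop: rewards observed within a batch only become
    usable at the next batch boundary (frozen history `hist_end`)."""
    rng = np.random.default_rng(seed)
    grid = np.unique(np.round(np.geomspace(1, T, n_batches + 1)).astype(int))
    contexts = np.zeros((T, 2))
    actions = np.zeros(T, dtype=int)
    rewards = np.zeros(T)
    hist_end = 0                            # history visible to the policy
    for t in range(T):
        x = rng.uniform(size=2)
        if hist_end == 0:
            a = int(rng.integers(n_arms))   # no feedback yet: play at random
        else:
            a = knn_ucb_choose(contexts[:hist_end], actions[:hist_end],
                               rewards[:hist_end], x, n_arms, k)
        # toy Lipschitz reward: arm a's mean reward is coordinate a of x
        r = x[a] + 0.1 * rng.standard_normal()
        contexts[t], actions[t], rewards[t] = x, a, r
        if t + 1 in grid:
            hist_end = t + 1                # batch boundary: reveal feedback
    return actions, rewards
```

Freezing `hist_end` between grid points is what distinguishes the batched setting from a fully sequential bandit: the policy cannot react to any reward until the current batch closes.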
Related papers
- BAPE: Learning an Explicit Bayes Classifier for Long-tailed Visual Recognition [78.70453964041718]
Current deep learning algorithms usually solve for the optimal classifier by implicitly estimating the posterior probabilities. This simple methodology has been proven effective for meticulously balanced academic benchmark datasets. However, it is not applicable to the long-tailed data distributions in the real world. This paper presents a novel approach (BAPE) that provides a more precise theoretical estimation of the data distributions.
arXiv Detail & Related papers (2025-06-29T15:12:50Z) - Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning [55.33984461046492]
Policy-based methods currently dominate reinforcement learning pipelines for large language model (LLM) reasoning. We introduce Trajectory Bellman Residual Minimization (TBRM), an algorithm that naturally adapts this idea to LLMs. We prove convergence to the near-optimal KL-regularized policy from arbitrary off-policy data via an improved change-of-trajectory-measure analysis.
arXiv Detail & Related papers (2025-05-21T09:41:53Z) - Likelihood-Free Adaptive Bayesian Inference via Nonparametric Distribution Matching [2.0319002824093015]
We propose Adaptive Bayesian Inference (ABI), a framework that bypasses traditional data-space discrepancies. ABI transforms the problem of measuring divergence between posterior distributions into a tractable sequence of conditional quantile regression tasks. We demonstrate that ABI significantly outperforms data-based Wasserstein, summary-based ABC, and state-of-the-art likelihood-free simulators.
arXiv Detail & Related papers (2025-05-07T17:50:14Z) - On Pareto Optimality for the Multinomial Logistic Bandit [0.0]
We provide a new online learning algorithm for tackling the Multinomial Logit Bandit problem. Despite the challenges posed by the MNL model, we develop a novel Upper Confidence Bound (UCB)-based method. We develop theoretical guarantees characterizing the tradeoff between regret and estimation error for the MNL-Bandit problem.
arXiv Detail & Related papers (2025-01-31T16:42:29Z) - Data-Driven Upper Confidence Bounds with Near-Optimal Regret for Heavy-Tailed Bandits [0.0]
We propose a new distribution-free, data-driven UCB algorithm for symmetric reward distributions.
We prove a near-optimal regret bound for the proposed anytime, parameter-free RMM-UCB method, even for heavy-tailed distributions.
arXiv Detail & Related papers (2024-06-09T10:06:50Z) - Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z) - Batched Nonparametric Contextual Bandits [21.031965676746776]
We study nonparametric contextual bandits under batch constraints.
We propose a novel batch learning algorithm that achieves the optimal regret.
Our theoretical results suggest that for nonparametric contextual bandits, a nearly constant number of policy updates can attain optimal regret.
arXiv Detail & Related papers (2024-02-27T18:06:20Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits [16.59103967569845]
We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for contextual linear bandits in non-stationary environments.
This unique combination of Bayesian and frequentist principles enhances adaptability and performance in dynamic settings.
arXiv Detail & Related papers (2023-07-07T13:29:07Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum-variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks.
We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB).
BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z) - Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
arXiv Detail & Related papers (2021-02-25T17:36:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.