Fast online inference for nonlinear contextual bandit based on
Generative Adversarial Network
- URL: http://arxiv.org/abs/2202.08867v1
- Date: Thu, 17 Feb 2022 19:11:48 GMT
- Title: Fast online inference for nonlinear contextual bandit based on
Generative Adversarial Network
- Authors: Yun Da Tsai, Shou De Lin
- Abstract summary: We propose a neural bandit model with an end-to-end training process to efficiently perform bandit algorithms during inference.
We advance state-of-the-art time complexity to $O(\log n)$ with approximate Bayesian inference, neural random feature mapping, approximate global maxima and approximate nearest neighbor search.
- Score: 9.537503617354476
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This work addresses the efficiency concern on inferring a nonlinear
contextual bandit when the number of arms $n$ is very large. We propose a
neural bandit model with an end-to-end training process to efficiently perform
bandit algorithms such as Thompson Sampling and UCB during inference. We
advance state-of-the-art time complexity to $O(\log n)$ with approximate
Bayesian inference, neural random feature mapping, approximate global maxima
and approximate nearest neighbor search. We further propose a generative
adversarial network to shift the bottleneck of maximizing the objective for
selecting optimal arms from inference time to training time, enjoying
significant speedup with the additional advantage of enabling batch and parallel
processing. The generative model can infer an approximate argmax of the posterior
sampling in logarithmic time complexity with the help of approximate nearest
neighbor search. Extensive experiments on classification and recommendation tasks
demonstrate an order-of-magnitude improvement in inference time with no
significant degradation in performance.
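To make the pipeline above concrete, below is a minimal sketch, not the authors' implementation, of the inference path the abstract describes: an approximate Bayesian posterior over a random feature space, a single Thompson Sampling draw, and arm selection cast as a maximum-inner-product search over precomputed arm features. The feature map, the dimensions, and the synthetic reward are illustrative assumptions; the brute-force argmax stands in for the approximate nearest neighbor index that yields the $O(\log n)$ selection, and the GAN generator that moves this maximization to training time is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, d_ctx, d_feat = 10_000, 16, 64

# Fixed random feature map, standing in for the learned neural random
# feature mapping of the paper (illustrative choice of nonlinearity).
W = rng.normal(size=(d_ctx, d_feat)) / np.sqrt(d_ctx)

def features(x):
    return np.tanh(x @ W)

arm_contexts = rng.normal(size=(n_arms, d_ctx))
arm_feats = features(arm_contexts)   # precomputed offline; this is what an
                                     # ANN / MIPS index would be built over

class LinearTS:
    """Bayesian linear-regression posterior over the random feature space."""
    def __init__(self, d):
        self.A = np.eye(d)           # posterior precision
        self.b = np.zeros(d)

    def select(self):
        cov = np.linalg.inv(self.A)
        theta = rng.multivariate_normal(cov @ self.b, cov)   # Thompson draw
        # Arm selection = maximum inner product between theta and arm features.
        # An approximate nearest-neighbor / MIPS index over arm_feats would
        # replace this O(n) scan to obtain sub-linear selection time.
        return int(np.argmax(arm_feats @ theta))

    def update(self, arm, reward):
        phi = arm_feats[arm]
        self.A += np.outer(phi, phi)
        self.b += reward * phi

# Toy interaction loop with a synthetic reward that is linear in the features.
policy = LinearTS(d_feat)
theta_true = rng.normal(size=d_feat) / np.sqrt(d_feat)
for t in range(200):
    a = policy.select()
    r = float(arm_feats[a] @ theta_true) + 0.1 * rng.normal()
    policy.update(a, r)
```

Swapping the final linear scan for a query against a prebuilt maximum-inner-product index is the step that makes per-round arm selection sub-linear in the number of arms $n$.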
Related papers
- Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget [55.938644481736446]
We introduce a novel algorithm for best feasible arm identification that guarantees an exponential decay in the error probability. We validate our algorithm through comprehensive empirical evaluations across various problem instances with different levels of complexity.
arXiv Detail & Related papers (2025-06-03T02:56:26Z) - Neural Dueling Bandits: Preference-Based Optimization with Human Feedback [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We also extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers [0.0]
The recently proposed INTEL algorithm provides a product-of-experts approach to online prediction of time series under possible regime switching.
We introduce LINTEL, which uses the exact filtering distribution at time $t$ with constant-time updates.
We show experimentally that our proposed approach is over five times faster than INTEL under reasonable settings with better quality predictions.
arXiv Detail & Related papers (2024-06-01T22:55:33Z) - DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speed up a common Stochastic Approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose DASA, a Delay-Adaptive algorithm for multi-agent Stochastic Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z) - Making RL with Preference-based Feedback Efficient via Randomization [11.019088464664696]
Reinforcement Learning algorithms that learn from human feedback need to be efficient in terms of statistical complexity, computational complexity, and query complexity.
We present an algorithm that is sample efficient (i.e., it achieves near-optimal worst-case regret bounds) and has polynomial running time (i.e., its computational complexity is polynomial in the relevant parameters).
To extend the results to more general nonlinear function approximation, we design a model-based randomized algorithm inspired by the idea of Thompson sampling.
arXiv Detail & Related papers (2023-10-23T04:19:35Z) - Nearest Neighbour with Bandit Feedback [4.9094025705644695]
Our algorithm handles the fully adversarial setting in which no assumptions at all are made about the data-generation process.
We give generic regret bounds for our algorithm and further analyse them when applied to the bandit problem in Euclidean space.
arXiv Detail & Related papers (2023-06-23T20:09:01Z) - An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration
of Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration problem in the multi-armed bandit setting (R-CPE-MAB).
Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z) - Learning Contextual Bandits Through Perturbed Rewards [107.6210145983805]
We show that an $\tilde{O}(\tilde{d}\sqrt{T})$ regret upper bound is still achievable under standard regularity conditions.
We perturb the rewards when updating the neural network to eliminate the need of explicit exploration.
arXiv Detail & Related papers (2022-01-24T19:10:22Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Distributed Adaptive Nearest Neighbor Classifier: Algorithm and Theory [6.696267547013535]
We propose a novel distributed adaptive NN classifier for which the number of nearest neighbors is a tuning parameter chosen by a data-driven criterion.
An early stopping rule is proposed when searching for the optimal tuning parameter, which improves the finite sample performance.
In particular, we show that when the sub-sample sizes are sufficiently large, the proposed classifier achieves the nearly optimal convergence rate.
arXiv Detail & Related papers (2021-05-20T14:38:41Z) - Neural Thompson Sampling [94.82847209157494]
We propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation.
At the core of our algorithm is a novel posterior distribution of the reward, where its mean is the neural network approximator, and its variance is built upon the neural tangent features of the corresponding neural network (an illustrative sketch of this construction appears after this list).
arXiv Detail & Related papers (2020-10-02T07:44:09Z) - AIN: Fast and Accurate Sequence Labeling with Approximate Inference
Network [75.44925576268052]
The linear-chain Conditional Random Field (CRF) model is one of the most widely-used neural sequence labeling approaches.
Exact probabilistic inference algorithms are typically applied in training and prediction stages of the CRF model.
We propose to employ a parallelizable approximate variational inference algorithm for the CRF model.
arXiv Detail & Related papers (2020-09-17T12:18:43Z) - Bandit algorithms to emulate human decision making using probabilistic
distortions [20.422725678982726]
We formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
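As a companion to the Neural Thompson Sampling entry above, here is an illustrative sketch, under stated assumptions rather than the authors' code, of the posterior it describes: the sampled reward for each candidate context has mean equal to the network output and variance built from the network's gradient ("neural tangent") features. The one-hidden-layer network, the scales lam and nu, and the synthetic interaction loop are assumptions for illustration, and the step that retrains the network on observed rewards is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hid = 8, 16
lam, nu = 1.0, 0.1          # assumed regularization / exploration scales

# Tiny one-hidden-layer network f(x) = v . tanh(U x), standing in for the
# deep network used in the paper.
U = rng.normal(size=(d_hid, d_in)) / np.sqrt(d_in)
v = rng.normal(size=d_hid) / np.sqrt(d_hid)

def f(x):
    return float(v @ np.tanh(U @ x))

def tangent_features(x):
    """Gradient of f with respect to all parameters (v and U), flattened."""
    h = np.tanh(U @ x)
    return np.concatenate([h, np.outer(v * (1.0 - h ** 2), x).ravel()])

p = d_hid + d_hid * d_in
Z = lam * np.eye(p)          # accumulates tangent features of played arms

n_arms, rounds = 10, 30
for t in range(rounds):
    arms = rng.normal(size=(n_arms, d_in))
    scores = []
    for x in arms:
        g = tangent_features(x)
        var = nu ** 2 * (g @ np.linalg.solve(Z, g))    # tangent-feature variance
        scores.append(rng.normal(f(x), np.sqrt(var)))  # per-arm Thompson draw
    a = int(np.argmax(scores))
    g_a = tangent_features(arms[a])
    Z += np.outer(g_a, g_a)  # rank-one update with the played arm's features
    # (Refitting the network parameters to the observed reward is omitted.)
```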