GBOSE: Generalized Bandit Orthogonalized Semiparametric Estimation
- URL: http://arxiv.org/abs/2301.08781v1
- Date: Fri, 20 Jan 2023 19:39:10 GMT
- Title: GBOSE: Generalized Bandit Orthogonalized Semiparametric Estimation
- Authors: Mubarrat Chowdhury, Elkhan Ismayilzada, Khalequzzaman Sayem and Gi-Soo
Kim
- Abstract summary: We propose a new algorithm with a semi-parametric reward model and a state-of-the-art upper bound on regret.
Our work expands the scope of another representative algorithm of state-of-the-art complexity with a similar reward model by proposing an algorithm built upon the same action filtering procedures.
We derive the said upper bound on regret and present simulation results that affirm our method's superiority over all prevalent semi-parametric bandit algorithms for cases involving more than two arms.
- Score: 3.441021278275805
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In sequential decision-making scenarios, e.g., mobile health recommendation
systems and revenue management, contextual multi-armed bandit algorithms have
garnered attention for their performance. However, most existing algorithms are
built on the assumption of a strictly parametric reward model, mostly linear in
nature. In this work we propose a new algorithm with a semi-parametric reward
model whose upper bound on regret has state-of-the-art complexity amongst
existing semi-parametric algorithms. Our work expands the scope of another
representative algorithm of state-of-the-art complexity with a similar reward
model: our algorithm is built upon the same action filtering procedures but
provides an explicit action selection distribution for scenarios involving more
than two arms at a particular time step, while requiring fewer computations. We
derive the said upper bound on regret and present simulation results that
affirm our method's superiority over all prevalent semi-parametric bandit
algorithms for cases involving more than two arms.
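The semi-parametric reward model underlying this line of work can be sketched numerically. The toy simulation below (an illustration only, not the paper's actual GBOSE procedure; the uniform-random policy, variable names, and all constants are assumptions) draws rewards as y_t = x_{a_t}^T θ + ν_t + ε_t, where ν_t is an arbitrary action-independent nuisance term. Regressing on action-centered contexts (the chosen context minus the per-round mean context) recovers θ, while a naive least-squares fit on raw contexts is biased by ν_t:

```python
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 3, 4, 5000                 # context dim, arms, rounds
theta = np.array([1.0, -0.5, 0.25])  # true linear parameter

X_chosen = np.zeros((T, d))          # context of the arm actually pulled
Z = np.zeros((T, d))                 # action-centered features
y = np.zeros(T)

for t in range(T):
    contexts = rng.normal(loc=1.0, scale=1.0, size=(K, d))
    a = rng.integers(K)              # uniformly random policy (illustrative)
    nu = 2.0 + np.sin(t)             # arbitrary action-independent confounder
    X_chosen[t] = contexts[a]
    Z[t] = contexts[a] - contexts.mean(axis=0)
    y[t] = contexts[a] @ theta + nu + rng.normal(scale=0.1)

# Naive fit absorbs the confounder; centered fit is consistent because
# the centered feature has mean zero given the round, independent of nu.
theta_naive, *_ = np.linalg.lstsq(X_chosen, y, rcond=None)
theta_centered, *_ = np.linalg.lstsq(Z, y, rcond=None)
```

Running this shows the centered estimate close to θ and the naive estimate pulled away by the confounder, which is the failure mode that semi-parametric bandit algorithms are designed to avoid.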
Related papers
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
- From Optimization to Control: Quasi Policy Iteration [3.4376560669160394]
We introduce a novel control algorithm coined as quasi-policy iteration (QPI)
QPI is based on a novel approximation of the "Hessian" matrix in the policy iteration algorithm by exploiting two linear structural constraints specific to MDPs.
It exhibits an empirical convergence behavior similar to policy iteration with a very low sensitivity to the discount factor.
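The summary does not spell out QPI itself, but the policy-iteration loop it accelerates is standard. A minimal sketch on a toy two-state MDP (all numbers illustrative; the linear-system evaluation step is the one whose "Hessian" QPI approximates):

```python
import numpy as np

# Toy MDP: 2 states, 2 actions. P[a, s, s'] transition probs, R[s, a] rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.95

def policy_iteration(P, R, gamma):
    n_states, n_actions = R.shape
    pi = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[pi, np.arange(n_states)]        # row s is P[pi[s], s, :]
        r_pi = R[np.arange(n_states), pi]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily w.r.t. the Q-values of v.
        Q = R + gamma * (P @ v).T                # Q[s, a]
        new_pi = Q.argmax(axis=1)
        if np.array_equal(new_pi, pi):
            return pi, v
        pi = new_pi
```

For finite MDPs this loop terminates at an optimal policy in finitely many iterations; the cost per iteration is dominated by the linear solve in the evaluation step.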
arXiv Detail & Related papers (2023-11-18T21:00:14Z)
- Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We first introduce the Combinatorial Successive Assign (CSA) algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large.
We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent.
We experimentally compare the algorithms with previous methods and show that our algorithm performs better.
arXiv Detail & Related papers (2023-10-24T09:47:32Z)
- Discrete Choice Multi-Armed Bandits [0.0]
This paper establishes a connection between a category of discrete choice models and the realms of online learning and multiarmed bandit algorithms.
We furnish sublinear regret bounds for a comprehensive family of algorithms, encompassing the Exp3 algorithm as a particular case.
We introduce a novel family of adversarial multiarmed bandit algorithms, drawing inspiration from the generalized nested logit models.
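The Exp3 algorithm cited above as a special case has a standard textbook form; a minimal sketch (the mixing parameter γ and the reward range [0, 1] are the usual textbook assumptions, not details from this paper):

```python
import math
import random

def exp3(n_arms, rewards, T, gamma=0.1):
    """Run Exp3 for T rounds; rewards(t, arm) must return a value in [0, 1].

    Returns the total reward collected.
    """
    weights = [1.0] * n_arms
    total = 0.0
    for t in range(T):
        w_sum = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        r = rewards(t, arm)
        total += r
        # Importance-weighted estimate keeps the update unbiased
        # even though only the pulled arm's reward is observed.
        est = r / probs[arm]
        weights[arm] *= math.exp(gamma * est / n_arms)
    return total
```

With one arm paying 1 and the rest paying 0, the weight of the good arm grows geometrically, so play concentrates on it up to the γ exploration floor.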
arXiv Detail & Related papers (2023-10-01T03:41:04Z)
- An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued pure exploration problem in the multi-armed bandit (R-CPE-MAB)
Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z)
- Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits [34.06611065493047]
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings.
We provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms.
We also propose the first algorithm for linear bandits in the fixed budget setting.
arXiv Detail & Related papers (2020-06-21T00:56:33Z)
- Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z)
- SONIA: A Symmetric Blockwise Truncated Optimization Algorithm [2.9923891863939938]
This work presents a new algorithm for empirical risk minimization.
The algorithm bridges the gap between first- and second-order search methods by computing a second-order search-type update in one subspace, coupled with a scaled steepest descent step in the orthogonal complement.
arXiv Detail & Related papers (2020-06-06T19:28:14Z)
- Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two sample multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
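One common way to model such distortions, drawn from cumulative prospect theory rather than from this paper specifically, is the Tversky–Kahneman probability weighting function; a small sketch (the functional form and γ = 0.61 are standard CPT choices and are assumptions here):

```python
def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman probability weighting: overweights small
    probabilities and underweights large ones for gamma < 1."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

def distorted_value(p_win, reward=1.0, gamma=0.61):
    """CPT-style value of a Bernoulli arm: distorted win probability
    times the reward on a win (losses pay zero in this toy model)."""
    return tk_weight(p_win, gamma) * reward
```

Replacing true probabilities with such weights changes which arm a "human-like" agent prefers, which is what makes the regret analysis of distorted bandits different from the standard expected-reward setting.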
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.