Max-Utility Based Arm Selection Strategy For Sequential Query
Recommendations
- URL: http://arxiv.org/abs/2108.13810v1
- Date: Tue, 31 Aug 2021 13:03:30 GMT
- Title: Max-Utility Based Arm Selection Strategy For Sequential Query
Recommendations
- Authors: Shameem A. Puthiya Parambath, Christos Anagnostopoulos, Roderick
Murray-Smith, Sean MacAvaney, Evangelos Zervas
- Abstract summary: We consider the query recommendation problem in closed loop interactive learning settings like online information gathering and exploratory analytics.
The problem can be naturally modelled using the Multi-Armed Bandits (MAB) framework with countably many arms.
We show that such a selection strategy often results in higher cumulative regret and to this end, we propose a selection strategy based on the maximum utility of the arms.
- Score: 16.986870945319293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the query recommendation problem in closed loop interactive
learning settings like online information gathering and exploratory analytics.
The problem can be naturally modelled using the Multi-Armed Bandits (MAB)
framework with countably many arms. The standard MAB algorithms for countably
many arms begin with selecting a random set of candidate arms and then applying
standard MAB algorithms, e.g., UCB, on this candidate set downstream. We show
that such a selection strategy often results in higher cumulative regret and to
this end, we propose a selection strategy based on the maximum utility of the
arms. We show that in tasks like online information gathering, where sequential
query recommendations are employed, the sequences of queries are correlated and
the number of potentially optimal queries can be reduced to a manageable size
by selecting queries with maximum utility with respect to the currently
executing query. Our experimental results using a recent real online literature
discovery service log file demonstrate that the proposed arm selection strategy
improves the cumulative regret substantially with respect to the
state-of-the-art baseline algorithms. Our data model and source code are
available at
https://anonymous.4open.science/r/0e5ad6b7-ac02-4577-9212-c9d505d3dbdb/.
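
The mechanism described in the abstract, replacing random candidate-arm selection with max-utility selection before running a standard bandit routine such as UCB, can be sketched as follows. This is a minimal illustration under assumed interfaces: the `utility` function (scoring a candidate query against the currently executing query), the candidate-set size `k`, and the `pull` reward oracle are hypothetical placeholders, not the authors' implementation.

```python
import math
import random


def select_candidates_random(arms, k):
    """Baseline: draw a random candidate set of k arms (queries)."""
    return random.sample(arms, k)


def select_candidates_max_utility(arms, current_query, utility, k):
    """Sketch of the proposed idea: keep the k arms with the highest
    utility with respect to the currently executing query."""
    return sorted(arms, key=lambda arm: utility(arm, current_query), reverse=True)[:k]


def ucb1(candidates, pull, horizon):
    """Standard UCB1 on a fixed candidate set; pull(arm) returns a
    stochastic reward in [0, 1]."""
    counts = {arm: 0 for arm in candidates}
    sums = {arm: 0.0 for arm in candidates}
    total = 0.0
    for t in range(1, horizon + 1):
        def index(arm):
            if counts[arm] == 0:
                return math.inf  # force one initial pull of every arm
            mean = sums[arm] / counts[arm]
            return mean + math.sqrt(2.0 * math.log(t) / counts[arm])
        arm = max(candidates, key=index)
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total
```

Under this reading, the only difference between the baseline and the proposed strategy is which candidate set is handed to the bandit routine; the paper's claim is that the utility-filtered set retains the near-optimal queries, which is what lowers the cumulative regret.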
Related papers
- Contextual Bandits with Arm Request Costs and Delays [19.263086804406786]
We introduce a novel extension of the contextual bandit problem, where new sets of arms can be requested with time delays and associated costs.
In this setting, the learner can select multiple arms from a decision set, with each selection taking one unit of time.
We design algorithms that can effectively select arms and determine the appropriate time to request new arms, thereby minimizing their regret.
arXiv Detail & Related papers (2024-10-17T00:44:50Z)
- Combinatorial Multi-armed Bandits: Arm Selection via Group Testing [36.24836468586544]
This paper considers the problem of multi-armed bandits with semi-bandit feedback and a cardinality constraint on the super-arm size.
Existing algorithms for solving this problem typically involve two key sub-routines.
This paper introduces a novel realistic alternative to the perfect oracle.
arXiv Detail & Related papers (2024-10-14T16:19:57Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
- Adapting Bandit Algorithms for Settings with Sequentially Available Arms [0.0]
We propose a meta-algorithm, namely Sequential Pull/No-pull for MAB (Seq), to adapt any classical MAB policy to better suit this setting.
The proposed meta-algorithm gathers more information, especially in the first rounds, which are characterized by high uncertainty in the estimated arm values.
The Seq meta-algorithm was extensively tested and compared with classical MAB policies on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-09-30T15:56:37Z)
- Reinforcement Learning for Finite-Horizon Restless Multi-Armed
Multi-Action Bandits [8.136957953239254]
We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed R(MA)^2B.
The state of each arm evolves according to a controlled Markov decision process (MDP), and the reward of pulling an arm depends on both the current state of the corresponding MDP and the action taken.
Since finding the optimal policy is typically intractable, we propose a computationally appealing index policy which we call Occupancy-Measured-Reward Index Policy.
arXiv Detail & Related papers (2021-09-20T21:40:12Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Determinantal Beam Search [75.84501052642361]
Beam search is a go-to strategy for decoding neural sequence models.
In use-cases that call for multiple solutions, a diverse or representative set is often desired.
By posing iterations in beam search as a series of subdeterminant problems, we can turn the algorithm into a diverse subset selection process.
arXiv Detail & Related papers (2021-06-14T13:01:46Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- Online Preselection with Context Information under the Plackett-Luce
Model [10.286111512725334]
We consider an extension of the contextual multi-armed bandit problem.
Instead of selecting a single alternative (arm), a learner is supposed to make a preselection in the form of a subset of alternatives.
We propose the CPPL algorithm, which is inspired by the well-known UCB algorithm.
arXiv Detail & Related papers (2020-02-11T09:27:24Z)