Max-Utility Based Arm Selection Strategy For Sequential Query
Recommendations
- URL: http://arxiv.org/abs/2108.13810v1
- Date: Tue, 31 Aug 2021 13:03:30 GMT
- Title: Max-Utility Based Arm Selection Strategy For Sequential Query
Recommendations
- Authors: Shameem A. Puthiya Parambath, Christos Anagnostopoulos, Roderick
Murray-Smith, Sean MacAvaney, Evangelos Zervas
- Abstract summary: We consider the query recommendation problem in closed loop interactive learning settings like online information gathering and exploratory analytics.
The problem can be naturally modelled using the Multi-Armed Bandits (MAB) framework with countably many arms.
We show that such a selection strategy often results in higher cumulative regret and to this end, we propose a selection strategy based on the maximum utility of the arms.
- Score: 16.986870945319293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the query recommendation problem in closed loop interactive
learning settings like online information gathering and exploratory analytics.
The problem can be naturally modelled using the Multi-Armed Bandits (MAB)
framework with countably many arms. The standard MAB algorithms for countably
many arms begin with selecting a random set of candidate arms and then applying
standard MAB algorithms, e.g., UCB, on this candidate set downstream. We show
that such a selection strategy often results in higher cumulative regret and to
this end, we propose a selection strategy based on the maximum utility of the
arms. We show that in tasks like online information gathering, where sequential
query recommendations are employed, the sequences of queries are correlated and
the number of potentially optimal queries can be reduced to a manageable size
by selecting queries with maximum utility with respect to the currently
executing query. Our experimental results using a recent real online literature
discovery service log file demonstrate that the proposed arm selection strategy
improves the cumulative regret substantially with respect to the
state-of-the-art baseline algorithms. Our data model and source code are
available at
https://anonymous.4open.science/r/0e5ad6b7-ac02-4577-9212-c9d505d3dbdb/.
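
The mechanism described in the abstract, replacing random candidate-arm selection with max-utility selection before running a standard bandit routine such as UCB, can be sketched as follows. This is a minimal illustration under assumed interfaces: the `utility` function (scoring a candidate query against the currently executing query), the candidate-set size `k`, and the `pull` reward oracle are hypothetical placeholders, not the authors' implementation.

```python
import math
import random


def select_candidates_random(arms, k):
    """Baseline: draw a random candidate set of k arms (queries)."""
    return random.sample(arms, k)


def select_candidates_max_utility(arms, current_query, utility, k):
    """Sketch of the proposed idea: keep the k arms with the highest
    utility with respect to the currently executing query."""
    return sorted(arms, key=lambda arm: utility(arm, current_query), reverse=True)[:k]


def ucb1(candidates, pull, horizon):
    """Standard UCB1 on a fixed candidate set; pull(arm) returns a
    stochastic reward in [0, 1]."""
    counts = {arm: 0 for arm in candidates}
    sums = {arm: 0.0 for arm in candidates}
    total = 0.0
    for t in range(1, horizon + 1):
        def index(arm):
            if counts[arm] == 0:
                return math.inf  # force one initial pull of every arm
            mean = sums[arm] / counts[arm]
            return mean + math.sqrt(2.0 * math.log(t) / counts[arm])
        arm = max(candidates, key=index)
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total
```

Under this reading, the only difference between the baseline and the proposed strategy is which candidate set is handed to the bandit routine; the paper's claim is that the utility-filtered set retains the near-optimal queries, which is what lowers the cumulative regret.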
Related papers
- Contextual Bandits with Arm Request Costs and Delays [19.263086804406786]
We introduce a novel extension of the contextual bandit problem, where new sets of arms can be requested with time delays and associated costs.
In this setting, the learner can select multiple arms from a decision set, with each selection taking one unit of time.
We design algorithms that can effectively select arms and determine the appropriate time to request new arms, thereby minimizing their regret.
arXiv Detail & Related papers (2024-10-17T00:44:50Z)
- Combinatorial Multi-armed Bandits: Arm Selection via Group Testing [36.24836468586544]
This paper considers the problem of multi-armed bandits with semi-bandit feedback and a cardinality constraint on the super-arm size.
Existing algorithms for solving this problem typically involve two key sub-routines.
This paper introduces a novel realistic alternative to the perfect oracle.
arXiv Detail & Related papers (2024-10-14T16:19:57Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
- Adapting Bandit Algorithms for Settings with Sequentially Available Arms [0.0]
We propose a meta-algorithm, namely Sequential Pull/No-pull for MAB (Seq), to adapt any classical MAB policy to better suit this setting.
The proposed meta-algorithm gathers more information, especially in the first rounds, which are characterized by high uncertainty in the estimated arm values.
The Seq meta-algorithm was extensively tested and compared with classical MAB policies on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-09-30T15:56:37Z)
- Reinforcement Learning for Finite-Horizon Restless Multi-Armed
Multi-Action Bandits [8.136957953239254]
We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed R(MA)^2B.
The state of each arm evolves according to a controlled Markov decision process (MDP), and the reward of pulling an arm depends on both the current state of the corresponding MDP and the action taken.
Since finding the optimal policy is typically intractable, we propose a computationally appealing index policy which we call Occupancy-Measured-Reward Index Policy.
arXiv Detail & Related papers (2021-09-20T21:40:12Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Determinantal Beam Search [75.84501052642361]
Beam search is a go-to strategy for decoding neural sequence models.
In use-cases that call for multiple solutions, a diverse or representative set is often desired.
By posing iterations in beam search as a series of subdeterminant problems, we can turn the algorithm into a diverse subset selection process.
arXiv Detail & Related papers (2021-06-14T13:01:46Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- Online Preselection with Context Information under the Plackett-Luce
Model [10.286111512725334]
We consider an extension of the contextual multi-armed bandit problem.
Instead of selecting a single alternative (arm), a learner is supposed to make a preselection in the form of a subset of alternatives.
We propose the CPPL algorithm, which is inspired by the well-known UCB algorithm.
arXiv Detail & Related papers (2020-02-11T09:27:24Z)