On Learning to Rank Long Sequences with Contextual Bandits
- URL: http://arxiv.org/abs/2106.03546v1
- Date: Mon, 7 Jun 2021 12:16:34 GMT
- Title: On Learning to Rank Long Sequences with Contextual Bandits
- Authors: Anirban Santara, Claudio Gentile, Gaurav Aggarwal, Shuai Li
- Abstract summary: We introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses.
Our analysis delivers tight regret bounds which, when specialized to vanilla cascading bandits, result in sharper guarantees than previously available in the literature.
- Score: 17.97356309346139
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Motivated by problems of learning to rank long item sequences, we introduce a
variant of the cascading bandit model that considers flexible length sequences
with varying rewards and losses. We formulate two generative models for this
problem within the generalized linear setting, and design and analyze upper
confidence algorithms for it. Our analysis delivers tight regret bounds which,
when specialized to vanilla cascading bandits, result in sharper guarantees
than previously available in the literature. We evaluate our algorithms on a
number of real-world datasets, and show significantly improved empirical
performance as compared to known cascading bandit baselines.
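To make the model concrete, here is a minimal CascadeLinUCB-style sketch in Python: score items optimistically under a linear reward model, rank the top k, and update on the prefix of the list the user actually examined (cascade feedback). This illustrates the vanilla recipe the paper generalizes rather than the authors' algorithms; the item features, list length k, and confidence width alpha are assumptions.

```python
import numpy as np

class CascadeLinUCB:
    """Minimal UCB-style ranker for cascade feedback (illustrative only)."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha              # confidence-width multiplier (assumed)
        self.A = reg * np.eye(dim)      # regularized Gram matrix
        self.b = np.zeros(dim)          # running sum of reward-weighted features

    def rank(self, features, k):
        """Return indices of the k items with the highest UCB scores."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b          # ridge estimate of the weight vector
        means = features @ theta
        widths = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
        return np.argsort(-(means + self.alpha * widths))[:k]

    def update(self, features, ranked, click_pos):
        """Cascade feedback: items before the click get 0, the click gets 1."""
        last = click_pos if click_pos is not None else len(ranked) - 1
        for pos in range(last + 1):
            x = features[ranked[pos]]
            r = 1.0 if pos == click_pos else 0.0
            self.A += np.outer(x, x)
            self.b += r * x
```

With fixed-length lists and binary clicks this is the standard cascading-bandit setup; the paper's variants additionally allow flexible sequence lengths and varying per-item rewards and losses.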
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- High-dimensional Linear Bandits with Knapsacks [8.862707047517913]
We study the contextual bandits with knapsack (CBwK) problem under the high-dimensional setting where the dimension of the feature is large.
We develop an online variant of the hard thresholding algorithm that performs the sparse estimation in an online manner (a minimal sketch follows this entry).
We show that this integrated approach allows us to achieve a sublinear regret that depends logarithmically on the feature dimension.
arXiv Detail & Related papers (2023-11-02T15:40:33Z)
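The online hard thresholding step mentioned above can be sketched in a few lines: after each observation, take a gradient step on the squared loss and keep only the s largest-magnitude coordinates. This is a generic illustration, with the step size lr and sparsity level s as assumed hyperparameters, not the paper's exact estimator.

```python
import numpy as np

def hard_threshold(theta, s):
    """Zero out all but the s largest-magnitude coordinates of theta."""
    keep = np.argpartition(np.abs(theta), -s)[-s:]
    out = np.zeros_like(theta)
    out[keep] = theta[keep]
    return out

def online_iht_step(theta, x, r, s, lr=0.1):
    """One online update: gradient step on 0.5*(x @ theta - r)**2, then threshold."""
    grad = (x @ theta - r) * x          # gradient of the squared loss (assumed)
    return hard_threshold(theta - lr * grad, s)
```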
- Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression [0.9659642285903419]
Time Series Extrinsic Regression (TSER) involves using a set of training time series to form a predictive model of a continuous response variable.
The DrCIF and FreshPRINCE models are the only ones evaluated that significantly outperform the standard rotation forest regressor; a generic feature-based pipeline is sketched after this entry.
arXiv Detail & Related papers (2023-05-02T13:58:20Z)
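For readers new to TSER, a generic feature-based pipeline looks like the sketch below: map each series to a fixed-length vector of unsupervised summary statistics, then fit any off-the-shelf regressor on those vectors. The feature set here is an illustrative assumption (not the DrCIF or FreshPRINCE transforms), and RandomForestRegressor stands in for the rotation forest, which scikit-learn does not provide.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def summary_features(series):
    """Map one 1-D series to a fixed-length unsupervised feature vector."""
    diffs = np.diff(series)
    return np.array([series.mean(), series.std(), series.min(), series.max(),
                     diffs.mean(), diffs.std()])

def fit_tser(train_series, y):
    """Fit a regressor on per-series summary features (illustrative pipeline)."""
    X = np.stack([summary_features(s) for s in train_series])
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X, y)
    return model
```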
- Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications [70.67112733968654]
Partial monitoring is an expressive framework for sequential decision-making.
We present a simple and unified analysis of partial monitoring, and further extend the model to the contextual and kernelized setting.
arXiv Detail & Related papers (2023-02-07T18:58:25Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- On Kernelized Multi-Armed Bandits with Constraints [16.102401271318012]
We study a bandit problem with a general unknown reward function and a general unknown constraint function.
We propose a general framework for both algorithm design and performance analysis.
We demonstrate the superior performance of our proposed algorithms via numerical experiments.
arXiv Detail & Related papers (2022-03-29T14:02:03Z)
- Deep Hierarchy in Bandits [51.22833900944146]
Mean rewards of actions are often correlated.
To maximize statistical efficiency, it is important to leverage these correlations when learning.
We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model.
arXiv Detail & Related papers (2022-02-03T08:15:53Z)
- Outlier-Robust Learning of Ising Models Under Dobrushin's Condition [57.89518300699042]
We study the problem of learning Ising models satisfying Dobrushin's condition in the outlier-robust setting where a constant fraction of the samples are adversarially corrupted.
Our main result is to provide the first computationally efficient robust learning algorithm for this problem with near-optimal error guarantees.
arXiv Detail & Related papers (2021-02-03T18:00:57Z)
- Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems [40.957688390621385]
Our framework captures complex statistical dependencies between actions, latent variables, and observations.
We develop novel online learning algorithms that learn to act efficiently in our models; the underlying Thompson sampling step is sketched after this entry.
arXiv Detail & Related papers (2020-07-09T16:25:40Z)
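Variational Thompson sampling builds on the basic Thompson sampling step, sketched here in its simplest Gaussian form: sample each arm's mean from its posterior and pull the arm with the best sample. Independent per-arm posteriors and a known noise scale sigma are simplifying assumptions; in the paper the posterior ranges over a full influence diagram and is approximated variationally.

```python
import numpy as np

def thompson_step(rng, counts, reward_sums, sigma=1.0):
    """Sample each arm's mean from a Gaussian posterior and pull the argmax."""
    n = np.maximum(counts, 1)           # avoid division by zero for unplayed arms
    post_mean = reward_sums / n
    post_std = sigma / np.sqrt(n)
    return int(np.argmax(rng.normal(post_mean, post_std)))
```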
- A Constraint-Based Algorithm for the Structural Learning of Continuous-Time Bayesian Networks [70.88503833248159]
We propose the first constraint-based algorithm for learning the structure of continuous-time Bayesian networks.
We discuss the different statistical tests and the underlying hypotheses used by our proposal to establish conditional independence.
arXiv Detail & Related papers (2020-07-07T07:34:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.