On Learning to Rank Long Sequences with Contextual Bandits
- URL: http://arxiv.org/abs/2106.03546v1
- Date: Mon, 7 Jun 2021 12:16:34 GMT
- Title: On Learning to Rank Long Sequences with Contextual Bandits
- Authors: Anirban Santara, Claudio Gentile, Gaurav Aggarwal, Shuai Li
- Abstract summary: We introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses.
Our analysis delivers tight regret bounds which, when specialized to vanilla cascading bandits, result in sharper guarantees than previously available in the literature.
- Score: 17.97356309346139
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Motivated by problems of learning to rank long item sequences, we introduce a
variant of the cascading bandit model that considers flexible length sequences
with varying rewards and losses. We formulate two generative models for this
problem within the generalized linear setting, and design and analyze upper
confidence algorithms for it. Our analysis delivers tight regret bounds which,
when specialized to vanilla cascading bandits, result in sharper guarantees
than previously available in the literature. We evaluate our algorithms on a
number of real-world datasets, and show significantly improved empirical
performance as compared to known cascading bandit baselines.
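To make the model concrete, here is a minimal CascadeLinUCB-style sketch in Python: score items optimistically under a linear reward model, rank the top k, and update on the prefix of the list the user actually examined (cascade feedback). This illustrates the vanilla recipe the paper generalizes rather than the authors' algorithms; the item features, list length k, and confidence width alpha are assumptions.

```python
import numpy as np

class CascadeLinUCB:
    """Minimal UCB-style ranker for cascade feedback (illustrative only)."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha              # confidence-width multiplier (assumed)
        self.A = reg * np.eye(dim)      # regularized Gram matrix
        self.b = np.zeros(dim)          # running sum of reward-weighted features

    def rank(self, features, k):
        """Return indices of the k items with the highest UCB scores."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b          # ridge estimate of the weight vector
        means = features @ theta
        widths = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
        return np.argsort(-(means + self.alpha * widths))[:k]

    def update(self, features, ranked, click_pos):
        """Cascade feedback: items before the click get 0, the click gets 1."""
        last = click_pos if click_pos is not None else len(ranked) - 1
        for pos in range(last + 1):
            x = features[ranked[pos]]
            r = 1.0 if pos == click_pos else 0.0
            self.A += np.outer(x, x)
            self.b += r * x
```

With fixed-length lists and binary clicks this is the standard cascading-bandit setup; the paper's variants additionally allow flexible sequence lengths and varying per-item rewards and losses.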
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- High-dimensional Linear Bandits with Knapsacks [8.862707047517913]
We study the contextual bandits with knapsack (CBwK) problem under the high-dimensional setting where the dimension of the feature is large.
We develop an online variant of the hard thresholding algorithm that performs the sparse estimation in an online manner (a minimal sketch follows this entry).
We show that this integrated approach allows us to achieve a sublinear regret that depends logarithmically on the feature dimension.
arXiv Detail & Related papers (2023-11-02T15:40:33Z)
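The online hard thresholding step mentioned above can be sketched in a few lines: after each observation, take a gradient step on the squared loss and keep only the s largest-magnitude coordinates. This is a generic illustration, with the step size lr and sparsity level s as assumed hyperparameters, not the paper's exact estimator.

```python
import numpy as np

def hard_threshold(theta, s):
    """Zero out all but the s largest-magnitude coordinates of theta."""
    keep = np.argpartition(np.abs(theta), -s)[-s:]
    out = np.zeros_like(theta)
    out[keep] = theta[keep]
    return out

def online_iht_step(theta, x, r, s, lr=0.1):
    """One online update: gradient step on 0.5*(x @ theta - r)**2, then threshold."""
    grad = (x @ theta - r) * x          # gradient of the squared loss (assumed)
    return hard_threshold(theta - lr * grad, s)
```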
- Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression [0.9659642285903419]
Time Series Extrinsic Regression (TSER) involves using a set of training time series to form a predictive model of a continuous response variable.
The DrCIF and FreshPRINCE models are the only ones evaluated that significantly outperform the standard rotation forest regressor; a generic feature-based pipeline is sketched after this entry.
arXiv Detail & Related papers (2023-05-02T13:58:20Z)
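For readers new to TSER, a generic feature-based pipeline looks like the sketch below: map each series to a fixed-length vector of unsupervised summary statistics, then fit any off-the-shelf regressor on those vectors. The feature set here is an illustrative assumption (not the DrCIF or FreshPRINCE transforms), and RandomForestRegressor stands in for the rotation forest, which scikit-learn does not provide.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def summary_features(series):
    """Map one 1-D series to a fixed-length unsupervised feature vector."""
    diffs = np.diff(series)
    return np.array([series.mean(), series.std(), series.min(), series.max(),
                     diffs.mean(), diffs.std()])

def fit_tser(train_series, y):
    """Fit a regressor on per-series summary features (illustrative pipeline)."""
    X = np.stack([summary_features(s) for s in train_series])
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X, y)
    return model
```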
- Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications [70.67112733968654]
Partial monitoring is an expressive framework for sequential decision-making.
We present a simple and unified analysis of partial monitoring, and further extend the model to the contextual and kernelized setting.
arXiv Detail & Related papers (2023-02-07T18:58:25Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- On Kernelized Multi-Armed Bandits with Constraints [16.102401271318012]
We study a bandit problem with a general unknown reward function and a general unknown constraint function.
We propose a general framework for both algorithm design and performance analysis.
We demonstrate the superior performance of our proposed algorithms via numerical experiments.
arXiv Detail & Related papers (2022-03-29T14:02:03Z)
- Deep Hierarchy in Bandits [51.22833900944146]
Mean rewards of actions are often correlated.
To maximize statistical efficiency, it is important to leverage these correlations when learning.
We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model.
arXiv Detail & Related papers (2022-02-03T08:15:53Z)
- Outlier-Robust Learning of Ising Models Under Dobrushin's Condition [57.89518300699042]
We study the problem of learning Ising models satisfying Dobrushin's condition in the outlier-robust setting where a constant fraction of the samples are adversarially corrupted.
Our main result is to provide the first computationally efficient robust learning algorithm for this problem with near-optimal error guarantees.
arXiv Detail & Related papers (2021-02-03T18:00:57Z)
- Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems [40.957688390621385]
Our framework captures complex statistical dependencies between actions, latent variables, and observations.
We develop novel online learning algorithms that learn to act efficiently in our models; the underlying Thompson sampling step is sketched after this entry.
arXiv Detail & Related papers (2020-07-09T16:25:40Z)
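Variational Thompson sampling builds on the basic Thompson sampling step, sketched here in its simplest Gaussian form: sample each arm's mean from its posterior and pull the arm with the best sample. Independent per-arm posteriors and a known noise scale sigma are simplifying assumptions; in the paper the posterior ranges over a full influence diagram and is approximated variationally.

```python
import numpy as np

def thompson_step(rng, counts, reward_sums, sigma=1.0):
    """Sample each arm's mean from a Gaussian posterior and pull the argmax."""
    n = np.maximum(counts, 1)           # avoid division by zero for unplayed arms
    post_mean = reward_sums / n
    post_std = sigma / np.sqrt(n)
    return int(np.argmax(rng.normal(post_mean, post_std)))
```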
- A Constraint-Based Algorithm for the Structural Learning of Continuous-Time Bayesian Networks [70.88503833248159]
We propose the first constraint-based algorithm for learning the structure of continuous-time Bayesian networks.
We discuss the different statistical tests and the underlying hypotheses used by our proposal to establish conditional independence.
arXiv Detail & Related papers (2020-07-07T07:34:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.