Related papers: Efficient Algorithms for Logistic Contextual Slate Bandits with Bandit Feedback

Efficient Algorithms for Logistic Contextual Slate Bandits with Bandit Feedback

URL: http://arxiv.org/abs/2506.13163v1
Date: Mon, 16 Jun 2025 07:19:02 GMT
Title: Efficient Algorithms for Logistic Contextual Slate Bandits with Bandit Feedback
Authors: Tanmay Goyal, Gaurav Sinha,
Abstract summary: We study the Logistic Contextual Slate Bandit problem.<n>A single binary reward, determined by a logistic model, is observed for the chosen slate.<n>We propose two algorithms, Slate-GLM-OFU and Slate-GLM-TS, that accomplish this goal.
Score: 0.8287206589886881
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We study the Logistic Contextual Slate Bandit problem, where, at each round, an agent selects a slate of $N$ items from an exponentially large set (of size $2^{\Omega(N)}$) of candidate slates provided by the environment. A single binary reward, determined by a logistic model, is observed for the chosen slate. Our objective is to develop algorithms that maximize cumulative reward over $T$ rounds while maintaining low per-round computational costs. We propose two algorithms, Slate-GLM-OFU and Slate-GLM-TS, that accomplish this goal. These algorithms achieve $N^{O(1)}$ per-round time complexity via local planning (independent slot selections), and low regret through global learning (joint parameter estimation). We provide theoretical and empirical evidence supporting these claims. Under a well-studied diversity assumption, we prove that Slate-GLM-OFU incurs only $\tilde{O}(\sqrt{T})$ regret. Extensive experiments across a wide range of synthetic settings demonstrate that our algorithms consistently outperform state-of-the-art baselines, achieving both the lowest regret and the fastest runtime. Furthermore, we apply our algorithm to select in-context examples in prompts of Language Models for solving binary classification tasks such as sentiment analysis. Our approach achieves competitive test accuracy, making it a viable alternative in practical scenarios.

Related papers

Achieving Limited Adaptivity for Multinomial Logistic Bandits [0.7373617024876725]
We present two algorithms that operate in the batched and rarely-switching paradigms.<n>B-MNL-CB achieves $tildeO(sqrtT)$ regret when presented with $tildeO(log log T)$ update rounds.<n> RS-MNL works with adversarially generated contexts and can achieve $tildeO(sqrtT)$ regret with $tildeO(log T)$ policy updates.
arXiv Detail & Related papers (2025-08-05T04:43:01Z)
Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models [70.07661254213181]
We propose two algorithms that enjoy provable scaling laws for the test-time compute of large language models.<n>One is a two-stage knockout-style algorithm, where each candidate is evaluated by its average win rate against multiple opponents.<n>The other is a two-stage league-style algorithm, where each candidate is evaluated by its average win rate against multiple opponents.
arXiv Detail & Related papers (2024-11-29T05:29:47Z)
Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbations framework for differential privacy where noise is added to the input $Ain mathcalS$. We first design novel efficient algorithms to privately release pair-wise cosine similarities. We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
arXiv Detail & Related papers (2024-06-07T12:07:16Z)
Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed. SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
Improved Active Multi-Task Representation Learning via Lasso [44.607652031235716]
In this paper, we show the dominance of the L1-regularized-relevance-based ($nu1$) strategy by giving a lower bound for the $nu2$-based strategy. We also characterize the potential of our $nu1$-based strategy in sample-cost-sensitive settings.
arXiv Detail & Related papers (2023-06-05T03:08:29Z)
On the Sample Complexity of Representation Learning in Multi-task Bandits with Global and Local structure [77.60508571062958]
We investigate the sample complexity of learning the optimal arm for multi-task bandit problems. Arms consist of two components: one that is shared across tasks (that we call representation) and one that is task-specific (that we call predictor) We devise an algorithm OSRL-SC whose sample complexity approaches the lower bound, and scales at most as $H(Glog(delta_G)+ Xlog(delta_H))$, with $X,G,H$ being, respectively, the number of tasks, representations and predictors.
arXiv Detail & Related papers (2022-11-28T08:40:12Z)
A Spectral Approach to Item Response Theory [6.5268245109828005]
We propose a emphnew item estimation algorithm for the Rasch model. The core of our algorithm is the computation of the stationary distribution of a Markov chain defined on an item-item graph. Experiments on synthetic and real-life datasets show that our algorithm is scalable, accurate, and competitive with the most commonly used methods in the literature.
arXiv Detail & Related papers (2022-10-09T18:57:08Z)
Robust Methods for High-Dimensional Linear Learning [0.0]
We propose statistically robust and computationally efficient linear learning methods in the high-dimensional batch setting. We instantiate our framework on several applications including vanilla sparse, group-sparse and low-rank matrix recovery. For vanilla $s$-sparsity, we are able to reach the $slog (d)/n$ rate under heavy-tails and $eta$-corruption.
arXiv Detail & Related papers (2022-08-10T17:00:41Z)
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits [27.279654173896372]
We study the problem of emphdynamic regret minimization in $K$-armed Dueling Bandits under non-stationary or time varying preferences. This is an online learning setup where the agent chooses a pair of items at each round and observes only a relative binary win-loss' feedback for this pair.
arXiv Detail & Related papers (2021-11-06T16:46:55Z)
Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm. Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises. We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $tilde O(|S||B|(gamma)-3epsilon-2)$ for finding the Nash equilibrium (NE) value up to some $epsilon$ error. We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms. We show through a lower bound that even when one of the base algorithms has $O(sqrtT)$ regret, in general it is impossible to get better than $Omega(sqrtT)$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.