Replicable Constrained Bandits
- URL: http://arxiv.org/abs/2602.14580v1
- Date: Mon, 16 Feb 2026 09:22:23 GMT
- Title: Replicable Constrained Bandits
- Authors: Matteo Bollini, Gianmarco Genalti, Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi,
- Abstract summary: A \emph{replicable} online learning algorithm is one that takes the same sequence of decisions across different executions in the same environment, with high probability. We design replicable algorithms whose regret and constraint violation match those of non-replicable ones in terms of $T$. As a key step toward these guarantees, we develop the first replicable UCB-like algorithm for \emph{unconstrained} MABs.
- Score: 27.71248958698115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Algorithmic \emph{replicability} has recently been introduced to address the need for reproducible experiments in machine learning. A \emph{replicable online learning} algorithm is one that takes the same sequence of decisions across different executions in the same environment, with high probability. We initiate the study of algorithmic replicability in \emph{constrained} MAB problems, where a learner interacts with an unknown stochastic environment for $T$ rounds, seeking not only to maximize reward but also to satisfy multiple constraints. Our main result is that replicability can be achieved in constrained MABs. Specifically, we design replicable algorithms whose regret and constraint violation match those of non-replicable ones in terms of $T$. As a key step toward these guarantees, we develop the first replicable UCB-like algorithm for \emph{unconstrained} MABs, showing that algorithms that employ the optimism in-the-face-of-uncertainty principle can be replicable, a result that we believe is of independent interest.
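The abstract's notion of a replicable UCB-like method can be illustrated with a toy sketch. This is not the paper's actual algorithm: the function names, the grid width, and the coupling are illustrative assumptions. The idea shown is the standard rounding trick from the replicability literature: empirical means are snapped to a randomly shifted grid drawn from shared randomness before the optimistic index is computed, so small sampling differences between executions tend not to change which arm is pulled.

```python
import math
import random

def rounded(value, width, offset):
    """Snap a statistic to a randomly shifted grid.

    Two executions whose estimates differ by much less than `width`
    round to the same grid point with high probability over the shared
    offset -- the standard trick behind replicable estimators.
    """
    return round((value - offset) / width) * width + offset

def replicable_ucb(pull, n_arms, horizon, shared_seed):
    """Toy UCB1 variant: empirical means are grid-rounded before the
    optimistic index is computed, so small sampling differences between
    executions tend not to change which arm looks best."""
    width = 0.05
    offset = random.Random(shared_seed).uniform(0.0, width)  # shared across runs
    counts, sums, actions = [0] * n_arms, [0.0] * n_arms, []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # round-robin initialization
        else:
            arm = max(
                range(n_arms),
                key=lambda a: rounded(sums[a] / counts[a], width, offset)
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        actions.append(arm)
        counts[arm] += 1
        sums[arm] += pull(arm)
    return actions

# Two executions in the same two-armed Bernoulli environment, with
# independent reward noise but the same shared grid seed.
means = [0.3, 0.7]
rng1, rng2 = random.Random(1), random.Random(2)
run1 = replicable_ucb(lambda a: float(rng1.random() < means[a]), 2, 500, 0)
run2 = replicable_ucb(lambda a: float(rng2.random() < means[a]), 2, 500, 0)
```

Coupling the grid offset across executions makes the arm-selection rule insensitive to estimation noise below the grid resolution; turning this into a high-probability guarantee on the full action sequence is exactly the kind of analysis the paper develops, and requires far more care than this sketch.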
Related papers
- UCB-type Algorithm for Budget-Constrained Expert Learning [71.67657715154034]
M-LCB is a UCB-style meta-algorithm that provides \emph{anytime} regret guarantees. We show how M-LCB extends the classical bandit paradigm to the more realistic scenario of coordinating stateful, self-learning experts under limited resources.
arXiv Detail & Related papers (2025-10-26T12:36:17Z)
- Replicable Reinforcement Learning with Linear Function Approximation [21.370247743205056]
We introduce two efficient algorithms for replicable random design regression and uncentered covariance estimation. We then leverage these tools to provide the first provably efficient RL algorithms for linear Markov decision processes. We evaluate our algorithms experimentally and show how they can inspire more consistent neural policies.
arXiv Detail & Related papers (2025-09-10T14:56:09Z)
- In-Context Algorithm Emulation in Fixed-Weight Transformers [13.585357287532588]
A minimal Transformer with frozen weights emulates a broad class of algorithms by in-context prompting. A single fixed-weight two-layer softmax attention module emulates all algorithms from the task-specific class via prompting alone.
arXiv Detail & Related papers (2025-08-24T23:20:31Z)
- Replicable Learning of Large-Margin Halfspaces [46.91303295440005]
We provide efficient algorithms for the problem of learning large-margin halfspaces.
Our results improve upon the algorithms provided by Impagliazzo, Lei, Pitassi, and Sorrell [STOC 2022].
arXiv Detail & Related papers (2024-02-21T15:06:51Z)
- Replicability is Asymptotically Free in Multi-armed Bandits [41.86005698892541]
We consider a replicable multi-armed bandit algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset. We propose a principled approach to limiting the probability of nonreplication.
arXiv Detail & Related papers (2024-02-12T03:31:34Z)
- Batch Bayesian Optimization for Replicable Experimental Design [56.64902148159355]
Many real-world design problems evaluate multiple experimental conditions in parallel and replicate each condition multiple times due to large and heteroscedastic observation noise.
We propose the Batch Thompson Sampling for Replicable Experimental Design framework, which encompasses three algorithms.
We show the effectiveness of our algorithms in two practical real-world applications: precision agriculture and AutoML.
arXiv Detail & Related papers (2023-11-02T12:46:03Z)
- Replicable Reinforcement Learning [15.857503103543308]
We provide a provably replicable algorithm for parallel value iteration, and a provably replicable version of R-max in the episodic setting.
These are the first formal replicability results for control problems, which present different challenges for replication than batch learning settings.
arXiv Detail & Related papers (2023-05-24T16:05:15Z)
- Replicability and stability in learning [16.936594801109557]
Impagliazzo, Lei, Pitassi, and Sorrell (2022) recently initiated the study of replicability in machine learning.
We show how to boost any replicable algorithm so that it produces the same output with probability arbitrarily close to 1.
We prove that list replicability can be boosted so that it is achieved with probability arbitrarily close to 1.
arXiv Detail & Related papers (2023-04-07T17:52:26Z)
- Accelerated First-Order Optimization under Nonlinear Constraints [61.98523595657983]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms. An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress, having no limit cycles, and to find an appropriate stationary point in a polynomial number of iterations.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize over their data. Our algorithm is based on a new concentration technique and comes with sample-complexity guarantees. It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous convergence on both the primal and dual sides.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.