Multi-Armed Bandits with Minimum Aggregated Revenue Constraints
- URL: http://arxiv.org/abs/2510.12523v1
- Date: Tue, 14 Oct 2025 13:47:34 GMT
- Title: Multi-Armed Bandits with Minimum Aggregated Revenue Constraints
- Authors: Ahmed Ben Yahmed, Hafedh El Ferchichi, Marc Abeille, Vianney Perchet
- Abstract summary: We design and analyze algorithms that either optimistically prioritize performance or pessimistically enforce constraint satisfaction. We establish a lower bound demonstrating that the dependence on the time horizon in our results is optimal in general.
- Score: 27.081997104464012
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We examine a multi-armed bandit problem with contextual information, where the objective is to ensure that each arm receives a minimum aggregated reward across contexts while simultaneously maximizing the total cumulative reward. This framework captures a broad class of real-world applications where fair revenue allocation is critical and contextual variation is inherent. The cross-context aggregation of minimum reward constraints, while enabling better performance and easier feasibility, introduces significant technical challenges -- particularly the absence of closed-form optimal allocations typically available in standard MAB settings. We design and analyze algorithms that either optimistically prioritize performance or pessimistically enforce constraint satisfaction. For each algorithm, we derive problem-dependent upper bounds on both regret and constraint violations. Furthermore, we establish a lower bound demonstrating that the dependence on the time horizon in our results is optimal in general and revealing fundamental limitations of the free exploration principle leveraged in prior work.
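The constrained setting described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's algorithm: it pairs a standard UCB index (the optimistic side) with a pro-rata deficit check on per-arm revenue targets (the pessimistic side). The arm means and the `min_revenue` targets are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy K-armed bandit where each arm must also accumulate a minimum
# aggregated reward ("revenue") by horizon T.
K, T = 3, 5000
means = np.array([0.9, 0.6, 0.4])            # unknown true mean rewards
min_revenue = np.array([0.0, 200.0, 100.0])  # hypothetical per-arm targets

pulls = np.zeros(K)
revenue = np.zeros(K)  # aggregated reward collected by each arm so far
est = np.zeros(K)      # empirical mean reward per arm

for t in range(1, T + 1):
    # Optimistic index: empirical mean plus an exploration bonus.
    ucb = est + np.sqrt(2 * np.log(t) / np.maximum(pulls, 1))
    # Pessimistic check: if an arm is behind its pro-rata target,
    # serve the most-behind arm before maximizing reward.
    deficit = min_revenue * (t / T) - revenue
    behind = np.where(deficit > 0)[0]
    arm = behind[np.argmax(deficit[behind])] if behind.size else np.argmax(ucb)
    r = rng.binomial(1, means[arm])
    pulls[arm] += 1
    revenue[arm] += r
    est[arm] += (r - est[arm]) / pulls[arm]  # incremental mean update

# Total end-of-horizon constraint violation across arms.
violation = np.maximum(min_revenue - revenue, 0).sum()
```

In the paper's setting the constraints aggregate across contexts, which removes the closed-form optimal allocation this toy version implicitly relies on; the sketch only shows the performance/feasibility tension the abstract describes.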
Related papers
- A Simple Reduction Scheme for Constrained Contextual Bandits with Adversarial Contexts via Regression [7.798233121583888]
We study constrained contextual bandits with adversarially chosen contexts, where each action yields a random reward and incurs a random cost. We adopt the standard realizability assumption: conditioned on the observed context, rewards and costs are drawn independently from fixed distributions whose expectations belong to known function classes.
arXiv Detail & Related papers (2026-02-04T20:19:55Z) - FedLoDrop: Federated LoRA with Dropout for Generalized LLM Fine-tuning [65.26899091946417]
Fine-tuning large language models (LLMs) is crucial for adapting general-purpose models to specific tasks. This paper proposes Federated LoRA with Dropout (FedLoDrop), a new framework that applies dropout to the rows and columns of the trainable matrix in Federated LoRA.
arXiv Detail & Related papers (2025-10-14T02:40:45Z) - Universal Dynamic Regret and Constraint Violation Bounds for Constrained Online Convex Optimization [7.798233121583888]
We present two algorithms having simple modular structures that yield universal dynamic regret and cumulative constraint violation bounds. Our results hold in the most general case, when both the cost and constraint functions are chosen arbitrarily by an adversary.
arXiv Detail & Related papers (2025-10-02T10:19:16Z) - Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time [52.230936493691985]
We propose SITAlign, an inference framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria. We provide theoretical insights by deriving sub-optimality bounds for our satisficing-based inference alignment approach.
arXiv Detail & Related papers (2025-05-29T17:56:05Z) - From Restless to Contextual: A Thresholding Bandit Reformulation For Finite-horizon Performance [8.173852377640964]
We introduce a reformulation of online restless bandits (RBs) as a budgeted thresholding contextual bandit. We prove the first non-asymptotic optimality of an oracle policy for a simplified finite-horizon setting. Our work provides a new pathway for achieving practical, sample-efficient learning in finite-horizon RBs.
arXiv Detail & Related papers (2025-02-07T18:23:43Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown stochastic model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Algorithm for Constrained Markov Decision Process with Linear Convergence [55.41644538483948]
An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs.
A new dual approach is proposed with the integration of two ingredients: entropy regularized policy and Vaidya's dual.
The proposed approach is shown to converge (with linear rate) to the global optimum.
arXiv Detail & Related papers (2022-06-03T16:26:38Z) - On Kernelized Multi-Armed Bandits with Constraints [16.102401271318012]
We study a bandit problem with a general unknown reward function and a general unknown constraint function.
We propose a general framework for algorithm design and performance analysis.
We demonstrate the superior performance of our proposed algorithms via numerical experiments.
arXiv Detail & Related papers (2022-03-29T14:02:03Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z) - Structure Adaptive Algorithms for Stochastic Bandits [22.871155520200773]
We study reward maximisation in a class of structured multi-armed bandit problems.
The mean rewards of arms satisfy some given structural constraints.
We develop algorithms from instance-dependent lower-bounds using iterative saddle-point solvers.
arXiv Detail & Related papers (2020-07-02T08:59:54Z)
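Several of the constrained-optimization entries above share a common mechanism: a dual variable that prices constraint violation and steers decisions back toward feasibility. A minimal, generic primal-dual sketch of that mechanism (illustrative step size and toy one-dimensional problem; not any specific paper's algorithm):

```python
# Online primal-dual sketch for a long-term constraint.
# Decision x in [0, 1]; per-round reward r(x) = x;
# long-term constraint: average of x must stay at or below 0.5.
eta, T = 0.05, 2000
lam, x, total_cost = 0.0, 0.5, 0.0

for t in range(T):
    # Primal step: gradient ascent on the Lagrangian x - lam*(x - 0.5),
    # whose gradient w.r.t. x is (1 - lam); project onto [0, 1].
    x = min(1.0, max(0.0, x + eta * (1.0 - lam)))
    cost = x - 0.5  # instantaneous constraint slack (positive = violation)
    total_cost += cost
    # Dual step: raise the price lambda while the constraint is violated,
    # lower it (down to zero) while there is slack.
    lam = max(0.0, lam + eta * cost)

avg_violation = max(total_cost, 0.0) / T  # long-term (averaged) violation
```

The dual variable oscillates around the value that makes the constraint bind, so the averaged violation shrinks with the horizon; the papers above differ in how they tune this trade-off and in the adversarial versus stochastic guarantees they obtain.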
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.