Bandits in Flux: Adversarial Constraints in Dynamic Environments
- URL: http://arxiv.org/abs/2601.19867v1
- Date: Tue, 27 Jan 2026 18:26:07 GMT
- Title: Bandits in Flux: Adversarial Constraints in Dynamic Environments
- Authors: Tareq Si Salem
- Abstract summary: We propose a primal-dual algorithm that extends online mirror descent through the incorporation of suitable gradient estimators and effective constraint handling. Our algorithm achieves state-of-the-art performance in terms of both regret and constraint violation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the challenging problem of adversarial multi-armed bandits operating under time-varying constraints, a scenario motivated by numerous real-world applications. To address this complex setting, we propose a novel primal-dual algorithm that extends online mirror descent through the incorporation of suitable gradient estimators and effective constraint handling. We provide theoretical guarantees establishing sublinear dynamic regret and sublinear constraint violation for our proposed policy. Our algorithm achieves state-of-the-art performance in terms of both regret and constraint violation. Empirical evaluations demonstrate the superiority of our approach.
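The primal-dual scheme the abstract describes can be sketched in a few lines: maintain a distribution over arms updated by exponentiated-gradient mirror descent on an importance-weighted Lagrangian estimate, and a nonnegative dual variable updated by gradient ascent on the observed constraint cost. This is a minimal illustrative sketch under assumed details, not the paper's actual algorithm: the step sizes, the synthetic loss/cost sequences, and the estimator form are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 5, 2000          # arms, horizon
eta, mu = 0.05, 0.01    # primal and dual step sizes (illustrative values)

p = np.full(K, 1.0 / K)  # arm distribution (primal variable)
lam = 0.0                # dual variable for the time-varying constraint

losses = rng.uniform(size=(T, K))            # stand-in adversarial losses
costs = rng.uniform(-0.5, 0.5, size=(T, K))  # stand-in time-varying constraint costs

total_violation = 0.0
for t in range(T):
    arm = rng.choice(K, p=p)
    # Importance-weighted (bandit) estimators: only the pulled arm is observed.
    loss_hat = np.zeros(K)
    cost_hat = np.zeros(K)
    loss_hat[arm] = losses[t, arm] / p[arm]
    cost_hat[arm] = costs[t, arm] / p[arm]

    # Primal step: entropic mirror descent (exponentiated gradient) on the
    # estimated Lagrangian gradient, then renormalize onto the simplex.
    p = p * np.exp(-eta * (loss_hat + lam * cost_hat))
    p /= p.sum()

    # Dual step: gradient ascent on the observed cost, projected to [0, inf).
    lam = max(0.0, lam + mu * costs[t, arm])

    total_violation += costs[t, arm]
```

The dual variable `lam` grows while the realized costs are positive, which tilts the primal update toward low-cost arms; this is the mechanism behind the sublinear constraint-violation guarantees claimed above.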
Related papers
- Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning [52.03884701766989]
Offline reinforcement learning (RL) algorithms typically impose constraints on action selection. We propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. We develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q-learning (ANQ), to perform Q-learning with target actions satisfying this constraint.
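The neighborhood constraint described in this summary can be illustrated as a projection onto the union of balls around the dataset actions. The function name, the Euclidean metric, and the radius below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def constrain_to_neighborhood(candidate, dataset_actions, radius):
    """Project a candidate target action onto the union of radius-balls
    around the dataset actions (a toy version of the neighborhood constraint)."""
    dists = np.linalg.norm(dataset_actions - candidate, axis=1)
    i = np.argmin(dists)
    if dists[i] <= radius:
        return candidate  # already inside some dataset action's neighborhood
    # Otherwise, move to the boundary of the nearest dataset action's ball.
    direction = (candidate - dataset_actions[i]) / dists[i]
    return dataset_actions[i] + radius * direction

dataset_actions = np.array([[0.0, 0.0], [1.0, 1.0]])
a = constrain_to_neighborhood(np.array([0.2, 0.1]), dataset_actions, 0.5)  # kept as-is
b = constrain_to_neighborhood(np.array([3.0, 1.0]), dataset_actions, 0.5)  # pulled to [1.5, 1.0]
```

In a full algorithm this projection would be applied to the actions proposed for the Bellman target, keeping bootstrapping close to the data support.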
arXiv Detail & Related papers (2025-11-04T13:42:05Z)
- Incentivizing Safer Actions in Policy Optimization for Constrained Reinforcement Learning
Constrained Reinforcement Learning aims to maximize the return while adhering to predefined constraint limits. In continuous control settings, balancing the trade-off between reward and constraint satisfaction remains a significant challenge. We introduce a novel approach that integrates an adaptive incentive mechanism, in addition to the reward structure, to stay within the constraint bound.
arXiv Detail & Related papers (2025-09-11T07:33:35Z)
- Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex. Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs. We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z)
- Resilient Constrained Reinforcement Learning
We study a class of constrained reinforcement learning (RL) problems in which the constraint specifications are not identified before training.
Identifying appropriate constraint specifications is challenging because the trade-off between the reward training objective and constraint satisfaction is not defined in advance.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
- Robust Stochastically-Descending Unrolled Networks
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network. Convergence guarantees and generalizability of unrolled networks remain open theoretical problems. We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z)
- The Impact of the Geometric Properties of the Constraint Set in Safe Optimization with Bandit Feedback [5.758073912084366]
We consider a safe optimization problem with bandit feedback in which an agent sequentially chooses actions and observes responses from the environment.
We propose an algorithm for this problem, and study how the geometric properties of the constraint set impact the regret of the algorithm.
arXiv Detail & Related papers (2023-05-01T15:48:34Z)
- Online Nonstochastic Control with Adversarial and Static Constraints [12.2632894803286]
We propose online nonstochastic control algorithms that achieve both sublinear regret and sublinear adversarial constraint violation.
Our algorithms are adaptive to adversarial constraints and achieve smaller cumulative costs and violations.
arXiv Detail & Related papers (2023-02-05T16:46:12Z)
- Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
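The penalty idea in this summary can be sketched as a toy loss: the familiar clipped surrogate for reward, plus an exact ReLU penalty whenever an estimated cost exceeds its limit. The function name, the penalty coefficient `kappa`, and the cost-surrogate form are illustrative assumptions rather than the paper's exact objective:

```python
import numpy as np

def penalized_clipped_loss(ratio, adv_r, adv_c, cost_estimate, cost_limit,
                           kappa=20.0, clip=0.2):
    """Toy penalized clipped objective: PPO-style clipped surrogate for reward,
    plus a ReLU penalty on the (pessimistic) cost surrogate when the estimated
    cost exceeds the limit, turning the constrained problem into one
    unconstrained minimization."""
    clipped = np.clip(ratio, 1 - clip, 1 + clip)
    reward_surr = np.minimum(ratio * adv_r, clipped * adv_r).mean()
    cost_surr = np.maximum(ratio * adv_c, clipped * adv_c).mean()
    penalty = kappa * max(0.0, cost_estimate + cost_surr - cost_limit)
    return -reward_surr + penalty

# Same policy update, two cost regimes (toy single-sample batch):
safe = penalized_clipped_loss(np.array([1.0]), np.array([1.0]), np.array([0.0]), 0.0, 1.0)
unsafe = penalized_clipped_loss(np.array([1.0]), np.array([1.0]), np.array([0.0]), 2.0, 1.0)
```

When the cost estimate is within the limit the penalty vanishes and the loss reduces to the ordinary clipped surrogate; once the limit is exceeded, the penalty dominates and pushes the update toward constraint satisfaction.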
arXiv Detail & Related papers (2022-05-24T06:15:51Z)
- Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization [5.072893872296332]
Action-constrained reinforcement learning (RL) is a widely-used approach in various real-world applications.
We propose a learning algorithm that decouples the action constraints from the policy parameter update.
We show that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
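Decoupling action constraints from the policy update, as this summary describes, can be illustrated with Frank-Wolfe: map a raw policy action into the constraint set by solving only linear subproblems, with no quadratic projection. The l1-ball constraint set, the step-size schedule, and the function name are illustrative assumptions:

```python
import numpy as np

def frank_wolfe_project(target, radius=1.0, steps=200):
    """Approximately project `target` onto the l1-ball of the given radius
    using Frank-Wolfe: each iteration minimizes a linear model over the
    constraint set instead of computing an exact projection."""
    x = np.zeros_like(target)
    for k in range(steps):
        grad = x - target                  # gradient of 0.5 * ||x - target||^2
        i = np.argmax(np.abs(grad))
        s = np.zeros_like(target)
        s[i] = -radius * np.sign(grad[i])  # linear minimizer over the l1-ball
        if grad @ (x - s) < 1e-10:         # Frank-Wolfe duality gap: converged
            break
        gamma = 2.0 / (k + 2.0)
        x = x + gamma * (s - x)
    return x

# A raw action outside the set is pulled to the nearest feasible point:
a = frank_wolfe_project(np.array([2.0, 0.0]))  # true projection is [1.0, 0.0]
```

Because the inner step only needs a linear minimization oracle, the same pattern extends to constraint sets where exact projection is expensive, which is the practical appeal of Frank-Wolfe-style action handling.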
arXiv Detail & Related papers (2021-02-22T14:28:03Z)
- Constrained episodic reinforcement learning in concave-convex and knapsack settings [81.08055425644037]
We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints.
Our experiments demonstrate that the proposed algorithm significantly outperforms existing approaches in constrained episodic environments.
arXiv Detail & Related papers (2020-06-09T05:02:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.