Finite-Time Guarantees for Multi-Agent Combinatorial Bandits with Nonstationary Rewards
- URL: http://arxiv.org/abs/2508.20923v1
- Date: Thu, 28 Aug 2025 15:51:57 GMT
- Title: Finite-Time Guarantees for Multi-Agent Combinatorial Bandits with Nonstationary Rewards
- Authors: Katherine B. Adams, Justin J. Boutilier, Qinyang He, Yonatan Mintz,
- Abstract summary: We study a sequential resource allocation problem where a decision maker selects subsets of agents at each period to maximize overall outcomes without prior knowledge of individual-level effects. Our framework applies to settings such as community health interventions, targeted digital advertising, and workforce retention programs.
- Score: 0.8166364251367625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a sequential resource allocation problem where a decision maker selects subsets of agents at each period to maximize overall outcomes without prior knowledge of individual-level effects. Our framework applies to settings such as community health interventions, targeted digital advertising, and workforce retention programs, where intervention effects evolve dynamically. Agents may exhibit habituation (diminished response from frequent selection) or recovery (enhanced response from infrequent selection). The technical challenge centers on nonstationary reward distributions that lead to changing intervention effects over time. The problem requires balancing two key competing objectives: heterogeneous individual rewards and the exploration-exploitation tradeoff in terms of learning for improved future decisions as opposed to maximizing immediate outcomes. Our contribution introduces the first framework incorporating this form of nonstationary rewards in the combinatorial multi-armed bandit literature. We develop algorithms with theoretical guarantees on dynamic regret and demonstrate practical efficacy through a diabetes intervention case study. Our personalized community intervention algorithm achieved up to three times as much improvement in program enrollment compared to baseline approaches, validating the framework's potential for real-world applications. This work bridges theoretical advances in adaptive learning with practical challenges in population-level behavioral change interventions.
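The abstract describes the problem structure (select a subset of agents each round; responses diminish under frequent selection and recover under rest) but does not specify the reward model or the algorithm. The Python sketch below is a minimal illustration under assumed dynamics: the `HabituationRecoveryEnv` class, its exponential habituation/recovery updates, and the sliding-window top-k UCB policy are hypothetical stand-ins, not the paper's method or its dynamic-regret analysis.

```python
import numpy as np

rng = np.random.default_rng(0)


class HabituationRecoveryEnv:
    """Toy nonstationary environment: an agent's expected reward shrinks when it
    is selected often (habituation) and drifts back toward a baseline when it is
    left alone (recovery)."""

    def __init__(self, baseline, habituation=0.3, recovery=0.1):
        self.baseline = np.asarray(baseline, dtype=float)  # per-agent long-run mean
        self.state = self.baseline.copy()                  # current expected reward
        self.habituation = habituation                     # fraction lost when selected
        self.recovery = recovery                           # pull toward baseline when idle
        self.n_agents = len(self.baseline)

    def step(self, selected):
        """Return noisy rewards for the selected agents, then update the state."""
        rewards = np.zeros(self.n_agents)
        rewards[selected] = self.state[selected] + rng.normal(0.0, 0.05, len(selected))
        self.state[selected] *= 1.0 - self.habituation                # habituation
        idle = np.setdiff1d(np.arange(self.n_agents), selected)
        self.state[idle] += self.recovery * (self.baseline[idle] - self.state[idle])  # recovery
        return rewards


def sliding_window_topk(env, horizon, k, window=20, c=0.5):
    """Each round, play the k agents with the highest sliding-window UCB index.
    The window discards stale observations, a common heuristic for drifting means."""
    history = [[] for _ in range(env.n_agents)]
    total = 0.0
    for t in range(1, horizon + 1):
        ucb = np.empty(env.n_agents)
        for i in range(env.n_agents):
            obs = history[i][-window:]
            ucb[i] = np.inf if not obs else np.mean(obs) + c * np.sqrt(np.log(t) / len(obs))
        selected = np.argsort(ucb)[-k:]          # indices of the k largest UCB values
        rewards = env.step(selected)
        for i in selected:
            history[i].append(rewards[i])
        total += rewards[selected].sum()
    return total


if __name__ == "__main__":
    env = HabituationRecoveryEnv(baseline=rng.uniform(0.2, 1.0, size=8))
    print("cumulative reward:", round(sliding_window_topk(env, horizon=500, k=3), 2))
```

Sliding windows are one standard heuristic for tracking drifting reward means; the paper's algorithms and finite-time dynamic-regret guarantees would replace this component.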
Related papers
- Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization [52.74762030521324]
We propose a novel algorithm to learn reward functions from observed actions. We provide strong theoretical guarantees for the reliability and sample efficiency of our algorithm.
arXiv Detail & Related papers (2026-01-19T04:12:51Z) - Power Constrained Nonstationary Bandits with Habituation and Recovery Dynamics [0.9699640804685629]
This paper develops a Thompson Sampling algorithm tailored to the ROGUE framework. We then introduce a probability clipping procedure to balance personalization and population-level learning (a generic illustration of this clipping idea appears after the related-papers list). For researchers designing micro-randomized trials, our framework offers practical guidance on balancing personalization with statistical validity.
arXiv Detail & Related papers (2025-11-04T19:46:42Z) - Towards Principled Unsupervised Multi-Agent Reinforcement Learning [49.533774397707056]
We present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings. We show that optimizing for a specific objective, namely mixture entropy, provides an excellent trade-off between tractability and performance.
arXiv Detail & Related papers (2025-02-12T12:51:36Z) - Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Preference-Based Multi-Agent Reinforcement Learning (PbMARL). We identify the Nash equilibrium from a preference-only offline dataset in general-sum games. Our findings underscore the multifaceted approach required for PbMARL.
arXiv Detail & Related papers (2024-09-01T13:14:41Z) - Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z) - Setting the Right Expectations: Algorithmic Recourse Over Time [16.930905275894183]
We propose an agent-based simulation framework for studying the effects of a continuously changing environment on algorithmic recourse.
Our findings highlight that only a small set of specific parameterizations result in algorithmic recourse that is reliable for agents over time.
arXiv Detail & Related papers (2023-09-13T14:04:15Z) - Bandit approach to conflict-free multi-agent Q-learning in view of photonic implementation [0.0]
Previous studies have used quantum interference of photons to solve the competitive multi-armed bandit problem.
This study extends the conventional approach to a more general multi-agent reinforcement learning.
A successful photonic reinforcement learning scheme requires both a photonic system that contributes to the quality of learning and a suitable algorithm.
arXiv Detail & Related papers (2022-12-20T00:27:29Z) - Influencing Long-Term Behavior in Multiagent Reinforcement Learning [59.98329270954098]
We propose a principled framework for considering the limiting policies of other agents as the time approaches infinity.
Specifically, we develop a new optimization objective that maximizes each agent's average reward by directly accounting for the impact of its behavior on the limiting set of policies that other agents will take on.
Thanks to our farsighted evaluation, we demonstrate better long-term performance than state-of-the-art baselines in various domains.
arXiv Detail & Related papers (2022-03-07T17:32:35Z) - Contingency-Aware Influence Maximization: A Reinforcement Learning Approach [52.109536198330126]
The influence maximization (IM) problem aims to find a subset of seed nodes in a social network that maximizes the spread of influence.
In this study, we focus on a sub-class of IM problems in which it is uncertain whether nodes invited to be seeds will agree to participate, called contingency-aware IM.
Despite initial success, a major practical obstacle to deploying these solutions in more communities is the substantial runtime of the greedy algorithms.
arXiv Detail & Related papers (2021-06-13T16:42:22Z) - End-to-End Learning and Intervention in Games [60.41921763076017]
We provide a unified framework for learning and intervention in games.
We propose two approaches, respectively based on explicit and implicit differentiation.
The analytical results are validated using several real-world problems.
arXiv Detail & Related papers (2020-10-26T18:39:32Z) - Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL [28.38826379640553]
We propose a more general and flexible parametric framework for sequential decision making.
Inspired by the known reward processing abnormalities of many mental disorders, our clinically-inspired agents demonstrated interesting behavioral trajectories.
arXiv Detail & Related papers (2020-05-10T01:43:39Z)
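The "Power Constrained Nonstationary Bandits with Habituation and Recovery Dynamics" entry above mentions a probability clipping procedure for Thompson Sampling. The sketch below illustrates only the generic idea: clipping selection probabilities into a fixed range so every action retains a minimum chance of being taken, a common device for preserving statistical power in micro-randomized trials. The Beta-Bernoulli model, the bounds `p_min`/`p_max`, and all function names are assumptions, not that paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)


def clipped_thompson_prob(alpha, beta, p_min=0.1, p_max=0.9, n_draws=2000):
    """Estimate the Thompson-sampling probability that arm 1 beats arm 0 under
    Beta posteriors, then clip it into [p_min, p_max] so both arms keep a
    nonzero chance of being played."""
    draws = rng.beta(alpha, beta, size=(n_draws, 2))   # posterior samples per arm
    p_select = np.mean(draws[:, 1] > draws[:, 0])      # P(arm 1 looks better)
    return float(np.clip(p_select, p_min, p_max))


if __name__ == "__main__":
    # Two Bernoulli arms with posteriors Beta(alpha_i, beta_i).
    alpha = np.array([2.0, 8.0])
    beta = np.array([8.0, 2.0])
    p = clipped_thompson_prob(alpha, beta)
    arm = int(rng.random() < p)   # play arm 1 with the clipped probability
    print(f"clipped selection probability for arm 1: {p:.2f}, played arm {arm}")
```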