Design-Based Bandits Under Network Interference: Trade-Off Between Regret and Statistical Inference
- URL: http://arxiv.org/abs/2510.07646v2
- Date: Fri, 10 Oct 2025 19:23:11 GMT
- Title: Design-Based Bandits Under Network Interference: Trade-Off Between Regret and Statistical Inference
- Authors: Zichen Wang, Haoyang Hong, Chuanhao Li, Haoxuan Li, Zhiheng Zhang, Huazheng Wang
- Abstract summary: In multi-armed bandits with network interference (MABNI), the action taken by one node can influence the rewards of others, creating complex interdependence. We introduce an anytime-valid confidence sequence along with a corresponding algorithm to balance the trade-off between regret minimization and inference accuracy.
- Score: 41.49815326663467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-armed bandits with network interference (MABNI), the action taken by one node can influence the rewards of others, creating complex interdependence. While existing research on MABNI largely concentrates on minimizing regret, it often overlooks the crucial concern that an excessive emphasis on the optimal arm can undermine the inference accuracy for sub-optimal arms. Although initial efforts have been made to address this trade-off in single-unit scenarios, these challenges have become more pronounced in the context of MABNI. In this paper, we establish, for the first time, a theoretical Pareto frontier characterizing the trade-off between regret minimization and inference accuracy in adversarial (design-based) MABNI. We further introduce an anytime-valid asymptotic confidence sequence along with a corresponding algorithm, $\texttt{EXP3-N-CS}$, specifically designed to balance the trade-off between regret minimization and inference accuracy in this setting.
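The paper's $\texttt{EXP3-N-CS}$ algorithm is not detailed in this listing. As context, the sketch below shows the classical adversarial EXP3 routine it builds on (exponential weights with importance-weighted reward estimates); the function name, parameters, and reward interface are illustrative, and the network-interference and confidence-sequence machinery of the paper is not reproduced here.

```python
import numpy as np

def exp3(reward_fn, n_arms, horizon, eta):
    """Minimal EXP3 sketch: exponential weights over arms with
    inverse-propensity reward estimates (rewards assumed in [0, 1])."""
    log_weights = np.zeros(n_arms)
    pulls = []
    for t in range(horizon):
        # Sampling distribution: softmax of accumulated reward estimates
        # (max-subtraction for numerical stability).
        w = np.exp(log_weights - log_weights.max())
        probs = w / w.sum()
        arm = np.random.choice(n_arms, p=probs)
        reward = reward_fn(t, arm)  # adversary may choose rewards per round
        # Importance-weighted estimate keeps the update unbiased for
        # arms that were not pulled this round.
        log_weights[arm] += eta * reward / probs[arm]
        pulls.append(arm)
    return pulls
```

Because EXP3 never lets any arm's probability reach zero, every arm keeps being sampled; this is the property that algorithms balancing regret against inference accuracy must tune, since inference on sub-optimal arms needs those sampling probabilities to not decay too fast.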
Related papers
- Stochastic Multi-Objective Multi-Armed Bandits: Regret Definition and Algorithm [6.046591474843391]
Multi-objective multi-armed bandits (MO-MAB) problems are widely applied to online optimization tasks. We propose a novel and comprehensive regret metric that ensures balanced performance across conflicting objectives.
arXiv Detail & Related papers (2025-06-16T06:09:28Z) - Influential Bandits: Pulling an Arm May Change the Environment [44.71145269686588]
Real-world applications often involve non-stationary environments and interdependencies between arms. We propose the influential bandit problem, which models inter-arm interactions through an unknown, symmetric, positive semi-definite interaction matrix. We introduce a new algorithm based on a lower confidence bound (LCB) estimator tailored to the structure of the loss dynamics.
arXiv Detail & Related papers (2025-04-11T02:05:51Z) - Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback. This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc. Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z) - Best Arm Identification with Minimal Regret [55.831935724659175]
This variant of the best arm identification (BAI) problem elegantly amalgamates regret minimization and BAI.
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - IBP Regularization for Verified Adversarial Robustness via Branch-and-Bound [85.6899802468343]
We present IBP-R, a novel verified training algorithm that is both simple and effective.
We also present UPB, a novel robustness bound based on $\beta$-CROWN that reduces the cost of state-of-the-art branching algorithms.
arXiv Detail & Related papers (2022-06-29T17:13:25Z) - Linear Stochastic Bandits over a Bit-Constrained Channel [37.01818450308119]
We introduce a new linear bandit formulation over a bit-constrained channel.
The goal of the server is to take actions based on estimates of an unknown model parameter to minimize cumulative regret.
We prove that when the unknown model is $d$-dimensional, a channel capacity of $O(d)$ bits suffices to achieve order-optimal regret.
arXiv Detail & Related papers (2022-03-02T15:54:03Z) - Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits [91.8283876874947]
We design and analyze the BoBW-lil'UCB$(\gamma)$ algorithm.
We show that no algorithm can simultaneously perform optimally for both the regret minimization (RM) and BAI objectives.
We also show that BoBW-lil'UCB$(\gamma)$ outperforms a competitor in terms of the time complexity and the regret.
arXiv Detail & Related papers (2021-10-16T17:52:32Z) - Constrained regret minimization for multi-criterion multi-armed bandits [5.349852254138086]
We study the problem of regret minimization over a given time horizon, subject to a risk constraint.
We propose a Risk-Constrained Lower Confidence Bound algorithm that guarantees logarithmic regret.
We prove lower bounds on the performance of any risk-constrained regret minimization algorithm.
arXiv Detail & Related papers (2020-06-17T04:23:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.