Related papers: Improved Algorithms for Nash Welfare in Linear Bandits

Improved Algorithms for Nash Welfare in Linear Bandits

URL: http://arxiv.org/abs/2601.22969v1
Date: Fri, 30 Jan 2026 13:32:37 GMT
Title: Improved Algorithms for Nash Welfare in Linear Bandits
Authors: Dhruv Sarkar, Nishant Pandey, Sayak Ray Chowdhury,
Abstract summary: We introduce new analytical tools that yield an order-optimal Nash regret bound in linear bandits.<n>We propose a generic algorithmic framework, FairLinBandit, that works as a meta-algorithm on top of any linear bandit strategy.
Score: 9.443816944974419
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Nash regret has recently emerged as a principled fairness-aware performance metric for stochastic multi-armed bandits, motivated by the Nash Social Welfare objective. Although this notion has been extended to linear bandits, existing results suffer from suboptimality in ambient dimension $d$, stemming from proof techniques that rely on restrictive concentration inequalities. In this work, we resolve this open problem by introducing new analytical tools that yield an order-optimal Nash regret bound in linear bandits. Beyond Nash regret, we initiate the study of $p$-means regret in linear bandits, a unifying framework that interpolates between fairness and utility objectives and strictly generalizes Nash regret. We propose a generic algorithmic framework, FairLinBandit, that works as a meta-algorithm on top of any linear bandit strategy. We instantiate this framework using two bandit algorithms: Phased Elimination and Upper Confidence Bound, and prove that both achieve sublinear $p$-means regret for the entire range of $p$. Extensive experiments on linear bandit instances generated from real-world datasets demonstrate that our methods consistently outperform the existing state-of-the-art baseline.

Related papers

Optimal and Practical Batched Linear Bandit Algorithm [8.087699764574788]
We study the linear bandit problem under limited adaptivity, known as the batched linear bandit.<n>We propose BLAE, a novel batched algorithm that integrates arm elimination with regularized G-optimal design.<n> BLAE is the first algorithm to combine provable minimax-optimality in all regimes and practical superiority in batched linear bandits.
arXiv Detail & Related papers (2025-07-11T09:29:28Z)
Experimental Design for Semiparametric Bandits [11.156009461711639]
We study finite-armed semiparametric bandits, where each arm's reward combines a linear component with an unknown, potentially adversarial shift.<n>We propose the first experimental-design approach that simultaneously offers a sharp regret bound, a PAC bound, and a best-arm identification guarantee.
arXiv Detail & Related papers (2025-06-16T11:53:00Z)
Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem. It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling. We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
arXiv Detail & Related papers (2024-05-24T04:11:58Z)
Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the $B_K$, and show that they can achieve tighter regret upper bounds. We introduce two novel algorithms: Restarted Weighted$textOFUL+$ and Restarted $textSAVE+$. Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
Nash Regret Guarantees for Linear Bandits [12.494184403263338]
We develop an algorithm that achieves a Nash regret of $Oleft( sqrtfracdnuT log( T |X|)right)$. In addition, addressing linear bandit instances in which the set of arms $X$ is not necessarily finite, we obtain a Nash regret upper bound $Oleft( fracdfrac54nufrac12sqrtT log(T)right)$.
arXiv Detail & Related papers (2023-10-03T12:58:10Z)
PopArt: Efficient Sparse Regression and Experimental Design for Optimal Sparse Linear Bandits [29.097522376094624]
We propose a simple and computationally efficient sparse linear estimation method called PopArt. We derive sparse linear bandit algorithms that enjoy improved regret upper bounds upon the state of the art.
arXiv Detail & Related papers (2022-10-25T19:13:20Z)
An $\alpha$-No-Regret Algorithm For Graphical Bilinear Bandits [15.29268368415036]
We propose the first regret-based approach to the Graphical Bilinear Bandits problem. We present the first regret-based algorithm for bilinear bandits using the principle of optimism in the face of uncertainty.
arXiv Detail & Related papers (2022-06-01T12:55:17Z)
Fairness and Welfare Quantification for Regret in Multi-Armed Bandits [13.094997642327371]
We extend the notion of regret with a welfarist perspective. We study Nash regret, defined as the difference between the -- a priori unknown -- optimal mean (among the arms) and the algorithm's performance.
arXiv Detail & Related papers (2022-05-27T12:12:56Z)
Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits [88.6139446295537]
We study the problem of online generalized linear regression in the setting of a generalized linear model with possibly unbounded additive noise. We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise. We propose an algorithm based on FTRL to achieve the first variance-aware regret bound.
arXiv Detail & Related papers (2022-02-28T08:25:26Z)
Learning Contextual Bandits Through Perturbed Rewards [107.6210145983805]
We show that a $tildeO(tildedsqrtT)$ regret upper bound is still achievable under standard regularity conditions. We perturb the rewards when updating the neural network to eliminate the need of explicit exploration.
arXiv Detail & Related papers (2022-01-24T19:10:22Z)
Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave. Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality. We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems. We present a new algorithm which achieves $widetildeO(TsqrtkN + sqrtdkNT)$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z)
Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies. We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB)
arXiv Detail & Related papers (2020-06-17T22:32:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.