Related papers: Beyond Softmax: A New Perspective on Gradient Bandits

Beyond Softmax: A New Perspective on Gradient Bandits

URL: http://arxiv.org/abs/2510.03979v1
Date: Sat, 04 Oct 2025 23:43:20 GMT
Title: Beyond Softmax: A New Perspective on Gradient Bandits
Authors: Emerson Melo, David Müller,
Abstract summary: We establish a link between a class of discrete choice models and the theory of online learning and multi-armed bandits.<n>Our contributions are: (i) sublinear regret bounds for a broad algorithmic family, encompassing Exp3 as a special case; (ii) a new class of adversarial bandit algorithms derived from generalized nested logit models citepwen:2001; and (iii) textcolorblackWe introduce a novel class of generalized gradient bandit algorithms that extends beyond the widely used softmax formulation.
Score: 0.8594140167290097
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We establish a link between a class of discrete choice models and the theory of online learning and multi-armed bandits. Our contributions are: (i) sublinear regret bounds for a broad algorithmic family, encompassing Exp3 as a special case; (ii) a new class of adversarial bandit algorithms derived from generalized nested logit models \citep{wen:2001}; and (iii) \textcolor{black}{we introduce a novel class of generalized gradient bandit algorithms that extends beyond the widely used softmax formulation. By relaxing the restrictive independence assumptions inherent in softmax, our framework accommodates correlated learning dynamics across actions, thereby broadening the applicability of gradient bandit methods.} Overall, the proposed algorithms combine flexible model specification with computational efficiency via closed-form sampling probabilities. Numerical experiments in stochastic bandit settings demonstrate their practical effectiveness.

Related papers

Infinite-Dimensional Operator/Block Kaczmarz Algorithms: Regret Bounds and $λ$-Effectiveness [2.1465871746452043]
We study the role of the relaxation parameter in generalized Kaczmarz algorithms.<n>We establish a priori regret bounds with explicit $$-dependence to quantify how much an algorithm's performance deviates from its optimal performance.
arXiv Detail & Related papers (2025-11-10T20:21:32Z)
Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints [49.76332265680669]
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex.<n>Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs.<n>We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z)
Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.<n>We show that convergence guarantees and generalizability of the unrolled networks are still open theoretical problems.<n>We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z)
Discrete Choice Multi-Armed Bandits [0.0]
This paper establishes a connection between a category of discrete choice models and the realms of online learning and multiarmed bandit algorithms. We furnish sublinear regret bounds for a comprehensive family of algorithms, encompassing the Exp3 algorithm as a particular case. We introduce a novel family of adversarial multiarmed bandit algorithms, drawing inspiration from the generalized nested logit models.
arXiv Detail & Related papers (2023-10-01T03:41:04Z)
Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints. We present two new variants of the algorithms for minimization of the finite-sum gradient.
arXiv Detail & Related papers (2023-04-23T20:05:09Z)
Linear Bandits with Memory: from Rotting to Rising [5.5969337839476765]
Nonstationary phenomena, such as satiation effects in recommendations, have mostly been modeled using bandits with finitely many arms. We introduce a novel nonstationary linear bandit model, where current rewards are influenced by the learner's past actions in a fixed-size window.
arXiv Detail & Related papers (2023-02-16T15:02:07Z)
Stochastic Rising Bandits [40.32303434592863]
We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing. This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds. We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
arXiv Detail & Related papers (2022-12-07T17:30:45Z)
On Kernelized Multi-Armed Bandits with Constraints [16.102401271318012]
We study a bandit problem with a general unknown reward function and a general unknown constraint function. We propose a general framework for both algorithm performance analysis. We demonstrate the superior performance of our proposed algorithms via numerical experiments.
arXiv Detail & Related papers (2022-03-29T14:02:03Z)
Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits [88.6139446295537]
We study the problem of online generalized linear regression in the setting of a generalized linear model with possibly unbounded additive noise. We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise. We propose an algorithm based on FTRL to achieve the first variance-aware regret bound.
arXiv Detail & Related papers (2022-02-28T08:25:26Z)
Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward. For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL)
arXiv Detail & Related papers (2021-02-08T12:41:56Z)
Structure Adaptive Algorithms for Stochastic Bandits [22.871155520200773]
We study reward maximisation in a class of structured multi-armed bandit problems. The mean rewards of arms satisfy some given structural constraints. We develop algorithms from instance-dependent lower-bounds using iterative saddle-point solvers.
arXiv Detail & Related papers (2020-07-02T08:59:54Z)
Gradient Estimation with Stochastic Softmax Tricks [84.68686389163153]
We introduce softmax tricks, which generalize the Gumbel-Softmax trick to spaces. We find that softmax tricks can be used to train latent variable models that perform better and discover more latent structure.
arXiv Detail & Related papers (2020-06-15T00:43:44Z)
Constrained episodic reinforcement learning in concave-convex and knapsack settings [81.08055425644037]
We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints. Our experiments demonstrate that the proposed algorithm significantly outperforms these approaches in existing constrained episodic environments.
arXiv Detail & Related papers (2020-06-09T05:02:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.