Blessings of Multiple Good Arms in Multi-Objective Linear Bandits
- URL: http://arxiv.org/abs/2602.12901v1
- Date: Fri, 13 Feb 2026 13:01:20 GMT
- Title: Blessings of Multiple Good Arms in Multi-Objective Linear Bandits
- Authors: Heesang Ann, Min-hwan Oh
- Abstract summary: We show that simple algorithms that greedily select actions in most rounds can nonetheless achieve strong performance. We introduce a framework for effective Pareto fairness, which provides a principled approach to rigorously analyzing the fairness of multi-objective bandit algorithms.
- Score: 33.46701416812218
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The multi-objective bandit setting has traditionally been regarded as more complex than the single-objective case, since multiple objectives must be optimized simultaneously. In contrast to this prevailing view, we demonstrate that when multiple good arms exist for multiple objectives, they can induce a surprising benefit: implicit exploration. Under this condition, we show that simple algorithms that greedily select actions in most rounds can nonetheless achieve strong performance, both theoretically and empirically. To our knowledge, this is the first study to introduce implicit exploration in both multi-objective and parametric bandit settings without any distributional assumptions on the contexts. We further introduce a framework for effective Pareto fairness, which provides a principled approach to rigorously analyzing the fairness of multi-objective bandit algorithms.
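The abstract's key mechanism — greedy selection across several objectives acting as implicit exploration — can be illustrated with a loose, hypothetical sketch. This is not the paper's algorithm: the function name, the shared ridge design matrix, and the uniformly random choice of objective per round are all assumptions made for illustration.

```python
import numpy as np

def greedy_mo_linear_bandit(arms, theta_true, T, noise=0.1, seed=0):
    """Hypothetical sketch: each round, draw one objective uniformly at
    random and greedily play the arm maximizing its ridge estimate.
    When different objectives favor different good arms, alternating
    greedy pulls spread samples across arms (implicit exploration)."""
    rng = np.random.default_rng(seed)
    K, d = arms.shape                  # K arms, d-dimensional features
    M = theta_true.shape[0]            # number of objectives
    V = np.eye(d)                      # shared ridge design matrix
    b = np.zeros((M, d))               # per-objective response sums
    for _ in range(T):
        theta_hat = np.linalg.solve(V, b.T).T        # (M, d) estimates
        m = rng.integers(M)                          # objective this round
        a = int(np.argmax(arms @ theta_hat[m]))      # greedy arm choice
        x = arms[a]
        r = theta_true @ x + noise * rng.standard_normal(M)  # vector reward
        V += np.outer(x, x)
        b += np.outer(r, x)
    return np.linalg.solve(V, b.T).T   # final per-objective estimates
```

Here each objective's greedy choice concentrates on its own good arm, so the shared design matrix `V` accumulates information in multiple directions without any explicit exploration bonus.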
Related papers
- Empirical Bayesian Multi-Bandit Learning [8.980876474818153]
Multi-task learning in contextual bandits has attracted significant research interest. We propose a novel hierarchical Bayesian framework for learning in various bandit instances. We show that our algorithms achieve lower cumulative regret compared to existing techniques.
arXiv Detail & Related papers (2025-10-30T09:08:07Z) - Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time [52.230936493691985]
We propose SITAlign, an inference framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria. We provide theoretical insights by deriving sub-optimality bounds for our satisficing-based inference alignment approach.
arXiv Detail & Related papers (2025-05-29T17:56:05Z) - Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection. The end goal is to identify the best arm of *every* objective in the shortest (expected) time, subject to an upper bound on the probability of error. We propose an algorithm that uses the novel idea of *surrogate proportions* to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z) - Multi-task Representation Learning for Pure Exploration in Bilinear
Bandits [13.773838574776338]
We study multi-task representation learning for the problem of pure exploration in bilinear bandits.
In bilinear bandits, an action takes the form of a pair of arms from two different entity types.
arXiv Detail & Related papers (2023-11-01T06:30:45Z) - Pareto Regret Analyses in Multi-objective Multi-armed Bandit [22.17126026244685]
We study Pareto optimality in the multi-objective multi-armed bandit.
We present new algorithms for the multi-objective multi-armed bandit setting, both with and without prior information.
The algorithms are shown to be optimal in adversarial settings and nearly optimal, up to a logarithmic factor, in stochastic settings simultaneously.
arXiv Detail & Related papers (2022-12-01T21:44:27Z) - Synergies between Disentanglement and Sparsity: Generalization and
Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z) - Versatile Dueling Bandits: Best-of-both-World Analyses for Online
Learning from Preferences [28.79598714109439]
We study the problem of $K$-armed dueling bandits for both stochastic and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O\big(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\big)$ regret bound against the Condorcet-winner benchmark.
arXiv Detail & Related papers (2022-02-14T13:37:23Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, our method is the first computationally efficient algorithm that is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - DORB: Dynamically Optimizing Multiple Rewards with Bandits [101.68525259222164]
Policy-based reinforcement learning has proven to be a promising approach for optimizing non-differentiable evaluation metrics for language generation tasks.
We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit).
We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks.
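DORB builds its reward controller on the standard Exp3 algorithm. As a rough, hypothetical sketch of that building block (the `reward_fn` callback and the loop structure below are illustrative, not DORB's actual implementation), the exponential-weights update with explicit exploration mixing is:

```python
import math
import random

def exp3(n_arms, reward_fn, T, gamma=0.1):
    """Standard Exp3 sketch: mix exponential weights with uniform
    exploration, update only the pulled arm via an importance-weighted
    reward estimate. Rewards are assumed to lie in [0, 1]."""
    weights = [1.0] * n_arms
    total = 0.0
    for _ in range(T):
        wsum = sum(weights)
        probs = [(1 - gamma) * w / wsum + gamma / n_arms for w in weights]
        # sample an arm according to probs
        r, arm, acc = random.random(), n_arms - 1, 0.0
        for i, p in enumerate(probs):
            acc += p
            if r <= acc:
                arm = i
                break
        reward = reward_fn(arm)
        total += reward
        # importance-weighted estimate for the pulled arm only
        weights[arm] *= math.exp(gamma * (reward / probs[arm]) / n_arms)
        # normalize to avoid floating-point overflow over long horizons
        m = max(weights)
        weights = [w / m for w in weights]
    return total
```

The exploration rate `gamma` lower-bounds every arm's sampling probability at `gamma / n_arms`, which keeps the importance weights bounded and the regret guarantee intact.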
arXiv Detail & Related papers (2020-11-15T21:57:47Z) - The Combinatorial Multi-Bandit Problem and its Application to Energy
Management [2.236663830879273]
We study a Combinatorial Multi-Bandit Problem motivated by applications in energy systems management.
For our energy management application, we propose a range of algorithms that combine exploration principles for multi-armed bandits with mathematical programming.
arXiv Detail & Related papers (2020-10-30T13:42:54Z)