Pareto Regret Analyses in Multi-objective Multi-armed Bandit
- URL: http://arxiv.org/abs/2212.00884v2
- Date: Tue, 30 May 2023 23:02:47 GMT
- Title: Pareto Regret Analyses in Multi-objective Multi-armed Bandit
- Authors: Mengfan Xu, Diego Klabjan
- Abstract summary: We study optimality in multi-objective multi-armed bandit.
We present new algorithms for settings both with and without prior information about the multi-objective multi-armed bandit instance.
The algorithms are shown to be simultaneously optimal in adversarial settings and nearly optimal, up to a logarithmic factor, in stochastic settings.
- Score: 22.17126026244685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study Pareto optimality in multi-objective multi-armed bandits by
formulating an adversarial multi-objective multi-armed bandit problem and
defining Pareto regrets that apply to both stochastic and adversarial settings.
The regrets do not rely on any scalarization function and, unlike scalarized
regrets, directly reflect Pareto optimality. We also present new algorithms for
settings both with and without prior information about the multi-objective
multi-armed bandit instance. Through our established upper and lower bounds on
Pareto regret, the algorithms are shown to be simultaneously optimal in
adversarial settings and nearly optimal, up to a logarithmic factor, in
stochastic settings. Moreover, the lower-bound analyses show that the new
regrets are consistent with the existing Pareto regret for stochastic settings,
and they extend an adversarial attack mechanism from single-objective bandits
to the multi-objective setting.
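To make the notion of Pareto regret concrete, here is a minimal sketch of the per-arm Pareto suboptimality gap commonly used in the stochastic setting: the smallest uniform boost that makes an arm's mean vector undominated. The arm means below are hypothetical numbers chosen for illustration, not taken from the paper.

```python
def pareto_gap(means, i):
    """Pareto suboptimality gap of arm i: the smallest uniform boost eps such
    that means[i] + eps is no longer dominated by any other arm (a common
    stochastic-setting definition of the per-arm gap)."""
    gaps = [min(mj - mi for mj, mi in zip(means[j], means[i]))
            for j in range(len(means)) if j != i]
    return max(0.0, max(gaps))

# Four arms, two objectives; arms 0-2 are Pareto optimal, arm 3 is dominated.
means = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4), (0.2, 0.2)]
gaps = [pareto_gap(means, i) for i in range(len(means))]
# The cumulative Pareto regret of a play sequence sums the pulled arms' gaps.
pulls = [0, 3, 3, 2]
regret = sum(gaps[a] for a in pulls)
```

Note that the gap is zero exactly for Pareto-optimal arms, so a play sequence that stays on the Pareto front accumulates no regret under this definition.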
Related papers
- Diverse Text-to-Image Generation via Contrastive Noise Optimization [60.48914865049489]
Text-to-image (T2I) diffusion models have demonstrated impressive performance in generating high-fidelity images. Existing approaches typically optimize intermediate latents or text conditions during inference. We introduce Contrastive Noise Optimization, a simple yet effective method that addresses the diversity issue from a distinct perspective.
arXiv Detail & Related papers (2025-10-04T13:51:32Z) - Stochastic Multi-Objective Multi-Armed Bandits: Regret Definition and Algorithm [6.046591474843391]
Multi-objective multi-armed bandits (MO-MAB) problems are widely applied to online optimization tasks. We propose a novel and comprehensive regret metric that ensures balanced performance across conflicting objectives.
arXiv Detail & Related papers (2025-06-16T06:09:28Z) - Constrained Pareto Set Identification with Bandit Feedback [10.967572582187014]
Given a $K$-armed bandit with unknown means, the goal is to identify the set of arms whose mean is not uniformly worse than that of another arm. Our focus lies in fixed-confidence identification, for which we introduce an algorithm that significantly outperforms racing-like algorithms.
arXiv Detail & Related papers (2025-06-09T18:29:28Z) - Semi-Parametric Batched Global Multi-Armed Bandits with Covariates [0.48342038441006807]
The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making.
We propose a novel semi-parametric framework for batched bandits with covariates and a shared parameter across arms.
Our algorithm, Batched single-Index Dynamic binning and Successive arm elimination (BIDS), employs a batched successive arm elimination strategy.
arXiv Detail & Related papers (2025-03-01T17:23:55Z) - Best Arm Identification with Minimal Regret [55.831935724659175]
Best arm identification with minimal regret elegantly amalgamates regret minimization and BAI.
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
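The paper's Double KL-UCB algorithm is not specified here; as background, the following is a minimal sketch of the standard KL-UCB index for Bernoulli rewards on which such algorithms build: the largest mean consistent with the empirical estimate under a log-divergence budget, found by bisection.

```python
import math

def bernoulli_kl(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q), with clamping
    # so the log terms stay finite at the boundary.
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n_pulls, t):
    """Largest q >= p_hat with n_pulls * kl(p_hat, q) <= log(t),
    computed by bisection on the monotone map q -> kl(p_hat, q)."""
    budget = math.log(max(t, 2)) / n_pulls
    lo, hi = p_hat, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if bernoulli_kl(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

With more pulls the divergence budget per sample shrinks, so the index tightens toward the empirical mean; that is the mechanism that trades exploration against confidence.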
arXiv Detail & Related papers (2024-09-27T16:46:02Z) - UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
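As a minimal sketch of the idea of UCB-driven search over weight vectors (not the paper's actual method), each candidate weight vector can be treated as an arm and selected with a UCB1 rule over scalarized returns. The weight vectors, reward distributions, and episode stand-in below are all hypothetical.

```python
import math
import random

def ucb1_pick(counts, totals, t):
    """UCB1 choice over candidate weight vectors: mean scalarized return
    plus an exploration bonus; untried candidates are picked first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: totals[i] / counts[i]
               + math.sqrt(2 * math.log(t) / counts[i]))

# Hypothetical setup: three candidate weight vectors over two objectives.
weights = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
counts, totals = [0] * 3, [0.0] * 3
random.seed(0)
for t in range(1, 201):
    i = ucb1_pick(counts, totals, t)
    w = weights[i]
    # Stand-in for one MORL training stage: a noisy vector return,
    # scalarized by the chosen weight vector.
    ret = (random.gauss(0.2, 0.1), random.gauss(0.8, 0.1))
    counts[i] += 1
    totals[i] += w[0] * ret[0] + w[1] * ret[1]
best = max(range(3), key=lambda i: totals[i] / counts[i])
```

Here the second objective dominates in expectation, so the search concentrates on the weight vector that emphasizes it.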
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
arXiv Detail & Related papers (2024-02-23T06:27:12Z) - Adaptive Algorithms for Relaxed Pareto Set Identification [12.326452468513228]
We revisit the fixed-confidence identification of the Pareto optimal set in a multi-objective multi-armed bandit model.
We propose a single sampling strategy, called Adaptive Pareto Exploration, that can be used in conjunction with different stopping rules.
We analyze the sample complexity of these different combinations, quantifying in particular the reduction in sample complexity.
arXiv Detail & Related papers (2023-07-01T20:43:12Z) - Learning the Pareto Front Using Bootstrapped Observation Samples [17.519167857253404]
We propose an algorithm to identify a set of arms with undominated mean reward vectors.
The sample complexity of our proposed algorithm is optimal up to a logarithmic factor.
A key contribution is a new estimator that, in every round, updates the estimate of the unknown parameter along multiple context directions.
arXiv Detail & Related papers (2023-05-31T18:15:09Z) - A Unifying Perspective on Multi-Calibration: Game Dynamics for Multi-Objective Learning [63.20009081099896]
We provide a unifying framework for the design and analysis of multicalibrated predictors.
We exploit connections to game dynamics to achieve state-of-the-art guarantees for a diverse set of multicalibration learning problems.
arXiv Detail & Related papers (2023-02-21T18:24:17Z) - Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs [10.68896183306706]
Reinforcement learning generalizes multi-armed bandit problems with additional difficulties of a longer planning horizon and unknown transition kernel.
We show that any slowly changing adversarial bandit algorithm achieving optimal regret in the adversarial bandit setting can also attain optimal expected regret in infinite-horizon discounted Markov decision processes.
arXiv Detail & Related papers (2022-05-18T16:40:30Z) - Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [28.79598714109439]
We study the problem of $K$-armed dueling bandits for both stochastic and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O\big(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\big)$ regret bound against the Condorcet-winner benchmark.
arXiv Detail & Related papers (2022-02-14T13:37:23Z) - Max-Min Grouped Bandits [48.62520520818357]
We introduce a multi-armed bandit problem termed max-min grouped bandits.
The goal is to find a group whose worst arm has the highest mean reward.
This problem is of interest to applications such as recommendation systems.
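The max-min grouped objective can be illustrated on known means (hypothetical numbers; in the bandit problem itself the means are unknown and must be estimated from samples):

```python
# Pick the group whose worst arm has the highest mean reward.
groups = {
    "A": [0.9, 0.2],  # strong best arm, but a weak worst arm
    "B": [0.6, 0.5],  # uniformly decent arms
    "C": [0.7, 0.1],
}
best_group = max(groups, key=lambda g: min(groups[g]))
```

Note the contrast with ordinary best-arm identification: group A contains the single best arm, yet group B wins under the max-min criterion.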
arXiv Detail & Related papers (2021-11-17T01:59:15Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z) - Adaptive Discretization for Adversarial Lipschitz Bandits [85.39106976861702]
Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces.
A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions.
We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
arXiv Detail & Related papers (2020-06-22T16:06:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.