Fugu-MT 論文翻訳(概要): Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?

論文の概要: Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?

arxiv url: http://arxiv.org/abs/2604.07096v1
Date: Wed, 08 Apr 2026 13:48:54 GMT
ステータス: 翻訳完了
システム内更新日: 2026-04-09 17:30:51.563677
Title: Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?
Title（参考訳）: 確率的多目的帯域は単目的帯域よりも硬いか?
Authors: Changkun Guan, Mengfan Xu,
Abstract要約: 多目的包帯は、各腕の報酬がスカラーではなく多次元ベクトルである広い適用性とエレガンスのため、注目を集めている。この領域における長年の疑問は、この複雑さが加わったため、パフォーマンスを最適化するのが困難であるかどうかである。我々は注文の最小限の後悔を達成する新しいアルゴリズム (fracKlog Tgdagger) を開発する。
参考スコア（独自算出の注目度）: 4.452583096196013
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-objective bandits have attracted increasing attention because of their broad applicability and mathematical elegance, where the reward of each arm is a multi-dimensional vector rather than a scalar. This naturally introduces Pareto order relations and Pareto regret. A long-standing question in this area is whether performance is fundamentally harder to optimize because of this added complexity. A recent surprising result shows that, in the adversarial setting, Pareto regret is no larger than classical regret; however, in the stochastic setting, where the regret notion is different, the picture remains unclear. In fact, existing work suggests that Pareto regret in the stochastic case increases with the dimensionality. This controversial yet subtle phenomenon motivates our central question: \emph{are multi-objective bandits actually harder than single-objective ones?} We answer this question in full by showing that, in the stochastic setting, Pareto regret is in fact governed by the maximum sub-optimality gap \(g^\dagger\), and hence by the minimum marginal regret of order \(Ω(\frac{K\log T}{g^\dagger})\). We further develop a new algorithm that achieves Pareto regret of order \(O(\frac{K\log T}{g^\dagger})\), and is therefore optimal. The algorithm leverages a nested two-layer uncertainty quantification over both arms and objectives through upper and lower confidence bound estimators. It combines a top-two racing strategy for arm selection with an uncertainty-greedy rule for dimension selection. Together, these components balance exploration and exploitation across the two layers. We also conduct comprehensive numerical experiments to validate the proposed algorithm, showing the desired regret guarantee and significant gains over benchmark methods.
Abstract（参考訳）: 多目的の包帯は、それぞれの腕の報酬がスカラーではなく多次元ベクトルである広い適用性と数学的エレガンスのために、注目を集めている。このことはパレートの秩序関係とパレートの後悔を自然に引き起こす。この領域における長年の疑問は、この複雑さが加わったため、パフォーマンスが基本的に最適化が困難であるかどうかである。最近の驚くべき結果は、逆境では、パレートの後悔は古典的な後悔ほど大きくはないことを示しているが、後悔の概念が異なる確率的な状況では、その絵はいまだに不明瞭である。実際、既存の研究はパレートが確率的ケースで後悔すると次元が増加することを示唆している。この議論の的となっているが微妙な現象は、我々の中心的な疑問を動機付けている: \emph{are multi-objective banditsは、実際には単目的のものよりも難しいのか? 確率的な設定では、パレートの後悔は、実際には最大部分最適性ギャップ \(g^\dagger\) によって支配され、従って位数 \(Ω(\frac{K\log T}{g^\dagger})\) の最小限の後悔によって支配される。さらに,秩序のパレートを後悔するアルゴリズム(O(\frac{K\log T}{g^\dagger})\)を開発し,最適である。このアルゴリズムは、上下の信頼境界推定器を通して、両腕と目的のネストされた2層不確実性定量化を利用する。アームセレクションのためのトップ2のレース戦略とディメンションセレクションのための不確実な規則を組み合わせる。これらのコンポーネントは2つのレイヤ間の探索とエクスプロイトのバランスをとる。また,提案アルゴリズムの有効性を検証するための総合的な数値実験を行い,ベンチマーク手法よりも望まれる後悔の保証と有意な利得を示す。

論文の概要: Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?

関連論文リスト