Fugu-MT Paper Translation (Abstract): Tree-Guided Identify-Then-Exploit: A Unified Framework of Best Arm Identification and Regret Minimization for Dueling Bandits

Paper summary: Tree-Guided Identify-Then-Exploit: A Unified Framework of Best Arm Identification and Regret Minimization for Dueling Bandits

arxiv url: http://arxiv.org/abs/2606.01799v1
Date: Mon, 01 Jun 2026 07:17:49 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:31.485165
Title: Tree-Guided Identify-Then-Exploit: A Unified Framework of Best Arm Identification and Regret Minimization for Dueling Bandits
Authors: Pu Wang, Yao-Xiang Ding
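Abstract summary: We study $N$-armed dueling bandits under the Condorcet-winner assumption. Three widely adopted objectives are considered: best-arm identification (BAI), weak regret, and strong regret. We propose Tree-Guided Identify-Then-Exploit (TG-ITE), to our knowledge the first unified framework to address all of these objectives.
Reference score (independently computed attention level): 15.350660480734424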
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study $N$-armed stochastic dueling bandits under the Condorcet-winner assumption, where three widely adopted objectives are considered: best-arm identification (BAI), weak regret, and strong regret. We propose Tree-Guided Identify-Then-Exploit (TG-ITE), the first unified framework to tackle all these objectives to our knowledge. Without requiring stronger assumptions, we propose a shared tree-guided identification approach to find a high-confidence incumbent within $O(N)$ comparisons. We further propose varied exploitation strategies to utilize this warm-start stage to optimize the specific objectives at hand. This methodology enables our approach to (1) achieve $O(N)$ sample complexity in BAI without commonly adopted stronger assumptions; (2) build the first winner-stays-style algorithm to achieve $O(N)$ weak regret; (3) enjoy the same $O(N \log T)$ guarantee as specialized strong-regret approaches; (4) realize the joint optimization of BAI and weak regret with $O(N)$ guarantees for both, eliminating the sub-optimal gap of $O(\log N)$ in the existing approach. Our results provide evidence that the trade-off between BAI and regret minimization is relatively benign in dueling bandits.
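
For readers new to the terminology, the following is a minimal Python sketch of the Condorcet-winner assumption and of per-round weak and strong regret. It is not the TG-ITE algorithm, and it is not taken from the paper. The preference matrix `P`, the function names, and the regret convention used here (weak regret as the smaller of the two gaps to the winner, strong regret as the larger) are assumptions for illustration; some works define strong regret as the sum or average of the two gaps, and the abstract does not say which convention the paper adopts.

```python
from typing import List, Optional, Tuple


def condorcet_winner(P: List[List[float]]) -> Optional[int]:
    """Return the index of the arm that beats every other arm with
    probability strictly above 0.5, or None if no such arm exists.

    P[i][j] is the (hypothetical) probability that arm i beats arm j.
    """
    n = len(P)
    for i in range(n):
        if all(P[i][j] > 0.5 for j in range(n) if j != i):
            return i
    return None


def instant_regret(P: List[List[float]], winner: int, i: int, j: int) -> Tuple[float, float]:
    """Per-round (weak, strong) regret of dueling arms i and j.

    Assumed convention: each arm's gap is P[winner][arm] - 0.5;
    weak regret is the smaller gap, strong regret is the larger gap.
    """
    gap_i = P[winner][i] - 0.5
    gap_j = P[winner][j] - 0.5
    return min(gap_i, gap_j), max(gap_i, gap_j)


# Illustrative 3-arm preference matrix (made-up values); arm 0 beats both others.
P = [
    [0.5, 0.625, 0.75],
    [0.375, 0.5, 0.625],
    [0.25, 0.375, 0.5],
]

w = condorcet_winner(P)
weak_with_winner, strong_with_winner = instant_regret(P, w, 0, 2)
weak_without, strong_without = instant_regret(P, w, 1, 2)
```

Under this convention, weak regret is zero in any round where the Condorcet winner is one of the two dueling arms, while strong regret is zero only when the winner is dueled against itself. This is one way to read why the abstract reports an $O(N)$ guarantee for weak regret but an $O(N \log T)$ guarantee for strong regret.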

Related papers