Fugu-MT Paper Translation (Summary): When Exploration Comes for Free with Mixture-Greedy: Do we need UCB in Diversity-Aware Multi-Armed Bandits?

Paper summary: When Exploration Comes for Free with Mixture-Greedy: Do we need UCB in Diversity-Aware Multi-Armed Bandits?

arxiv url: http://arxiv.org/abs/2603.21716v1
Date: Mon, 23 Mar 2026 08:59:45 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.577545
Title: When Exploration Comes for Free with Mixture-Greedy: Do we need UCB in Diversity-Aware Multi-Armed Bandits?
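Title (reference translation): Free exploration with Mixture-Greedy: Is UCB needed in diversity-aware multi-armed bandits?
Authors: Bahar Dibaei Nia, Farzan Farnia
Abstract summary: We show that a simple Mixture-Greedy strategy without explicit UCB-type optimism converges faster and achieves better performance. Under transparent structural conditions, diversity-aware objectives induce implicit exploration by favoring interior mixtures. These results suggest that in diversity-aware multi-armed bandits for model selection, exploration can arise intrinsically from the objective geometry, questioning the necessity of explicit confidence bonuses.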
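A two-generator toy case shows why a diversity objective can favor interior mixtures. It assumes that the two generators have disjoint supports and that diversity is measured by the Shannon entropy H of the mixture. Both are illustrative assumptions, not the paper's stated structural conditions.

```latex
\begin{aligned}
p_\alpha &= \alpha\, p_1 + (1-\alpha)\, p_2, \qquad \operatorname{supp} p_1 \cap \operatorname{supp} p_2 = \emptyset,\\
H(p_\alpha) &= -\alpha\log\alpha - (1-\alpha)\log(1-\alpha) + \alpha H(p_1) + (1-\alpha) H(p_2),\\
\frac{d}{d\alpha} H(p_\alpha) &= \log\frac{1-\alpha}{\alpha} + H(p_1) - H(p_2) = 0
\;\Longrightarrow\; \alpha^\star = \frac{e^{H(p_1)}}{e^{H(p_1)} + e^{H(p_2)}} \in (0,1),\\
H(p_{\alpha^\star}) &= \log\!\bigl(e^{H(p_1)} + e^{H(p_2)}\bigr) > \max\{H(p_1),\, H(p_2)\}.
\end{aligned}
```

Because α* lies strictly inside (0, 1), a learner that keeps sampling from its estimated optimal mixture draws from both generators at a non-vanishing rate. This is a toy instance of the implicit exploration described in the abstract, not the paper's general result.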
Reference score (attention score computed in-house): 18.528635656824864
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Efficient selection among multiple generative models is increasingly important in modern generative AI, where sampling from suboptimal models is costly. This problem can be formulated as a multi-armed bandit task. Under diversity-aware evaluation metrics, a non-degenerate mixture of generators can outperform any individual model, distinguishing this setting from classical best-arm identification. Prior approaches therefore incorporate an Upper Confidence Bound (UCB) exploration bonus into the mixture objective. However, across multiple datasets and evaluation metrics, we observe that the UCB term consistently slows convergence and often reduces sample efficiency. In contrast, a simple Mixture-Greedy strategy without explicit UCB-type optimism converges faster and achieves even better performance, particularly for widely used metrics such as FID and Vendi where tight confidence bounds are difficult to construct. We provide theoretical insight explaining this behavior: under transparent structural conditions, diversity-aware objectives induce implicit exploration by favoring interior mixtures, leading to linear sampling of all arms and sublinear regret guarantees for entropy-based, kernel-based, and FID-type objectives. These results suggest that in diversity-aware multi-armed bandits for generative model selection, exploration can arise intrinsically from the objective geometry, questioning the necessity of explicit confidence bonuses.
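Abstract (reference translation): Efficient selection among multiple generative models is becoming increasingly important in modern generative AI, where sampling from suboptimal models is costly. This problem can be formulated as a multi-armed bandit task. Under diversity-aware evaluation metrics, a non-degenerate mixture of generators can outperform any individual model, which distinguishes this setting from classical best-arm identification. Prior approaches therefore incorporate an Upper Confidence Bound (UCB) exploration bonus into the mixture objective. However, across multiple datasets and evaluation metrics, we observe that the UCB term consistently slows convergence and often reduces sample efficiency. In contrast, a simple Mixture-Greedy strategy without explicit UCB-type optimism converges faster and achieves better performance, particularly for widely used metrics such as FID and Vendi, for which tight confidence bounds are difficult to construct. Under transparent structural conditions, diversity-aware objectives induce implicit exploration by favoring interior mixtures. This yields linear sampling of all arms and sublinear regret guarantees for entropy-based, kernel-based, and FID-type objectives. These results suggest that in diversity-aware multi-armed bandits for generative model selection, exploration can arise intrinsically from the objective geometry, questioning the necessity of explicit confidence bonuses.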
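The abstract describes Mixture-Greedy only at a high level. The following Python sketch shows one way such a loop could look. It is not the authors' implementation and rests on three assumptions:

- each generator emits a discrete "mode" label;
- the diversity objective is the plug-in Shannon entropy of the mixture, which is one entropy-based objective among those the abstract mentions;
- the mixture weights are optimized by exponentiated-gradient ascent.

The names `mixture_entropy`, `best_mixture`, and `mixture_greedy` are hypothetical. Kernel-based (Vendi) and FID-type objectives would need different estimators.

```python
import math
import random


def mixture_entropy(alpha, dists):
    """Shannon entropy (in nats) of the mixture sum_i alpha[i] * dists[i]."""
    num_modes = len(dists[0])
    mix = [sum(a * d[x] for a, d in zip(alpha, dists)) for x in range(num_modes)]
    return -sum(p * math.log(p) for p in mix if p > 0)


def best_mixture(dists, steps=500, lr=0.5):
    """Maximize the (concave) mixture entropy over the probability simplex
    with exponentiated-gradient ascent, starting from uniform weights."""
    k, num_modes = len(dists), len(dists[0])
    alpha = [1.0 / k] * k
    for _ in range(steps):
        mix = [sum(a * d[x] for a, d in zip(alpha, dists)) for x in range(num_modes)]
        # dH/dalpha_i = -sum_x d_i(x) * log mix(x) - 1. The shared "-1" cancels after
        # renormalization, so it is dropped. Since alpha stays strictly positive,
        # mix(x) > 0 wherever d_i(x) > 0, so the log is always defined.
        grad = [-sum(d[x] * math.log(mix[x]) for x in range(num_modes) if d[x] > 0)
                for d in dists]
        g_max = max(grad)  # shift for numerical stability; does not change the result
        weights = [a * math.exp(lr * (g - g_max)) for a, g in zip(alpha, grad)]
        total = sum(weights)
        alpha = [w / total for w in weights]
    return alpha


def mixture_greedy(generators, num_modes, rounds, seed=0):
    """Greedy mixture selection: each round, sample an arm from the (approximately)
    optimal mixture for the current plug-in estimates, with no UCB-style bonus.
    Returns how many times each generator was sampled."""
    rng = random.Random(seed)
    k = len(generators)
    counts = [[0] * num_modes for _ in range(k)]
    pulls = [0] * k
    for i in range(k):  # warm-up: one draw from every generator
        counts[i][generators[i](rng)] += 1
        pulls[i] += 1
    for _ in range(rounds - k):
        dists = [[c / pulls[i] for c in counts[i]] for i in range(k)]
        alpha = best_mixture(dists, steps=50)
        arm = rng.choices(range(k), weights=alpha)[0]
        counts[arm][generators[arm](rng)] += 1
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Two hypothetical generators covering disjoint modes {0, 1} and {2, 3}.
    # Neither one alone reaches the mixture's entropy of log 4.
    gens = [lambda r: r.choice([0, 1]), lambda r: r.choice([2, 3])]
    print(mixture_greedy(gens, num_modes=4, rounds=400))
```

The loop has no confidence bonus. It simply samples from the plug-in-optimal mixture. In this toy setup the entropy objective keeps the weights away from the simplex vertices, so both generators keep being sampled, which mirrors the implicit exploration the abstract describes.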


Related papers