Fugu-MT 論文翻訳(概要): FraPPE: Fast and Efficient Preference-based Pure Exploration

論文の概要: FraPPE: Fast and Efficient Preference-based Pure Exploration

arxiv url: http://arxiv.org/abs/2508.16487v1
Date: Fri, 22 Aug 2025 16:02:06 GMT
ステータス: 翻訳完了
システム内更新日: 2025-08-25 16:42:36.44307
Title: FraPPE: Fast and Efficient Preference-based Pure Exploration
Title（参考訳）: FraPPE: 高速かつ効率的な推論ベースの純粋探索
Authors: Udvas Das, Apurv Shukla, Debabrota Basu,
Abstract要約: 任意の選好円錐に対して既存の下界を最適に追跡する効率的なアルゴリズムを提案する。提案したPrePExアルゴリズムであるFraPPEが最適なサンプル複雑性を実現することを証明した。
参考スコア（独自算出の注目度）: 17.53646399595373
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Preference-based Pure Exploration (PrePEx) aims to identify with a given confidence level the set of Pareto optimal arms in a vector-valued (aka multi-objective) bandit, where the reward vectors are ordered via a (given) preference cone $\mathcal{C}$. Though PrePEx and its variants are well-studied, there does not exist a computationally efficient algorithm that can optimally track the existing lower bound for arbitrary preference cones. We successfully fill this gap by efficiently solving the minimisation and maximisation problems in the lower bound. First, we derive three structural properties of the lower bound that yield a computationally tractable reduction of the minimisation problem. Then, we deploy a Frank-Wolfe optimiser to accelerate the maximisation problem in the lower bound. Together, these techniques solve the maxmin optimisation problem in $\mathcal{O}(KL^{2})$ time for a bandit instance with $K$ arms and $L$ dimensional reward, which is a significant acceleration over the literature. We further prove that our proposed PrePEx algorithm, FraPPE, asymptotically achieves the optimal sample complexity. Finally, we perform numerical experiments across synthetic and real datasets demonstrating that FraPPE achieves the lowest sample complexities to identify the exact Pareto set among the existing algorithms.
Abstract（参考訳）: PrePEx(Preference-based Pure Exploration)は、ベクター値(いわゆる多目的)バンディットにおけるパレート最適アームのセットを与えられた信頼度レベルで識別することを目的としており、報酬ベクトルは(希望)選好円錐$\mathcal{C}$を介して順序付けられる。 PrePExとその変種はよく研究されているが、任意の選好円錐に対して既存の下界を最適に追跡できる計算効率のよいアルゴリズムは存在しない。低域における最小化と最大化の問題を効率的に解き、このギャップを埋めることに成功した。まず、最小化問題の計算的縮小をもたらす下界の3つの構造特性を導出する。次に,Frank-Wolfeオプティマイザを配置し,下界における最大化問題を高速化する。これらの手法はともに、$K$アームと$L$次元報酬を持つバンドイットのインスタンスに対して、$\mathcal{O}(KL^{2})$時間における最大最適化問題を解く。さらに,提案したPrePExアルゴリズムであるFraPPEが,漸近的に最適なサンプル複雑性を実現することを証明した。最後に、FraPPEが既存のアルゴリズムの中で正確なPareto集合を識別するために、最も低いサンプル複素量を達成することを示す合成および実データセットの数値実験を行う。

関連論文リスト

Span-Agnostic Optimal Sample Complexity and Oracle Inequalities for Average-Reward RL [6.996002801232415]
生成モデルを用いてマルコフ決定過程(MDP)において,$varepsilon$-optimal Policyを求める際のサンプル複雑性について検討した。我々は,知識を必要とせず,最適なスパンベース複雑性に適合するアルゴリズムを開発した。
論文参考訳（メタデータ） (2025-02-16T19:10:55Z)
Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
我々は,非制約のmin-max最適化問題のグローバルなサドル点を求めるために,不正確な正規化ニュートン型手法を提案し,解析する。提案手法は有界集合内に留まるイテレートを生成し、その反復は制限関数の項で$O(epsilon-2/3)$内の$epsilon$-saddle点に収束することを示す。
論文参考訳（メタデータ） (2022-10-23T21:24:37Z)
Best Policy Identification in Linear MDPs [70.57916977441262]
縮退した線形マルコフ+デルタ決定における最適同定問題について, 生成モデルに基づく固定信頼度設定における検討を行った。複雑な非最適化プログラムの解としての下位境界は、そのようなアルゴリズムを考案する出発点として用いられる。
論文参考訳（メタデータ） (2022-08-11T04:12:50Z)
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits [27.279654173896372]
我々は,非定常的あるいは時間的に異なる選好の下で,$K$のDueling Banditsにおける空力的後悔の最小化問題について検討した。これは、エージェントが各ラウンドで一対のアイテムを選択し、このペアに対する相対的な二項のウィンロスフィードバックのみを観察するオンライン学習設定である。
論文参考訳（メタデータ） (2021-11-06T16:46:55Z)
Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
対戦相手が展開するスムーズなアルゴリズムに対して,Min-playerの新しいアルゴリズムを提案する。本アルゴリズムは,制限周期のない単調進行を保証し,適切な勾配上昇数を求める。
論文参考訳（メタデータ） (2021-06-02T22:03:36Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
本稿では,学習者が生成モデルにアクセスできる場合の,割引マルコフ決定(MDP)における最良の政治的識別の問題について検討する。最先端アルゴリズムの利点を論じ、解説する。
論文参考訳（メタデータ） (2020-09-28T15:22:24Z)
Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
強凸関数のゼロ次最適化問題について検討する。予測勾配降下アルゴリズムのランダム化近似を考察する。その結果,0次アルゴリズムはサンプルの複雑性や問題パラメータの点でほぼ最適であることが示唆された。
論文参考訳（メタデータ） (2020-06-14T10:42:23Z)
Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models [2.384873896423002]
非滑らかな成分を持つ汎用合成対象関数に対する勾配アルゴリズムのミニバッチ変種を開発し解析する。我々は、最小バッチサイズ$N$に対して、$mathcalO(frac1Nepsilon)$$epsilon-$subityが最適解に期待される二次距離で達成されるような、定数および変数のステップサイズ反復ポリシーの複雑さを提供する。
論文参考訳（メタデータ） (2020-03-30T10:43:56Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。