Fugu-MT 論文翻訳(概要): Asymptotic Instance-Optimal Algorithms for Interactive Decision Making

論文の概要: Asymptotic Instance-Optimal Algorithms for Interactive Decision Making

arxiv url: http://arxiv.org/abs/2206.02326v1
Date: Mon, 6 Jun 2022 02:56:10 GMT
ステータス: 翻訳完了
システム内更新日: 2022-06-07 17:02:33.754702
Title: Asymptotic Instance-Optimal Algorithms for Interactive Decision Making
Title（参考訳）: 対話的意思決定のための漸近的インスタンス最適アルゴリズム
Authors: Kefan Dong, Tengyu Ma
Abstract要約: 理想的なアルゴリズムは、特定の問題インスタンスの複雑さに適応し、最悪の場合よりも簡単なインスタンスに対する後悔を小さくするべきである。本稿では,軽度条件下での有限個の判定問題に対して,対話型意思決定のための最初のインスタンス最適化アルゴリズムを設計する。
参考スコア（独自算出の注目度）: 35.76184529520015
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Past research on interactive decision making problems (bandits, reinforcement learning, etc.) mostly focuses on the minimax regret that measures the algorithm's performance on the hardest instance. However, an ideal algorithm should adapt to the complexity of a particular problem instance and incur smaller regrets on easy instances than worst-case instances. In this paper, we design the first asymptotic instance-optimal algorithm for general interactive decision making problems with finite number of decisions under mild conditions. On \textit{every} instance $f$, our algorithm outperforms \emph{all} consistent algorithms (those achieving non-trivial regrets on all instances), and has asymptotic regret $\mathcal{C}(f) \ln n$, where $\mathcal{C}(f)$ is an exact characterization of the complexity of $f$. The key step of the algorithm involves hypothesis testing with active data collection. It computes the most economical decisions with which the algorithm collects observations to test whether an estimated instance is indeed correct; thus, the complexity $\mathcal{C}(f)$ is the minimum cost to test the instance $f$ against other instances. Our results, instantiated on concrete problems, recover the classical gap-dependent bounds for multi-armed bandits [Lai and Robbins, 1985] and prior works on linear bandits [Lattimore and Szepesvari, 2017], and improve upon the previous best instance-dependent upper bound [Xu et al., 2021] for reinforcement learning.
Abstract（参考訳）: インタラクティブな意思決定問題(バンド、強化学習など)に関する過去の研究は、アルゴリズムの最も困難なインスタンスにおけるパフォーマンスを測定するミニマックス後悔に焦点を当てていた。しかし、理想的なアルゴリズムは、特定の問題インスタンスの複雑さに適応し、最悪の場合よりも簡単なインスタンスに対する後悔を少なくするべきである。本稿では,軽度条件下での有限数の判定問題に対する一般対話型意思決定のための,最初の漸近的インスタンス最適化アルゴリズムを設計する。 textit{every} インスタンス $f$ において、我々のアルゴリズムは \emph{all} 一貫性のあるアルゴリズムよりも優れており(すべてのインスタンスで非自明な後悔を達成している)、漸近的な後悔$\mathcal{c}(f) \ln n$ があり、ここで $\mathcal{c}(f)$ は$f$ の複雑さの正確な特徴である。アルゴリズムの重要なステップは、アクティブなデータ収集を伴う仮説テストである。これは、推定されたインスタンスが実際に正しいかどうかをテストするために、アルゴリズムが観測を収集する最も経済的決定を計算し、したがって複雑さ$\mathcal{C}(f)$は、他のインスタンスに対してインスタンスをテストするための最小コストである。本研究は,具体的問題に対するインスタンス化を行い,マルチアーム付きバンディット [lai and robbins, 1985] と線形バンディット [lattimore and szepesvari, 2017] の古典的ギャップ依存境界を回復し,強化学習のための最善のインスタンス依存上界 [xu et al., 2021] を改善した。

関連論文リスト

The Limits of Assumption-free Tests for Algorithm Performance [6.7171902258864655]
与えられたモデリングタスクにおいてアルゴリズムはどの程度うまく機能し、どのアルゴリズムが最善を尽くすか? 一方、特定のトレーニングデータセットに対して$A$を実行して生成された特定の適合モデルが$n$であるのか?
論文参考訳（メタデータ） (2024-02-12T03:19:30Z)
Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
本稿では,一般値関数近似を用いた効率の良い強化学習アルゴリズムを提案する。本アルゴリズムは, 線形設定と疎高次元線形設定の両方に適用した場合に, 合理的な後悔境界を達成できる。
論文参考訳（メタデータ） (2023-02-22T20:21:25Z)
Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
階層クラスタリングのための微分プライベート近似アルゴリズムについて検討する。例えば、$epsilon$-DPアルゴリズムは入力データセットに対して$O(|V|2/epsilon)$-additiveエラーを示さなければならない。本稿では,ブロックを正確に復元する1+o(1)$近似アルゴリズムを提案する。
論文参考訳（メタデータ） (2023-01-31T19:14:30Z)
Instance-Dependent Regret Analysis of Kernelized Bandits [19.252319300590653]
雑音の多いゼロオーダーオーラを問合せするための適応戦略を設計することを含む、カーネル化された帯域幅問題について検討する。正規化された累積後悔を解消する(関数クラス上)アルゴリズムに対して、不一致依存的後悔の下限を導出する。
論文参考訳（メタデータ） (2022-03-12T00:53:59Z)
Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
漸進的強化学習における後悔を最小限に抑えるために,新しいモデルフリーアルゴリズムを提案する。提案アルゴリズムは、2つのQ-ラーニングシーケンスの助けを借りて、初期設定された参照更新ルールを用いる。初期の分散還元法の設計原理は、他のRL設定とは独立した関心を持つかもしれない。
論文参考訳（メタデータ） (2021-10-09T21:13:48Z)
Conic Blackwell Algorithm: Parameter-Free Convex-Concave Saddle-Point Solving [21.496093093434357]
我々は、新しい単純後悔最小化器、コニック・ブラックウェルアルゴリズム$+$(CBA$+$)を開発する。 CBA$+$,$ell_p$標準球,および楕円型信頼領域の実装方法を示す。本稿では,行列ゲームと分散ロバストな最適化問題を解くための数値実験について述べる。
論文参考訳（メタデータ） (2021-05-27T14:50:31Z)
Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
我々は、$tildeO(L5/6 T2/3)$ regretで最適な複雑性に適応するメタアルゴリズムを提案する。また、メタアルゴリズムは、インスタンス依存の後悔境界を著しく改善することを示す。
論文参考訳（メタデータ） (2020-11-19T10:00:54Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
本稿では,学習者が生成モデルにアクセスできる場合の,割引マルコフ決定(MDP)における最良の政治的識別の問題について検討する。最先端アルゴリズムの利点を論じ、解説する。
論文参考訳（メタデータ） (2020-09-28T15:22:24Z)
Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity [58.70807593332932]
近似問題に対する勾配に基づく手法のオラクル複雑性について検討する。最悪のケースの複雑さではなく、インスタンス依存の複雑さに焦点を当てます。提案アルゴリズムとその解析はモーメント推定の成功を理論的に正当化する。
論文参考訳（メタデータ） (2020-06-08T09:25:47Z)
Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives [10.291482850329892]
混合整数計画法(MIP)は、(最適に) $ell_0$-正規化回帰問題を解くために用いられる。数分で5万ドルの機能を処理できる正確なアルゴリズムと、$papprox6$でインスタンスに対処できる近似アルゴリズムの2つのクラスを提案する。さらに,$ell$-regularizedsに対する新しい推定誤差境界を提案する。
論文参考訳（メタデータ） (2020-01-17T18:47:02Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。