Fugu-MT 論文翻訳(概要): Bandit algorithms to emulate human decision making using probabilistic distortions

論文の概要: Bandit algorithms to emulate human decision making using probabilistic distortions

arxiv url: http://arxiv.org/abs/1611.10283v3
Date: Tue, 31 Oct 2023 05:53:46 GMT
ステータス: 翻訳完了
システム内更新日: 2023-11-02 05:29:19.632148
Title: Bandit algorithms to emulate human decision making using probabilistic distortions
Title（参考訳）: 確率的歪みを用いた意思決定エミュレートのためのバンディットアルゴリズム
Authors: Ravi Kumar Kolla, Prashanth L.A., Aditya Gopalan, Krishna Jagannathan, Michael Fu and Steve Marcus
Abstract要約: 報奨分布に歪んだ確率を持つ2つの多重武装バンディット問題を定式化する。以上のような後悔の最小化の問題と、マルチアームバンディットのための最高の腕識別フレームワークについて考察する。
参考スコア（独自算出の注目度）: 20.422725678982726
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward distributions: the classic $K$-armed bandit and the linearly parameterized bandit settings. We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits. For the regret minimization setting in $K$-armed as well as linear bandit problems, we propose algorithms that are inspired by Upper Confidence Bound (UCB) algorithms, incorporate reward distortions, and exhibit sublinear regret. For the $K$-armed bandit setting, we derive an upper bound on the expected regret for our proposed algorithm, and then we prove a matching lower bound to establish the order-optimality of our algorithm. For the linearly parameterized setting, our algorithm achieves a regret upper bound that is of the same order as that of regular linear bandit algorithm called Optimism in the Face of Uncertainty Linear (OFUL) bandit algorithm, and unlike OFUL, our algorithm handles distortions and an arm-dependent noise model. For the best arm identification problem in the $K$-armed bandit setting, we propose algorithms, derive guarantees on their performance, and also show that these algorithms are order optimal by proving matching fundamental limits on performance. For best arm identification in linear bandits, we propose an algorithm and establish sample complexity guarantees. Finally, we present simulation experiments which demonstrate the advantages resulting from using distortion-aware learning algorithms in a vehicular traffic routing application.
Abstract（参考訳）: 従来の期待値の選好からの偏差を説明するために提案された人間の意思決定モデルにより、報酬分布に歪んだ確率を持つ確率的マルチアームバンディット問題を定式化し、古典的な$K$武器バンディットと線形パラメータ化バンディット設定を定式化する。本稿では,後悔の最小化の問題点と,複数腕のバンディットに対する最善のアーム識別フレームワークについて考察する。 K$の武器と線形バンディット問題における後悔の最小化設定のために、我々はアッパー信頼境界(UCB)アルゴリズムにインスパイアされたアルゴリズムを提案し、報酬歪みを取り入れ、サブ線形後悔を示す。 k$-armed bandit の設定では、提案アルゴリズムに期待される後悔の上限を導出し、アルゴリズムの順序最適化性を確立するために一致する下限を証明します。線形パラメータ化設定では,OFULとは違い,本アルゴリズムは歪みやアーム依存ノイズモデルを扱うため,通常の線形帯域幅アルゴリズムであるオプティミズム(Optimism in the Face of Uncertainty Linear, OFUL)の帯域幅アルゴリズムと同じ順序の残差上限を実現する。 k$-armed bandit設定における最高のarm識別問題に対して,我々はアルゴリズムを提案し,その性能の保証を導き出すとともに,それらのアルゴリズムが性能の基本的な限界を満たしていることを示す。線形バンドイットにおける最良アーム同定のために, アルゴリズムを提案し, サンプル複雑性の保証を確立する。最後に,車両交通ルーティングアプリケーションにおける歪み認識学習アルゴリズムの利点を実証するシミュレーション実験を行う。

関連論文リスト

Replicability is Asymptotically Free in Multi-armed Bandits [45.729105054410745]
この仕事の動機は、再現可能な機械学習の需要の増加にある。特に、高い確率で、アルゴリズムのアクション列がデータセットに固有のランダム性の影響を受けないように、複製可能なアルゴリズムを考える。
論文参考訳（メタデータ） (2024-02-12T03:31:34Z)
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [28.79598714109439]
両環境および敵環境における$K$武器のデュエルバンディットの問題について検討する。まず,マルチアームのバンディットに対して,任意の(一般的な)デュエル・バンドレットから新たなリダクションを提案する。提案アルゴリズムは,コンドルチェット・ウィンナーベンチマークに対して最適な$O(sum_i = 1K fraclog TDelta_i)$ regret boundを達成した最初のアルゴリズムでもある。
論文参考訳（メタデータ） (2022-02-14T13:37:23Z)
Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
本稿では,ギャップベースアルゴリズムと逐次除去に基づく2つのアルゴリズムを提案する。具体的には、ギャップベースのアルゴリズムでは、サンプルの複雑さは定数要素まで最適であり、連続的な除去では対数因子まで最適である。
論文参考訳（メタデータ） (2021-11-14T21:49:58Z)
Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
この研究は、未知の報酬関数が非可逆であるようなバンドイット問題の大群を考察する。我々のアルゴリズムは、非常に一般化されたゼロ階最適化のパラダイムに基づいている。標準的な楽観的アルゴリズムは次元因子によって準最適であることを示す。
論文参考訳（メタデータ） (2021-07-09T16:04:24Z)
Correlated Bandits for Dynamic Pricing via the ARC algorithm [2.7564955518050693]
漸近ランダム化制御(Asymptotic Randomized Control)は、ベイズバンドの幅広いクラスに対する最適な戦略に厳密な近似を与える。これにより、意思決定者は報酬に加えて信号を観察し、異なる選択の結果の相関を組み込むことができ、見積もりに非自明なダイナミクスを持つことができる。
論文参考訳（メタデータ） (2021-02-08T14:54:26Z)
Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
バンディットアルゴリズムを結合する簡単な手法を提案する。私たちのアプローチは、個々のbanditアルゴリズムのそれぞれを、より高いレベルのn$-armed bandit問題のアームとして扱う"meta-ucb"手順に基づいています。
論文参考訳（メタデータ） (2020-12-24T05:36:29Z)
Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
オンライン・リニア・バンドレットにおける後悔を最小限に抑える設計に基づく新しいアルゴリズムを提案する。我々は、現在最先端の有限時間後悔保証を提供し、このアルゴリズムが帯域幅と半帯域幅の両方のフィードバックシステムに適用可能であることを示す。
論文参考訳（メタデータ） (2020-11-01T17:59:19Z)
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
複数の次元に沿った最先端技術を改善する新しいアルゴリズムを提案する。非文脈線形帯域の特別な場合において、学習地平線に対して最小限の最適性を確立する。
論文参考訳（メタデータ） (2020-10-23T09:12:47Z)
Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
サンプルの複雑さが既知のインスタンス固有の下界と一致する単純なアルゴリズムを考案する。既存のベストアーム識別戦略とは異なり、我々のアルゴリズムは武器の数に依存しない停止規則を用いる。
論文参考訳（メタデータ） (2020-06-29T14:25:51Z)
Corralling Stochastic Bandit Algorithms [54.10645564702416]
相関アルゴリズムの後悔は、最も報酬の高い腕を含む最高のアルゴリズムの後悔よりも悪くはないことを示す。最高報酬と他の報酬の差は、最高報酬と他の報酬の差に依存することを示す。
論文参考訳（メタデータ） (2020-06-16T15:33:12Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。