Fugu-MT Paper Translation (Abstract): Using Non-Stationary Bandits for Learning in Repeated Cournot Games with Non-Stationary Demand

Paper abstract: Using Non-Stationary Bandits for Learning in Repeated Cournot Games with Non-Stationary Demand

arxiv url: http://arxiv.org/abs/2201.00486v1
Date: Mon, 3 Jan 2022 05:51:47 GMT
Status: Translation complete
Last updated in system: 2022-01-04 15:54:00.621671
Title: Using Non-Stationary Bandits for Learning in Repeated Cournot Games with Non-Stationary Demand
Title (reference translation): Learning in Repeated Cournot Games Using Non-Stationary Bandits
Authors: Kshitija Taywade, Brent Harrison, Judy Goldsmith
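Abstract summary: This paper models repeated Cournot games with non-stationary demand. The set of arms/actions an agent can choose from represents discrete production quantities. It proposes a novel algorithm, 'Adaptive with Weighted Exploration (AWE) $\epsilon$-greedy', based on the well-known $\epsilon$-greedy approach.
Reference score (attention score computed by this site): 11.935419090901524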
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Many past attempts at modeling repeated Cournot games assume that demand is stationary. This does not align with real-world scenarios in which market demands can evolve over a product's lifetime for a myriad of reasons. In this paper, we model repeated Cournot games with non-stationary demand such that firms/agents face separate instances of a non-stationary multi-armed bandit problem. The set of arms/actions that an agent can choose from represents discrete production quantities; here, the action space is ordered. Agents are independent and autonomous, and cannot observe anything from the environment; they can only see their own rewards after taking an action, and only work towards maximizing these rewards. We propose a novel algorithm 'Adaptive with Weighted Exploration (AWE) $\epsilon$-greedy' which is remotely based on the well-known $\epsilon$-greedy approach. This algorithm detects and quantifies changes in rewards due to varying market demand and varies the learning rate and exploration rate in proportion to the degree of changes in demand, thus enabling agents to better identify new optimal actions. For efficient exploration, it also deploys a mechanism for weighing actions that takes advantage of the ordered action space. We use simulations to study the emergence of various equilibria in the market. In addition, we study the scalability of our approach in terms of the total number of agents in the system and the size of the action space. We consider both symmetric and asymmetric firms in our models. We found that using our proposed method, agents are able to swiftly change their course of action according to the changes in demand, and they also engage in collusive behavior in many simulations.
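Abstract (reference translation): Many attempts at modeling repeated Cournot games assume that demand is stationary. This does not match real-world scenarios, in which market demand can evolve over a product's lifetime for countless reasons. This paper models repeated Cournot games with non-stationary demand, in which each firm/agent faces its own instance of a non-stationary multi-armed bandit problem. The set of arms/actions an agent can choose from represents discrete production quantities, and the action space is ordered. Agents are independent and autonomous and cannot observe anything from the environment; after taking an action they see only their own reward, and they work only to maximize those rewards. The paper proposes a novel algorithm, 'Adaptive with Weighted Exploration (AWE) $\epsilon$-greedy', based on the well-known $\epsilon$-greedy approach. The algorithm detects and quantifies changes in rewards caused by changing market demand, and varies the learning rate and exploration rate according to the degree of change in demand. For efficient exploration, it also deploys an action-weighting mechanism that exploits the ordered action space. Simulations are used to study the emergence of various equilibria in the market. The scalability of the approach is also examined in terms of the total number of agents in the system and the size of the action space. Both symmetric and asymmetric firms are considered in the models. With the proposed method, agents are able to change their course of action swiftly in response to changes in demand, and they also engage in collusive behavior in many simulations.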
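
The abstract describes the setting only in words, so a small simulation may help make it concrete. The following is a minimal sketch under stated assumptions, not the authors' AWE $\epsilon$-greedy algorithm: the linear inverse demand, the demand shock, the "surprise"-based change signal, the inverse-distance exploration weights, and every name and parameter value (`cournot_profit`, `AdaptiveEpsGreedy`, `simulate`, `eps_min`, `alpha`, and so on) are hypothetical choices made for illustration. It only mirrors the three ideas the abstract names: reward-only feedback, learning and exploration rates that grow with detected change, and exploration weighted by position in the ordered action space.

```python
import random


def cournot_profit(q_i, total_q, intercept, slope=1.0, unit_cost=1.0):
    """Profit of one firm under an assumed linear inverse demand curve."""
    price = max(0.0, intercept - slope * total_q)
    return (price - unit_cost) * q_i


class AdaptiveEpsGreedy:
    """Change-aware epsilon-greedy agent (illustrative only, not the paper's AWE)."""

    def __init__(self, n_actions, eps_min=0.05, eps_max=0.5, alpha=0.2, seed=0):
        if n_actions < 2:
            raise ValueError("need at least two actions to explore")
        self.n = n_actions
        self.q = [0.0] * n_actions          # running reward estimate per arm
        self.eps_min, self.eps_max = eps_min, eps_max
        self.eps = eps_max
        self.alpha = alpha                  # baseline learning rate
        self.rng = random.Random(seed)

    def act(self):
        greedy = max(range(self.n), key=self.q.__getitem__)
        if self.rng.random() >= self.eps:
            return greedy
        # Weighted exploration over the ordered action space (assumed rule):
        # arms closer to the current greedy arm are sampled more often.
        weights = [0.0 if a == greedy else 1.0 / abs(a - greedy)
                   for a in range(self.n)]
        return self.rng.choices(range(self.n), weights=weights)[0]

    def update(self, action, reward):
        # Assumed change signal in [0, 1]: relative gap between the observed
        # reward and the current estimate for the chosen arm.
        surprise = abs(reward - self.q[action])
        change = min(1.0, surprise / max(1.0, abs(self.q[action])))
        # Exploration decays slowly but jumps back up when change is detected.
        self.eps = max(self.eps_min, max(0.99 * self.eps, change * self.eps_max))
        # Learning rate grows with the detected change.
        step = self.alpha + (1.0 - self.alpha) * change
        self.q[action] += step * (reward - self.q[action])


def simulate(rounds=2000, n_actions=10, shift_at=1000, seed=0):
    """Two independent agents; each sees only its own profit."""
    agents = [AdaptiveEpsGreedy(n_actions, seed=seed + i) for i in range(2)]
    history = []
    for t in range(rounds):
        # Hypothetical demand shock: the demand intercept drops at shift_at.
        intercept = 20.0 if t < shift_at else 12.0
        quantities = [agent.act() for agent in agents]  # arm index = units produced
        total = sum(quantities)
        for agent, q in zip(agents, quantities):
            agent.update(q, cournot_profit(q, total, intercept))
        history.append(quantities)
    return history
```

Calling `simulate()` returns the per-round quantities chosen by both agents, so the rounds around `shift_at` can be inspected to see how quickly the agents move after the demand change. Because each agent's reward also depends on the other agent's choice, the change signal here reacts to opponent behavior as well as to demand, which is one reason this sketch should not be read as a reproduction of the paper's results.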

Related papers

Competing Bandits in Decentralized Large Contextual Matching Markets [13.313881962771777]
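Studies decentralized learning in two-sided matching markets in which the demand side (players or agents) competes for a large supply side (arms). The proposed algorithm achieves instance-dependent logarithmic regret regardless of the number of arms.
Reference translation (metadata) (2024-11-18T18:08:05Z)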
Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents [52.75161794035767]
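Introduces a class of bandit algorithms that simultaneously meet two objectives: performance incentivization and robustness. It shows that combining ideas from second-price auctions with these algorithms makes it possible to handle settings in which the principal has no information about the arms' performance characteristics.
Reference translation (metadata) (2023-12-13T06:54:49Z)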
On Imperfect Recall in Multi-Agent Influence Diagrams [57.21088266396761]
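Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks. The paper shows how to solve MAIDs with forgetful and absent-minded agents using mixed policies and two types of correlated equilibrium. It also describes applications of MAIDs to Markov games and team situations, where imperfect recall is often unavoidable.
Reference translation (metadata) (2023-07-11T07:08:34Z)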
A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
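Studies learning equilibria in non-stationary multi-agent systems. It shows how to test for various types of equilibria via a black-box reduction to single-agent learning.
Reference translation (metadata) (2023-06-12T23:48:24Z)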
Stochastic Market Games [10.979093424231532]
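Proposes using market forces to give agents incentives to cooperate. As demonstrated in an iterated version of the Prisoner's Dilemma, the proposed market formulation can change the dynamics of the game. The authors find empirically that the presence of markets can improve both the overall result and agents' individual returns through their trading activities.
Reference translation (metadata) (2022-07-15T10:37:16Z)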
Dynamic Memory for Interpretable Sequential Optimisation [0.0]
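Proposes a solution for handling non-stationarity that is suitable for deployment at scale. The authors develop an adaptive Bayesian learning agent that employs a novel form of dynamic memory, and describe the architecture of a large-scale "Automatic-as-a-service" deployment.
Reference translation (metadata) (2022-06-28T12:29:13Z)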
Uplifting Bandits [23.262188897812475]
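Introduces a multi-armed bandit model in which the reward is a sum of multiple random variables and each action changes the distributions of only some of them. The model is motivated by marketing campaigns and recommender systems, where the variables represent outcomes for individual customers. The authors propose UCB-style algorithms that estimate the uplift of each action over a baseline.
Reference translation (metadata) (2022-06-08T18:00:56Z)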
Off-Beat Multi-Agent Reinforcement Learning [62.833358249873704]
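Studies model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions are prevalent. The authors propose LeGEM, a novel episodic memory for model-free MARL algorithms, and evaluate it on a range of multi-agent scenarios with off-beat actions, including the Stag-Hunter Game, Quarry Game, Afforestation Game, and StarCraft II micromanagement tasks.
Reference translation (metadata) (2022-05-27T02:21:04Z)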
Modelling Cournot Games as Multi-agent Multi-armed Bandits [4.751331778201811]
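Investigates a multi-agent multi-armed bandit (MA-MAB) setting for repeated Cournot oligopoly games. The authors find that an $\epsilon$-greedy approach offers a more viable learning mechanism than other traditional MAB approaches. They propose two novel approaches that exploit the ordered action space: $\epsilon$-greedy+HL and $\epsilon$-greedy+EL.
Reference translation (metadata) (2022-01-01T22:02:47Z)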
Reinforcement Learning in Reward-Mixing MDPs [74.41782017817808]
Studies episodic reinforcement learning in reward-mixing Markov decision processes (MDPs). The algorithm finds an $\epsilon$-optimal policy after $\tilde{O}(\mathrm{poly}(H, \epsilon^{-1}) \cdot S^2 A^2)$ episodes, where $H$ is the time horizon and $S, A$ are the numbers of states and actions.
Reference translation (metadata) (2021-10-07T18:55:49Z)
Robust Allocations with Diversity Constraints [65.3799850959513]
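Shows that the Nash Welfare rule, which maximizes the product of agent values, is uniquely positioned to be robust when diversity constraints are introduced. It also shows that the guarantees achieved by Nash Welfare are nearly optimal within a widely studied class of allocation rules.
Reference translation (metadata) (2021-09-30T11:09:31Z)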
Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow [73.1896399783641]
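In membership/subscriber acquisition and retention, marketing content must be recommended across multiple pages in sequence. The authors propose formulating this problem as an MDP with Bandits, in which Bandits are used to model the transition probability matrix. The proposed MDP with Bandits algorithm outperforms Q-learning with $\epsilon$-greedy variants, independent Bandits, and interaction Bandits.
Reference translation (metadata) (2021-07-01T03:54:36Z)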

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

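This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).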