Fugu-MT Paper Translation (Summary): Adaptive Learning with Artificial Barriers Yielding Nash Equilibria in General Games

Paper summary: Adaptive Learning with Artificial Barriers Yielding Nash Equilibria in General Games

arxiv url: http://arxiv.org/abs/2203.15780v1
Date: Mon, 28 Mar 2022 09:04:13 GMT
Status: Translation complete
Last updated in system: 2022-03-30 16:43:11.878319
Title: Adaptive Learning with Artificial Barriers Yielding Nash Equilibria in General Games
Title (reference translation): Adaptive learning using artificial barriers that yields Nash equilibria in general games
Authors: Ismail Hassan, Anis Yazidi, B. John Oommen
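Abstract summary: Artificial barriers in Learning Automata (LA) are a powerful yet under-explored concept, although they were first proposed in the 1980s. This paper devises an LA with artificial barriers for solving a general form of stochastic bimatrix game.
Reference score (independently computed attention level): 12.0855096102517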
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Artificial barriers in Learning Automata (LA) are a powerful yet under-explored concept, although they were first proposed in the 1980s. Introducing artificial non-absorbing barriers makes LA schemes resilient to being trapped in absorbing barriers, a phenomenon often referred to as lock-in probability, which leads to the exclusive choice of one action after convergence. Within the field of LA, and reinforcement learning in general, there is a scarcity of theoretical works and applications of schemes with artificial barriers. In this paper, we devise an LA with artificial barriers for solving a general form of stochastic bimatrix game. Classical LA systems possess the property of absorbing barriers; they are a powerful tool in game theory and were shown to converge to the game's Nash equilibrium under limited information. However, the stream of works in LA for solving game-theoretical problems can only solve the case where the Saddle Point of the game exists in a pure strategy, and fails to reach a mixed Nash equilibrium when no Saddle Point exists for a pure strategy. In this paper, by resorting to the powerful concept of artificial barriers, we suggest an LA that converges to an optimal mixed Nash equilibrium even though there may be no Saddle Point when a pure strategy is invoked. Our deployed scheme is of the Linear Reward-Inaction ($L_{R-I}$) flavor, which is originally an absorbing LA scheme; however, we render it non-absorbing by introducing artificial barriers in an elegant and natural manner, in the sense that the well-known legacy $L_{R-I}$ scheme can be seen as an instance of our proposed algorithm for a particular choice of the barrier. Furthermore, we present an $S$-Learning version of our LA with absorbing barriers that is able to handle an $S$-Learning environment in which the feedback is continuous and not binary as in the case of the $L_{R-I}$.
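Abstract (reference translation): Artificial barriers in learning automata (LA) were first proposed in the 1980s, yet they remain a powerful and under-explored concept. Introducing artificial non-absorbing barriers makes LA schemes resistant to being trapped in absorbing barriers, a phenomenon called lock-in probability that leads to the exclusive choice of a single action after convergence. In the field of LA, and in reinforcement learning generally, theoretical studies and applications of schemes with artificial barriers are scarce. This paper devises an LA with artificial barriers for solving a general form of stochastic bimatrix game. Classical LA systems have the property of absorbing barriers, are a powerful tool in game theory, and have been shown to converge to the game's Nash equilibrium under limited information. However, the line of work on LA for game-theoretic problems can only handle the case where the game's saddle point exists in a pure strategy, and cannot reach a mixed Nash equilibrium when no pure-strategy saddle point exists. By drawing on the concept of artificial barriers, this paper proposes an LA that converges to an optimal mixed Nash equilibrium even when no saddle point exists in pure strategies. The deployed scheme is of the Linear Reward-Inaction ($L_{R-I}$) type, which is originally an absorbing LA scheme; it is made non-absorbing by introducing artificial barriers in an elegant and natural way, in the sense that the well-known legacy $L_{R-I}$ scheme can be seen as an instance of the proposed algorithm for a particular choice of the barrier. In addition, the paper presents an $S$-Learning version of the LA with absorbing barriers that can handle an $S$-Learning environment, in which the feedback is continuous rather than binary as it is for $L_{R-I}$.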
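The abstract names the $L_{R-I}$ scheme and says the legacy scheme is recovered for a particular choice of the barrier, but it does not state the update rule. The following is a minimal sketch under stated assumptions, not the authors' algorithm: each player has two actions, the barrier is a hypothetical parameter `p_max`, and a rewarded action pulls its probability toward `p_max` (or toward `1 - p_max` for the other action). With `p_max = 1` this reduces to the classical $L_{R-I}$ update. The reward-probability matrices, `theta`, and all function names are illustrative choices.

```python
import random


def barrier_lri_step(p, action, rewarded, theta, p_max):
    """Update P(action 0) for a two-action automaton.

    Assumed rule (not taken from the source): on reward, move p a fraction
    theta toward p_max (if action 0 was played) or toward 1 - p_max (if
    action 1 was played). On penalty, leave p unchanged ("inaction").
    With p_max = 1 this is the classical L_{R-I} update.
    """
    if not rewarded:
        return p
    target = p_max if action == 0 else 1.0 - p_max
    return p + theta * (target - p)


def simulate(reward_prob_a, reward_prob_b, theta=0.01, p_max=0.99,
             steps=10_000, seed=0):
    """Let two automata play a 2x2 game with stochastic binary feedback.

    reward_prob_x[a][b] is the (hypothetical) probability that player x is
    rewarded when player A plays a and player B plays b.
    Returns the final (P(A plays 0), P(B plays 0)).
    """
    rng = random.Random(seed)
    p, q = 0.5, 0.5
    for _ in range(steps):
        a = 0 if rng.random() < p else 1
        b = 0 if rng.random() < q else 1
        rewarded_a = rng.random() < reward_prob_a[a][b]
        rewarded_b = rng.random() < reward_prob_b[a][b]
        p = barrier_lri_step(p, a, rewarded_a, theta, p_max)
        q = barrier_lri_step(q, b, rewarded_b, theta, p_max)
    return p, q


# Hypothetical matching-pennies-style game with no pure-strategy saddle point:
# player A tends to be rewarded on a match, player B on a mismatch.
REWARD_A = [[0.8, 0.2], [0.2, 0.8]]
REWARD_B = [[0.2, 0.8], [0.8, 0.2]]

if __name__ == "__main__":
    p, q = simulate(REWARD_A, REWARD_B)
    print(f"P(A plays 0) = {p:.3f}, P(B plays 0) = {q:.3f}")
```

Because every update is a convex combination of the current probability and a target inside `[1 - p_max, p_max]`, the probabilities in this sketch can never reach 0 or 1 when `p_max < 1`, which is the non-absorbing behavior the abstract describes. Whether and how closely such a rule approaches the mixed equilibrium is the subject of the paper and is not claimed here.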

Related papers

Fundamental Limits of Game-Theoretic LLM Alignment: Smith Consistency and Preference Matching [23.0436612817548]
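Nash Learning from Human Feedback is a framework for aligning large language models with human preferences by modeling learning as a zero-sum game. This paper investigates which choices of payoff, based on human preferences, can yield desirable alignment properties.
Paper reference translation (metadata) (2025-05-27T02:07:35Z)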
Barriers to Welfare Maximization with No-Regret Learning [68.66209476382213]
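We prove computational lower bounds for computing near-optimal $T$-sparse CCE. In particular, we show that the inapproximability of maximum clique rules out attaining any non-trivial sparsity in polynomial time.
Paper reference translation (metadata) (2024-11-04T00:34:56Z)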
Large Language Models Playing Mixed Strategy Nash Equilibrium Games [1.060608983034705]
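This paper focuses on the ability of large language models to find Nash equilibria in games that have a mixed-strategy Nash equilibrium and no pure-strategy Nash equilibrium. The study reveals that the performance of LLMs improves significantly when they are equipped with the ability to execute code. LLMs show remarkable proficiency in well-known standard games, but their performance degrades when they face slight modifications of the same games.
Paper reference translation (metadata) (2024-06-15T09:30:20Z)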
On Tractable $Φ$-Equilibria in Non-Concave Games [53.212133025684224]
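Non-concave games pose significant challenges for game theory and optimization. We show that when $\Phi$ is finite, there exists an efficient uncoupled learning algorithm that converges to the corresponding $\Phi$-equilibria. We also show that Online Gradient Descent can efficiently approximate $\Phi$-equilibria in non-trivial regimes.
Paper reference translation (metadata) (2024-03-13T01:51:30Z)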
Multi-Sender Persuasion: A Computational Perspective [41.88812114165843]
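We consider the multi-sender persuasion problem, a setting that is widespread in computational economics, multi-agent learning, and machine learning. We propose a novel differentiable neural network to approximate this game's non-linear and discontinuous utilities.
Paper reference translation (metadata) (2024-02-07T15:50:20Z)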
Exploiting hidden structures in non-convex games for convergence to Nash equilibrium [62.88214569402201]
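Modern machine learning applications can be formulated as non-cooperative games whose solutions are Nash equilibria. We provide explicit convergence guarantees in both deterministic and stochastic environments.
Paper reference translation (metadata) (2023-12-27T15:21:25Z)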
Differentiable Arbitrating in Zero-sum Markov Games [59.62061049680365]
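We study how to perturb the rewards in a two-player zero-sum Markov game so as to induce a desirable Nash equilibrium, that is, arbitrating. The lower level requires solving for the Nash equilibrium under a given reward function, which makes the overall problem hard to optimize end-to-end. We propose a backpropagation scheme that differentiates through the Nash equilibrium and provides gradient feedback for the upper level.
Paper reference translation (metadata) (2023-02-20T16:05:04Z)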
Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a Generative Model [50.38446482252857]
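Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning. We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy using $\widetilde{O}\big(\ldots\big)$ samples. We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
Paper reference translation (metadata) (2022-08-22T17:24:55Z)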
Laplace Redux -- Effortless Bayesian Deep Learning [79.70292248127467]
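The Laplace approximation (LA) is not as popular as alternatives such as variational Bayes or deep ensembles. We introduce "Laplace", an easy-to-use software library for PyTorch. Through experiments, we show that the LA is competitive with more popular alternatives in terms of performance while excelling in terms of computational cost.
Paper reference translation (metadata) (2021-06-28T15:30:40Z)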

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
