Fugu-MT paper translation (abstract): A Survey of Reinforcement Learning For Economics

Paper overview: A Survey of Reinforcement Learning For Economics

arxiv url: http://arxiv.org/abs/2603.08956v3
Date: Tue, 17 Mar 2026 08:31:37 GMT
Status: translation complete
Last updated in system: 2026-03-18 13:19:43.658416
Title: A Survey of Reinforcement Learning For Economics
Authors: Pranjal Rawat
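Abstract summary: Reinforcement learning algorithms offer a natural, sample-based extension of dynamic programming. I review the theory connecting classical planning to modern learning algorithms. I examine the practical vulnerabilities of these algorithms.
Reference score (attention score computed independently by the site): 0.0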
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This survey (re)introduces reinforcement learning methods to economists. The curse of dimensionality limits how far exact dynamic programming can be effectively applied, forcing us to rely on suitably "small" problems or our ability to convert "big" problems into smaller ones. While this reduction has been sufficient for many classical applications, a growing class of economic models resists such reduction. Reinforcement learning algorithms offer a natural, sample-based extension of dynamic programming, extending tractability to problems with high-dimensional states, continuous actions, and strategic interactions. I review the theory connecting classical planning to modern learning algorithms and demonstrate their mechanics through simulated examples in pricing, inventory control, strategic games, and preference elicitation. I also examine the practical vulnerabilities of these algorithms, noting their brittleness, sample inefficiency, sensitivity to hyperparameters, and the absence of global convergence guarantees outside of tabular settings. The successes of reinforcement learning remain strictly bounded by these constraints, as well as a reliance on accurate simulators. When guided by economic structure, reinforcement learning provides a remarkably flexible framework. It stands as an imperfect, but promising, addition to the computational economist's toolkit. A companion survey (Rust and Rawat, 2026b) covers the inverse problem of inferring preferences from observed behavior. All simulation code is publicly available.

Related papers

ADORA: Training Reasoning Models with Dynamic Advantage Estimation on Reinforcement Learning [32.8666744273094]
We introduce ADORA (Advantage Dynamics via Online Rollout Adaptation).
Reference translation of paper (metadata) (2026-02-10T17:40:39Z)
AGT$^{AO}$: Robust and Stabilized LLM Unlearning via Adversarial Gating Training with Adaptive Orthogonality [28.07698632768221]
We propose a unified framework that reconciles robust erasure with utility preservation. Adversarial Gating Training (AGT) formulates unlearning as a latent-space min-max game. Experiments show that AGT eases the trade-off between unlearning effectiveness and model utility.
Reference translation of paper (metadata) (2026-02-02T06:19:27Z)
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression [35.16407520369906]
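Local Linear Attention (LLA) is a novel attention mechanism derived from nonparametric statistics through the lens of test-time regression. We introduce FlashLLA, a hardware-efficient blockwise algorithm that enables scalable, parallel computation on modern accelerators. Experiments show that LLA adapts effectively to non-stationarity and is competitive with strong baselines in test-time training and in-context learning.
Reference translation of paper (metadata) (2025-10-01T20:42:21Z)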
Beyond Softmax: A Natural Parameterization for Categorical Random Variables [61.709831225296305]
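We introduce the $\textit{catnat}$ function, a function composed of a sequence of hierarchical binary splits. Experiments show that the proposed function improves learning efficiency and yields models with consistent test performance.
Reference translation of paper (metadata) (2025-09-29T12:55:50Z)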
Inverse Reinforcement Learning Using Just Classification and a Few Regressions [38.71913609455455]
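Inverse reinforcement learning aims to explain observed behavior by uncovering the underlying reward. We show that the population-level maximizer is characterized by a linear fixed-point equation involving the behavior policy. We provide a precise characterization of the optimal solution, a general oracle-based algorithm, finite-sample error bounds, and experimental results showing performance competitive with or superior to MaxEnt IRL.
Reference translation of paper (metadata) (2025-09-25T13:53:43Z)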
Near-Optimal Solutions of Constrained Learning Problems [85.48853063302764]
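In machine learning systems, the need to curtail model behavior has become increasingly apparent, as evidenced by recent progress toward models that satisfy robustness requirements. The results show that rich parametrizations effectively ease finite-dimensional learning problems.
Reference translation of paper (metadata) (2024-03-18T14:55:45Z)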
Resilient Constrained Learning [94.27081585149836]
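We propose a constrained learning approach that adapts the requirements while simultaneously solving the learning task. We call this approach resilient constrained learning, after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
Reference translation of paper (metadata) (2023-06-04T18:14:18Z)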

The related papers list is generated automatically from the titles and abstracts of papers on this site.

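This page shows information for the specified paper.
The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences of using it.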