Fugu-MT Paper Translation (Summary): AI Playing Business Games: Benchmarking Large Language Models on Managerial Decision-Making in Dynamic Simulations

Paper summary: AI Playing Business Games: Benchmarking Large Language Models on Managerial Decision-Making in Dynamic Simulations

arxiv url: http://arxiv.org/abs/2509.26331v1
Date: Tue, 30 Sep 2025 14:43:05 GMT
Status: Translation complete
Last updated in system: 2025-10-01 14:45:00.16974
Title: AI Playing Business Games: Benchmarking Large Language Models on Managerial Decision-Making in Dynamic Simulations
Authors: Berdymyrat Ovezmyradov
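Abstract summary: This study analyses a novel benchmark that uses a business game to assess business decision-making. It contributes to the recent literature on AI by proposing a reproducible, open-access management simulator.
Reference score (attention score computed by this site): 0.0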
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The rapid advancement of large language models (LLMs) has sparked significant interest in their potential to augment or automate managerial functions. One of the most recent trends in AI benchmarking is evaluating the performance of LLMs over longer time horizons. While LLMs excel at tasks involving natural language and pattern recognition, their capabilities in multi-step, strategic business decision-making remain largely unexplored. Few studies have demonstrated how results over long horizons can differ from benchmarks on short-term tasks, as Vending-Bench revealed. Meanwhile, there is a shortage of alternative benchmarks for long-term coherence. This research analyses a novel benchmark that uses a business game to assess business decision-making. The research contributes to the recent literature on AI by proposing a reproducible, open-access management simulator to the research community for LLM benchmarking. This novel framework is used to evaluate the performance of five leading LLMs available through free online interfaces: Gemini, ChatGPT, Meta AI, Mistral AI, and Grok. Each LLM makes decisions for a simulated retail company. A dynamic, month-by-month management simulation, provided transparently as a spreadsheet model, serves as the experimental environment. In each of twelve months, the LLMs are given a structured prompt containing a full business report from the previous period and are tasked with making key strategic decisions: pricing, order size, marketing budget, hiring, dismissal, loans, training expense, R&D expense, sales forecast, and income forecast. The methodology is designed to compare the LLMs on quantitative metrics: profit, revenue, market share, and other KPIs. LLM decisions are also analysed for their strategic coherence, their adaptability to market changes, and the rationale provided for them. This approach makes it possible to move beyond simple performance metrics when assessing long-term decision-making.
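The monthly protocol described in the abstract can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the field names and units in `MonthlyDecision`, the prompt layout in `build_prompt`, and the caller-supplied `ask_model` and `simulate_month` functions are hypothetical stand-ins, not the paper's spreadsheet model or its actual prompt.

```python
from dataclasses import dataclass, fields


@dataclass
class MonthlyDecision:
    """The ten decisions listed in the abstract; names and units are assumptions."""
    price: float
    order_size: int
    marketing_budget: float
    hiring: int
    dismissal: int
    loans: float
    training_expense: float
    rd_expense: float
    sales_forecast: float
    income_forecast: float


def build_prompt(month: int, report: dict) -> str:
    """Format the previous period's report as a structured prompt (hypothetical layout)."""
    lines = [f"Month {month} of 12. Business report for the previous period:"]
    lines += [f"- {key}: {value}" for key, value in report.items()]
    names = ", ".join(f.name for f in fields(MonthlyDecision))
    lines.append(f"Return one value for each decision: {names}")
    return "\n".join(lines)


def run_year(ask_model, simulate_month, initial_report: dict) -> list:
    """Run twelve monthly rounds.

    ask_model(prompt) -> MonthlyDecision and
    simulate_month(report, decision) -> dict are supplied by the caller;
    both are placeholders for the LLM and the simulator.
    """
    report, history = initial_report, []
    for month in range(1, 13):
        decision = ask_model(build_prompt(month, report))
        report = simulate_month(report, decision)
        history.append((month, decision, report))
    return history
```

Passing the model and the simulator in as functions keeps the loop independent of any particular LLM interface or spreadsheet implementation, so the same twelve-round procedure could in principle be repeated for each model being compared.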

Related papers