Fugu-MT paper translation (abstract): DPBench: Large Language Models Struggle with Simultaneous Coordination

Paper overview: DPBench: Large Language Models Struggle with Simultaneous Coordination

arxiv url: http://arxiv.org/abs/2602.13255v1
Date: Mon, 02 Feb 2026 18:26:00 GMT
Status: translation complete
Last updated in system: 2026-02-23 12:01:13.590403
Title: DPBench: Large Language Models Struggle with Simultaneous Coordination
Authors: Najmul Hasan, Prashanth BusiReddyGari
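Abstract summary: DPBench is a benchmark that evaluates LLM coordination across eight conditions that vary decision timing, group size, and communication. Experiments with GPT-5.2, Claude Opus 4.5, and Grok 4.1 reveal a striking asymmetry: the models coordinate effectively in sequential settings but fail when decisions must be made simultaneously. The findings suggest that multi-agent LLM systems requiring concurrent resource access may need external coordination mechanisms rather than relying on emergent coordination.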
Reference score (attention score computed by this site): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models are increasingly deployed in multi-agent systems, yet we lack benchmarks that test whether they can coordinate under resource contention. We introduce DPBench, a benchmark based on the Dining Philosophers problem that evaluates LLM coordination across eight conditions that vary decision timing, group size, and communication. Our experiments with GPT-5.2, Claude Opus 4.5, and Grok 4.1 reveal a striking asymmetry: LLMs coordinate effectively in sequential settings but fail when decisions must be made simultaneously, with deadlock rates exceeding 95% under some conditions. We trace this failure to convergent reasoning, where agents independently arrive at identical strategies that, when executed simultaneously, guarantee deadlock. Contrary to expectations, enabling communication does not resolve this problem and can even increase deadlock rates. Our findings suggest that multi-agent LLM systems requiring concurrent resource access may need external coordination mechanisms rather than relying on emergent coordination. DPBench is released as an open-source benchmark. Code and benchmark are available at https://github.com/najmulhasan-code/dpbench.
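
The convergent-reasoning failure described in the abstract can be illustrated with a toy Dining Philosophers simulation. The sketch below is not DPBench code and makes no claim about how the benchmark is implemented. The names `simultaneous_deadlocks`, `left_then_right`, and `ordered_forks` are hypothetical, and the resource-ordering strategy is a classic textbook remedy used here only as one possible example of an external coordination rule. It assumes every philosopher acts in lockstep on the same snapshot of the table.

```python
from typing import Callable, Dict, List, Optional

# A strategy maps (philosopher index, table size) to the order in which
# that philosopher requests its two forks. Fork i lies to the left of
# philosopher i; fork (i + 1) % n lies to the right.
Strategy = Callable[[int, int], List[int]]


def left_then_right(i: int, n: int) -> List[int]:
    """Identical strategy for everyone: left fork first, then right fork."""
    return [i, (i + 1) % n]


def ordered_forks(i: int, n: int) -> List[int]:
    """Illustrative external rule: always request the lower-numbered fork first."""
    return sorted([i, (i + 1) % n])


def simultaneous_deadlocks(n: int, strategy: Strategy) -> bool:
    """Return True if lockstep execution ends with nobody able to eat."""
    holder: List[Optional[int]] = [None] * n   # holder[f] = philosopher holding fork f
    plans = [strategy(i, n) for i in range(n)]
    held = [0] * n                             # number of forks each philosopher holds

    while True:
        # Everyone decides at the same instant, from the same snapshot.
        snapshot = list(holder)
        requests: Dict[int, List[int]] = {}
        for i in range(n):
            requests.setdefault(plans[i][held[i]], []).append(i)

        moved = False
        for fork, askers in requests.items():
            if snapshot[fork] is None:
                winner = askers[0]             # ties go to the lowest index
                holder[fork] = winner
                held[winner] += 1
                moved = True

        if any(count == 2 for count in held):
            return False                       # someone holds both forks and can eat
        if not moved:
            return True                        # everyone is waiting on a neighbor


if __name__ == "__main__":
    print(simultaneous_deadlocks(5, left_then_right))
    print(simultaneous_deadlocks(5, ordered_forks))
```

With the identical `left_then_right` strategy, every philosopher takes a distinct left fork in the first round and then waits forever for a right fork that a neighbor already holds. Breaking the symmetry with a fixed fork ordering lets at least one philosopher acquire both forks.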


Related papers