Fugu-MT Paper Translation (Abstract): HiveMind: OS-Inspired Scheduling for Concurrent LLM Agent Workloads

Paper summary: HiveMind: OS-Inspired Scheduling for Concurrent LLM Agent Workloads

arxiv url: http://arxiv.org/abs/2604.17111v1
Date: Sat, 18 Apr 2026 18:59:33 GMT
Status: translation complete
Last updated in system: 2026-04-21 21:52:52.338714
Title: HiveMind: OS-Inspired Scheduling for Concurrent LLM Agent Workloads
Authors: Justice Owusu Agyemang, Jerry John Kponyo, Obed Kwasi Somuah, Elliot Amponsah, Godfred Manu Addo Boakye, Kwame Opuni-Boachie Obour Agyekum
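Abstract summary: In a motivating incident, 3 of 11 parallel agents died from connection resets and HTTP 502 errors. HIVEMIND is a transparent HTTP proxy that applies five OS-inspired scheduling primitives to eliminate the failure modes caused by uncoordinated parallel execution.
Reference score (independently computed attention score): 0.0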
License: http://creativecommons.org/licenses/by/4.0/
Abstract: When multiple LLM coding agents share a rate-limited API endpoint, they exhibit resource contention patterns analogous to unscheduled OS processes competing for CPU, memory, and I/O. In a motivating incident, 3 of 11 parallel agents died from connection resets and HTTP 502 errors - a 27% failure rate - despite the API having sufficient aggregate capacity to serve all 11 sequentially. We present HIVEMIND, a transparent HTTP proxy that applies five OS-inspired scheduling primitives - admission control, rate-limit tracking, AIMD backpressure with circuit breaking, token budget management, and priority queuing - to eliminate the failure modes caused by uncoordinated parallel execution. The proxy requires zero modifications to existing agent code and supports Anthropic, OpenAI, and local model APIs via auto-detected provider profiles. Our evaluation across seven scenarios (5-50 concurrent agents) shows that uncoordinated agents fail at 72-100% rates under contention, while HIVEMIND reduces failures to 0-18% and eliminates 48-100% of wasted compute. An ablation study reveals that transparent retry - not admission control - is the single most critical primitive, but the primitives are most effective in combination. Real-world validation against Ollama confirms that HIVEMIND adds under 3ms of proxy overhead per request. The system is open-source under the MIT license.
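One of the primitives named in the abstract, AIMD (additive-increase, multiplicative-decrease) backpressure, can be illustrated with a minimal sketch. This is not the HIVEMIND implementation: the class name `AimdLimiter`, its fields, and all default values are hypothetical, and the abstract does not say which signals trigger a decrease or how large each step is. The sketch assumes the limit grows by a fixed step on each successful request and is halved when the upstream API signals overload (for example a connection reset or an HTTP 502).

```python
from dataclasses import dataclass


@dataclass
class AimdLimiter:
    """Hypothetical AIMD concurrency limit; not taken from the paper."""

    limit: float = 4.0       # current cap on in-flight requests (assumed default)
    min_limit: float = 1.0   # never drop below one in-flight request
    max_limit: float = 50.0  # assumed upper bound
    increase: float = 1.0    # additive step applied after a success
    decrease: float = 0.5    # multiplicative factor applied on overload

    def on_success(self) -> None:
        # Additive increase: probe for more capacity one step at a time.
        self.limit = min(self.max_limit, self.limit + self.increase)

    def on_overload(self) -> None:
        # Multiplicative decrease: back off quickly when the API pushes back.
        self.limit = max(self.min_limit, self.limit * self.decrease)

    def allowed(self) -> int:
        # Number of requests that may be in flight right now.
        return int(self.limit)
```

A proxy built along these lines would admit a new request only while the in-flight count is below `allowed()`, and would queue the rest. Halving on overload while growing by a constant step is the same asymmetry that TCP congestion control uses to share a bottleneck without coordination between senders.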

