Fugu-MT Paper Translation (Abstract): CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare

Paper overview: CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare

arxiv url: http://arxiv.org/abs/2603.24157v1
Date: Wed, 25 Mar 2026 10:25:48 GMT
Status: translation complete
Last updated in system: 2026-03-26 21:06:11.246428
Title: CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare
Authors: Akash Ghosh, Tajamul Ashraf, Rishu Kumar Singh, Numan Saeed, Sriparna Saha, Xiuying Chen, Salman Khan
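Abstract summary: Multimodal agentic pipelines are transforming human-computer interaction by enabling efficient and accessible automation of complex, real-world tasks. Recent efforts have focused on short-horizon or general-purpose applications, leaving long-horizon automation for domain-specific systems, particularly in healthcare, largely unexplored. This paper introduces CarePilot, a multi-agent framework based on the actor-critic paradigm. Experiments show that CarePilot achieves state-of-the-art performance, outperforming closed-source and open-source multimodal baselines by approximately 15.26% and 3.38%, respectively.
Reference score (independently computed attention score): 37.42599407869901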
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal agentic pipelines are transforming human-computer interaction by enabling efficient and accessible automation of complex, real-world tasks. However, recent efforts have focused on short-horizon or general-purpose applications (e.g., mobile or desktop interfaces), leaving long-horizon automation for domain-specific systems, particularly in healthcare, largely unexplored. To address this, we introduce CareFlow, a high-quality human-annotated benchmark comprising complex, long-horizon software workflows across medical annotation tools, DICOM viewers, EHR systems, and laboratory information systems. On this benchmark, existing vision-language models (VLMs) perform poorly, struggling with long-horizon reasoning and multi-step interactions in medical contexts. To overcome this, we propose CarePilot, a multi-agent framework based on the actor-critic paradigm. The Actor integrates tool grounding with dual-memory mechanisms (long-term and short-term experience) to predict the next semantic action from the visual interface and system state. The Critic evaluates each action, updates memory based on observed effects, and either executes or provides corrective feedback to refine the workflow. Through iterative agentic simulation, the Actor learns to perform more robust and reasoning-aware predictions during inference. Our experiments show that CarePilot achieves state-of-the-art performance, outperforming strong closed-source and open-source multimodal baselines by approximately 15.26% and 3.38%, respectively, on our benchmark and out-of-distribution dataset.
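The actor-critic loop described in the abstract can be illustrated with a minimal toy sketch. This is not the authors' implementation: every name here (`Memory`, `actor`, `critic`, `run_workflow`, and the action strings) is hypothetical. The Critic is replaced by a rule-based oracle that already knows the expected workflow, where the paper describes a model-based Critic, and the simulation-based training of the Actor is not modeled at all. The sketch only shows the control flow: the Actor proposes an action using memory, and the Critic either lets it execute or returns corrective feedback that is stored for the next proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical dual memory: executed steps plus remembered rejections."""
    short_term: list = field(default_factory=list)  # actions executed so far
    long_term: dict = field(default_factory=dict)   # step index -> rejected actions


def actor(state, candidates, memory):
    """Propose the first candidate not already rejected at this step."""
    rejected = memory.long_term.get(state, set())
    for action in candidates:
        if action not in rejected:
            return action
    return None  # nothing left to try


def critic(state, action, expected):
    """Toy oracle (assumption): approve only the action expected at this step."""
    if action == expected[state]:
        return True, "ok"
    return False, f"'{action}' is not valid at step {state}"


def run_workflow(expected, candidates, max_attempts=20):
    """Alternate Actor proposals and Critic checks until the workflow completes."""
    memory = Memory()
    state, attempts = 0, 0
    while state < len(expected) and attempts < max_attempts:
        attempts += 1
        action = actor(state, candidates, memory)
        if action is None:
            break
        approved, _feedback = critic(state, action, expected)
        if approved:
            memory.short_term.append(action)  # execute and record the step
            state += 1
        else:
            # Corrective feedback: remember the rejection for this step
            memory.long_term.setdefault(state, set()).add(action)
    return memory.short_term, attempts


# Hypothetical three-step workflow in a DICOM-viewer-like tool
expected = ["open_viewer", "load_study", "annotate"]
candidates = ["load_study", "open_viewer", "annotate"]
trace, attempts = run_workflow(expected, candidates)
print(trace, attempts)
```

In this toy run the Actor needs more attempts than there are steps, because each rejection is written to memory before the next proposal; that feedback path is the part of the design the sketch is meant to make visible.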

Related papers