Fugu-MT paper translation (abstract): Towards Conversational Medical AI with Eyes, Ears and a Voice

Paper overview: Towards Conversational Medical AI with Eyes, Ears and a Voice

arxiv url: http://arxiv.org/abs/2605.09272v1
Date: Sun, 10 May 2026 02:43:49 GMT
Status: translation complete
Last updated in system: 2026-05-12 23:28:50.157047
Title: Towards Conversational Medical AI with Eyes, Ears and a Voice
Authors: Meet Shah, Jason Gusdorf, Anil Palepu, Chunjong Park, Jack W. O'Sullivan, Vishnu Ravi, Tim Strother, Pavel Dubov, Aliya Rysbek, Toshiyuki Fukuzawa, Yana Lunts, Jan Freyberg, Michael B. Chang, Aniruddh Raghu, David Stutz, Devora Berlowitz, Eliseo Papa, Taylan Cemgil, JD Velasquez, Jack Chen, Arthur Chen, Doug Fritz, Charlie Taylor, Katya Tregubova, Jing Rong Lim, Richard Green, Sara Mahdavi, Mahvish Nagda, Jihyeon Lee, Craig Schiff, Liviu Panait, Sukhdeep Singh, Valentin Liévin, David G. T. Barrett, Hannah Gladman, Anna Cupani, Francesca Pietra, Uchechi Okereke, Katherine Tong, Clemens Meyer, Erwan Rolland, Mili Sanwalka, Michael D. Howell, Shixiang Shane Gu, Bibo Xu, Euan A. Ashley, S. M. Ali Eslami, Gregory Wayne, Pushmeet Kohli, Vivek Natarajan, Adam Rodman, Alan Karthikesalingam, Ryutaro Tanno
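Abstract summary: We introduce AI co-clinician, a conversational AI system. Its dual-agent architecture balances deep clinical reasoning with the low latency required for natural dialogue. Our work shows that text-only approaches fail to capture the true challenges of medical consultation.
Reference score (independently computed attention score): 19.359997612335018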
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The practice of medicine relies not only upon skillful dialogue but also on the nuanced exchange and interpretation of rich auditory and visual cues between doctors and patients. Building on the low-latency voice and video processing capabilities of Gemini, we introduce AI co-clinician, a first-of-its-kind conversational AI system utilizing continuous streams of audio-visual data from live patient conversations to inform real-time clinical decisions. Its dual-agent architecture balances deep clinical reasoning with the low latency required for natural dialogue. To assess this system, we implemented a video-based interface emulating telemedicine consultations. We crafted 20 standardized outpatient scenarios requiring proactive real-time auditory and visual reasoning and designed "TelePACES" evaluation criteria alongside case-specific rubrics. In a randomized, interface-blinded, crossover simulation study (n = 120 encounters) with 10 internal medicine residents as patient actors, we compared AI co-clinician with primary care physicians (PCPs), GPT-Realtime, and a baseline agent. AI co-clinician approached PCPs in key TelePACES dimensions, including management plans and differential diagnosis, while significantly outperforming GPT-Realtime across all general criteria. While our agent demonstrated parity with PCPs in case-specific triage measures, physicians maintained superior overall performance in case-specific assessments. Although AI co-clinician marks a significant advance in real-time telemedical AI, gaps remain in physical examination and disease-specific reasoning. Our work shows that text-only approaches fail to capture the true challenges of medical consultation and suggests that high-stakes real-time diagnostic AI is most safely advanced in collaborative, triadic models where AI can be a supportive co-clinician for doctors and patients.


Related papers