Fugu-MT Paper Translation (Summary): Health System Scale Semantic Search Across Unstructured Clinical Notes

Paper summary: Health System Scale Semantic Search Across Unstructured Clinical Notes

arxiv url: http://arxiv.org/abs/2604.25605v1
Date: Tue, 28 Apr 2026 13:09:48 GMT
Status: Translation complete
Last updated in system: 2026-04-29 16:49:17.868078
Title: Health System Scale Semantic Search Across Unstructured Clinical Notes
Title (reference translation): Health-System-Scale Semantic Search Across Unstructured Clinical Notes
Authors: Faith Wavinya Mutinda, Spandana Makeneni, Anna Lin, Shivaji Dutta, Irit R. Rasooly, Patrick Dibussolo, Shivani Kamath Belman, Hessam Shahriari, Kevin Murphy, Alex B. Ruan, Barbara H. Chaiyachati, Sanjay Chainani, Robert W. Grundmeier, Scott M. Haag, Jeffrey M. Miller, Heather M. Griffis, Ian M. Campbell
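Abstract summary: We deployed a semantic search system at a large children's hospital, indexing 166 million clinical notes from 1.68 million patients. The system delivers sub-second query latency (median 237 ms for a single user, 451 ms at 20-user concurrency) at a monthly cost of approximately USD 4,000.
Reference score (independently computed attention score): 1.599023522858371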
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Introduction: Semantic search, which retrieves documents based on conceptual similarity rather than keyword matching, offers substantial advantages for retrieval of clinical information. However, deploying semantic search across entire health systems, comprising hundreds of millions of clinical notes, presents formidable engineering, cost, and governance challenges that have prevented adoption. Methods: We deployed a semantic search system at a large children's hospital indexing 166 million clinical notes (484 million vectors) from 1.68 million patients. The system uses instruction-tuned qwen3-embedding-0.6B embeddings, stores vectors in a managed database with storage-optimized indexing, maintains full-text metadata in a low-latency key-value store, and operates within a HIPAA-compliant governance framework. We evaluated the system through three experiments: optimization of embedding model and chunking strategy using a physician-authored benchmark dataset, characterization of full-scale performance (cost, latency, retrieval quality), and clinical utility assessment via comparison of chart abstraction efficiency across three tasks. Results: The system delivers sub-second query latency (median 237 ms single-user, 451 ms 20-user concurrency) with monthly costs of approximately USD 4,000. Qwen3 embeddings with 300-token chunk size achieved 94.6% accuracy on a clinical question-answering benchmark. In clinical utility evaluation across three abstraction tasks, semantic search reduced time-to-completion by 24 to 89% compared to clinician-performed chart review while maintaining comparable inter-rater agreement. Conclusion: Health-system-scale semantic search is both technically and operationally feasible. The system provides infrastructure supporting interactive search, cohort generation, and downstream LLM-powered clinical applications without requiring specialized informatics expertise.
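Abstract (reference translation): Introduction: Semantic search retrieves documents based on conceptual similarity rather than keyword matching, which offers substantial advantages for retrieving clinical information. However, deploying it across an entire health system, comprising hundreds of millions of clinical notes, poses formidable engineering, cost, and governance challenges that have prevented adoption. Methods: We deployed a semantic search system at a large children's hospital, indexing 166 million clinical notes (484 million vectors) from 1.68 million patients. The system uses instruction-tuned qwen3-embedding-0.6B embeddings, stores vectors in a managed database with storage-optimized indexing, keeps full-text metadata in a low-latency key-value store, and operates within a HIPAA-compliant governance framework. We evaluated the system in three experiments: optimization of the embedding model and chunking strategy on a physician-authored benchmark dataset, characterization of full-scale performance (cost, latency, retrieval quality), and assessment of clinical utility by comparing chart abstraction efficiency across three tasks. Results: The system delivers sub-second query latency (median 237 ms for a single user, 451 ms at 20-user concurrency) at a monthly cost of approximately USD 4,000. Qwen3 embeddings with a 300-token chunk size achieved 94.6% accuracy on a clinical question-answering benchmark. In the clinical utility evaluation across three abstraction tasks, semantic search reduced time-to-completion by 24 to 89% compared with clinician-performed chart review while maintaining comparable inter-rater agreement. Conclusion: Health-system-scale semantic search is both technically and operationally feasible. The system provides infrastructure that supports interactive search, cohort generation, and downstream LLM-powered clinical applications without requiring specialized informatics expertise.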
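
The pipeline outlined in the Methods (chunk each note, embed each chunk, keep vectors in one store and full text in a separate key-value store) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the function names are hypothetical, chunks are cut on whitespace tokens rather than model tokens, in-memory Python structures stand in for the managed vector database and the key-value store, and `toy_embed` is a purely lexical stand-in for the qwen3-embedding-0.6B model, so it does not capture semantic similarity.

```python
import math
import re
from collections import Counter

CHUNK_TOKENS = 300  # chunk size reported in the abstract


def chunk_note(text, size=CHUNK_TOKENS):
    """Split a note into chunks of at most `size` whitespace tokens (assumption:
    a real system would count tokens with the embedding model's tokenizer)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: an L2-normalised sparse
    bag-of-words vector stored as a dict of word -> weight."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def build_index(notes):
    """Index notes given as {note_id: text}.

    Returns (vectors, kv_store): `vectors` stands in for the vector database,
    `kv_store` for the key-value store that holds the full chunk text."""
    vectors = []
    kv_store = {}
    for note_id, text in notes.items():
        for n, chunk in enumerate(chunk_note(text)):
            chunk_id = f"{note_id}:{n}"
            vectors.append((chunk_id, toy_embed(chunk)))
            kv_store[chunk_id] = {"note_id": note_id, "text": chunk}
    return vectors, kv_store


def search(query, vectors, kv_store, top_k=3):
    """Rank chunks by cosine similarity to the query (brute force), then fetch
    the matching text from the key-value store."""
    q = toy_embed(query)
    scored = []
    for chunk_id, vec in vectors:
        # Both vectors are unit length, so the dot product is the cosine.
        score = sum(weight * vec.get(word, 0.0) for word, weight in q.items())
        scored.append((score, chunk_id))
    scored.sort(reverse=True)
    return [(score, kv_store[chunk_id]) for score, chunk_id in scored[:top_k]]
```

Separating the vector store from the text store mirrors the split described in the abstract: the similarity search returns only chunk identifiers, and the text is fetched afterwards by key. A production system would also replace the brute-force scan with an approximate nearest-neighbour index.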

Related papers