Fugu-MT Paper Translation (Abstract): GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval

Paper overview: GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval

arxiv url: http://arxiv.org/abs/2605.20815v1
Date: Wed, 20 May 2026 07:09:53 GMT
Status: Translation complete
System update date: 2026-05-21 19:19:56.544545
Title: GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval
Title (reference translation): GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval
Authors: Peter Fernandes, Ria Kanjilal
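Abstract summary: Graph-based retrieval-augmented generation (GraphRAG) extends retrieval-augmented generation to support structured reasoning over complex corpora. The paper presents a systematic evaluation of GraphRAG for EHR schema retrieval using locally deployed open-source large language models (LLMs). Llama 3.1 produces the richest knowledge graph (1,172 entities), Qwen 2.5 achieves the best answer quality (3.3/5), and Phi-4-mini fails to complete the pipeline due to structured-output errors.
Reference score (independently computed attention measure): 2.9215909234122672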
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Graph-based Retrieval Augmented Generation (GraphRAG) extends retrieval-augmented generation to support structured reasoning over complex corpora, but its reliability under resource-constrained, privacy-sensitive deployments remains unclear. In healthcare, where Electronic Health Record (EHR) data is complex and strictly regulated, reliance on cloud-based large language models (LLMs) introduces challenges in cost, latency, and compliance. In this work, we present a systematic evaluation of GraphRAG for EHR schema retrieval using locally deployed open-source LLMs. We implement the Microsoft GraphRAG pipeline on real-world EHR schema documentation and benchmark four models, including Llama 3.1 (8B), Mistral (7B), Qwen 2.5 (7B), and Phi-4-mini (3.8B), each deployed via Ollama on a single consumer GPU (8 GB VRAM). We evaluate indexing efficiency, knowledge graph construction, query latency, answer quality, and hallucination under both global and local retrieval modes. Our results reveal substantial differences: Llama 3.1 produces the richest knowledge graph (1,172 entities), Qwen 2.5 achieves the best answer quality (3.3/5), Phi-4-mini fails to complete the pipeline due to structured-output errors, and Mistral exhibits degenerate repetition behavior. We further show that GraphRAG exhibits a practical capacity threshold, where models below approximately 7B parameters fail to reliably produce valid structured outputs and cannot complete the pipeline. In addition, indexing and answer quality are decoupled across models, and local retrieval consistently outperforms global summarization in both latency and factual grounding, with reduced hallucination. These findings demonstrate that GraphRAG is feasible on consumer hardware while highlighting the importance of model selection and retrieval design for robust deployment in regulated settings.
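Abstract (reference translation): Graph-based Retrieval Augmented Generation (GraphRAG) extends retrieval-augmented generation to support structured reasoning over complex corpora. In healthcare, where Electronic Health Record (EHR) data is complex and strictly regulated, reliance on cloud-based large language models (LLMs) introduces challenges in cost, latency, and compliance. This work presents a systematic evaluation of GraphRAG for EHR schema retrieval using locally deployed open-source LLMs. We implement the Microsoft GraphRAG pipeline with four models: Llama 3.1 (8B), Mistral (7B), Qwen 2.5 (7B), and Phi-4-mini (3.8B). We evaluate indexing efficiency, knowledge graph construction, query latency, answer quality, and hallucination under both global and local retrieval modes. Llama 3.1 produces the richest knowledge graph (1,172 entities), Qwen 2.5 achieves the best answer quality (3.3/5), Phi-4-mini fails to complete the pipeline due to structured-output errors, and Mistral exhibits degenerate repetition. Furthermore, GraphRAG exhibits a practical capacity threshold: models below approximately 7B parameters cannot reliably produce valid structured outputs and cannot complete the pipeline. In addition, indexing quality and answer quality are decoupled across models, and local retrieval consistently outperforms global summarization in both latency and factual grounding, with reduced hallucination. These results show that GraphRAG is feasible on consumer hardware and underscore the importance of model selection and retrieval design.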
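
The abstract attributes Phi-4-mini's failure to structured-output errors but does not say how validity was checked. As a rough illustration only, the sketch below counts how many model responses fail a structural check. The JSON schema (an `entities` list whose items have `name` and `type`), the function names, and the sample strings are all assumptions made for this example; they are not taken from the paper or from the Microsoft GraphRAG codebase.

```python
import json

# Hypothetical schema: the abstract does not specify the output format,
# so this assumes each model response should be a JSON object with an
# "entities" list whose items carry "name" and "type" keys.
REQUIRED_ENTITY_KEYS = {"name", "type"}


def is_valid_extraction(raw: str) -> bool:
    """Return True if raw parses as JSON and matches the assumed schema."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    entities = payload.get("entities")
    if not isinstance(entities, list):
        return False
    return all(
        isinstance(e, dict) and REQUIRED_ENTITY_KEYS.issubset(e)
        for e in entities
    )


def failure_rate(responses: list[str]) -> float:
    """Fraction of responses that fail the structural check."""
    if not responses:
        return 0.0
    failures = sum(not is_valid_extraction(r) for r in responses)
    return failures / len(responses)


# Made-up responses for illustration only; not data from the paper.
samples = [
    '{"entities": [{"name": "PATIENT", "type": "table"}]}',
    '{"entities": [{"name": "ENCOUNTER"}]}',  # missing "type"
    'Sure! Here are the entities: PATIENT',    # not JSON at all
    '{"entities": []}',                        # valid but empty
]
print(failure_rate(samples))  # 0.5
```

A per-model failure rate of this kind is one plausible way to make a "capacity threshold" observable, since a pipeline that cannot parse extraction output cannot build the knowledge graph at all.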

Related papers