Fugu-MT Paper Translation (Summary): LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation

Paper summary: LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation

arxiv url: http://arxiv.org/abs/2603.17576v1
Date: Wed, 18 Mar 2026 10:33:32 GMT
Status: Translation complete
System update date: 2026-03-19 18:32:57.647345
Title: LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation
Title (reference translation): LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation
Authors: Mohammad Robaitul Islam Bhuiyan, Sheethal Bhat, Melika Qahqaie, Tri-Thien Nguyen, Paula Andrea Pérez Toro, Tomas Arias Vergara, Andreas Maier
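Abstract summary: We propose LoGSAM, a parameter-efficient framework that transforms radiologist dictation into text prompts for foundation-model-based localization and segmentation. These prompts guide text-conditioned tumor localization via Grounding DINO, a LoRA-adapted vision-language detection model. The full pipeline is evaluated using German dictations from a board-certified radiologist on 12 unseen MRI scans and achieves 91.7% case-level accuracy.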
Reference score (independently computed attention score): 5.967422994926725
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Precise localization and delineation of brain tumors using Magnetic Resonance Imaging (MRI) are essential for planning therapy and guiding surgical decisions. However, most existing approaches rely on task-specific supervised models and are constrained by the limited availability of annotated data. To address this, we propose LoGSAM, a parameter-efficient, detection-driven framework that transforms radiologist dictation into text prompts for foundation-model-based localization and segmentation. Radiologist speech is first transcribed and translated using a pretrained Whisper ASR model, followed by negation-aware clinical NLP to extract tumor-specific textual prompts. These prompts guide text-conditioned tumor localization via a LoRA-adapted vision-language detection model, Grounding DINO (GDINO). The LoRA adaptation updates 5% of the model parameters, thereby enabling computationally efficient domain adaptation while preserving pretrained cross-modal knowledge. The predicted bounding boxes are used as prompts for MedSAM to generate pixel-level tumor masks without any additional fine-tuning. Conditioning the frozen MedSAM on LoGSAM-derived priors yields a state-of-the-art Dice score of 80.32% on BRISC 2025. In addition, we evaluate the full pipeline using German dictations from a board-certified radiologist on 12 unseen MRI scans, achieving 91.7% case-level accuracy. These results highlight the feasibility of constructing a modular, speech-to-segmentation pipeline by intelligently leveraging pretrained foundation models with minimal parameter updates.
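The abstract states that the LoRA adaptation touches 5% of the model parameters but does not say which layers are adapted, at what rank, or how the percentage is computed. As a rough illustration only, the sketch below counts frozen and trainable parameters under the standard LoRA formulation, in which a frozen weight of shape (d_out, d_in) gains two trainable low-rank factors. The layer shapes and rank are hypothetical and are not taken from Grounding DINO or the paper.

```python
def lora_param_counts(layer_shapes, rank):
    """Return (frozen, trainable) parameter counts for LoRA on the given layers.

    Each adapted weight W of shape (d_out, d_in) stays frozen. LoRA adds two
    factors, B (d_out x rank) and A (rank x d_in), and trains only those.
    """
    frozen = sum(d_out * d_in for d_out, d_in in layer_shapes)
    trainable = sum(rank * (d_out + d_in) for d_out, d_in in layer_shapes)
    return frozen, trainable


def trainable_fraction(layer_shapes, rank):
    """Share of all parameters (frozen + LoRA) that receive gradient updates."""
    frozen, trainable = lora_param_counts(layer_shapes, rank)
    return trainable / (frozen + trainable)


# Hypothetical toy configuration: four 256x256 projections adapted at rank 8.
toy_shapes = [(256, 256)] * 4
print(f"{trainable_fraction(toy_shapes, rank=8):.2%}")
```

The trainable share grows linearly with the rank and with the number of adapted layers, so a target budget such as the 5% quoted in the abstract could be reached by tuning either; which choice the authors made is not stated here.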
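Abstract (reference translation): Accurate localization and delineation of brain tumors using magnetic resonance imaging (MRI) are essential for planning therapy and guiding surgical decisions. However, most existing approaches depend on task-specific supervised models and are limited by the availability of annotated data. We therefore propose LoGSAM, a parameter-efficient, detection-driven framework that converts radiologist dictation into text prompts for foundation-model-based localization and segmentation. Radiologist speech is first transcribed and translated with a pretrained Whisper ASR model, after which negation-aware clinical NLP extracts tumor-specific text prompts. These prompts guide text-conditioned tumor localization through Grounding DINO (GDINO), a LoRA-adapted vision-language detection model. The LoRA adaptation updates 5% of the model parameters, enabling computationally efficient domain adaptation while retaining pretrained cross-modal knowledge. The predicted bounding boxes serve as prompts for MedSAM, which generates pixel-level tumor masks without additional fine-tuning. Conditioning the frozen MedSAM on LoGSAM-derived priors yields a state-of-the-art Dice score of 80.32% on BRISC 2025. The full pipeline was also evaluated using German dictations from a board-certified radiologist on 12 unseen MRI scans, reaching 91.7% case-level accuracy. These results highlight the feasibility of building a modular speech-to-segmentation pipeline by intelligently leveraging pretrained foundation models with minimal parameter updates.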
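The two headline numbers in the abstract are a Dice score and a case-level accuracy. For readers unfamiliar with them, the following is a minimal sketch of how such metrics are commonly computed on binary masks and per-case outcomes. The function names, the toy masks, and the empty-mask convention are assumptions made for illustration; the paper's own evaluation code and its definition of a "correct" case are not given in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient 2*|P & T| / (|P| + |T|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def case_level_accuracy(correct_flags):
    """Fraction of cases judged correct, given one boolean per case."""
    return sum(bool(flag) for flag in correct_flags) / len(correct_flags)


# Toy 6-pixel masks: 2 overlapping pixels, 3 foreground pixels in each mask.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
print(dice_score(pred, target))  # 2*2 / (3+3) = 0.666...

# 11 correct cases out of 12 rounds to 91.7%.
print(round(100 * case_level_accuracy([True] * 11 + [False]), 1))  # 91.7
```

The reported 91.7% on 12 scans is arithmetically consistent with 11 correct cases, although the abstract does not state the count explicitly.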
