Fugu-MT Paper Translation (Abstract): MOSAIC: A Multilingual, Taxonomy-Agnostic, and Computationally Efficient Approach for Radiological Report Classification

Paper overview: MOSAIC: A Multilingual, Taxonomy-Agnostic, and Computationally Efficient Approach for Radiological Report Classification

arxiv url: http://arxiv.org/abs/2509.04471v1
Date: Fri, 29 Aug 2025 14:35:00 GMT
Status: Translation complete
Last updated in system: 2025-09-08 14:27:25.307228
Title: MOSAIC: A Multilingual, Taxonomy-Agnostic, and Computationally Efficient Approach for Radiological Report Classification
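Title (reference translation): MOSAIC: A multilingual, taxonomy-agnostic, and computationally efficient approach for radiological report classification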
Authors: Alice Schiavone, Marco Fraccaro, Lea Marie Pehrson, Silvia Ingala, Rasmus Bonnevie, Michael Bachmann Nielsen, Vincent Beliveau, Melanie Ganz, Desmond Elliott
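Abstract summary: We introduce MOSAIC, a multilingual, taxonomy-agnostic, and computationally efficient approach for radiological report classification. Built on a compact open-access language model (MedGemma-4B), MOSAIC supports both zero-/few-shot prompting and lightweight fine-tuning. We evaluate MOSAIC across seven datasets in English, Spanish, French, and Danish, spanning multiple imaging modalities and label taxonomies.
Reference score (independently computed attention score): 8.266250751994187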
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Radiology reports contain rich clinical information that can be used to train imaging models without relying on costly manual annotation. However, existing approaches face critical limitations: rule-based methods struggle with linguistic variability, supervised models require large annotated datasets, and recent LLM-based systems depend on closed-source or resource-intensive models that are unsuitable for clinical use. Moreover, current solutions are largely restricted to English and single-modality, single-taxonomy datasets. We introduce MOSAIC, a multilingual, taxonomy-agnostic, and computationally efficient approach for radiological report classification. Built on a compact open-access language model (MedGemma-4B), MOSAIC supports both zero-/few-shot prompting and lightweight fine-tuning, enabling deployment on consumer-grade GPUs. We evaluate MOSAIC across seven datasets in English, Spanish, French, and Danish, spanning multiple imaging modalities and label taxonomies. The model achieves a mean macro F1 score of 88 across five chest X-ray datasets, approaching or exceeding expert-level performance, while requiring only 24 GB of GPU memory. With data augmentation, as few as 80 annotated samples are sufficient to reach a weighted F1 score of 82 on Danish reports, compared to 86 with the full 1600-sample training set. MOSAIC offers a practical alternative to large or proprietary LLMs in clinical settings. Code and models are open-source. We invite the community to evaluate and extend MOSAIC on new languages, taxonomies, and modalities.
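Abstract (reference translation): Radiology reports contain rich clinical information that can be used to train imaging models without relying on expensive manual annotation. Existing approaches, however, face important limitations: rule-based methods struggle with linguistic variability, supervised models require large annotated datasets, and recent LLM-based systems depend on closed-source or resource-intensive models that are unsuitable for clinical use. In addition, current solutions are largely limited to English and to single-modality, single-taxonomy datasets. We introduce MOSAIC, a multilingual, taxonomy-agnostic, and computationally efficient approach for radiological report classification. Built on a compact open-access language model (MedGemma-4B), MOSAIC supports both zero-/few-shot prompting and lightweight fine-tuning, which enables deployment on consumer-grade GPUs. We evaluate MOSAIC on seven datasets in English, Spanish, French, and Danish, covering multiple imaging modalities and label taxonomies. The model achieves a mean macro F1 score of 88 across five chest X-ray datasets, approaching or exceeding expert-level performance while requiring only 24 GB of GPU memory. With data augmentation, as few as 80 annotated samples are enough to reach a weighted F1 score of 82 on Danish reports, compared to 86 with the full 1600-sample training set. MOSAIC offers a practical alternative to large or proprietary LLMs in clinical settings. Code and models are open-source. We invite the community to evaluate and extend MOSAIC on new languages, taxonomies, and modalities.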
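
The abstract reports two different averages of per-class F1: a macro F1 for the chest X-ray datasets and a weighted F1 for the Danish reports. The following is a minimal sketch of how the two averages differ, written in plain Python. It is not the authors' evaluation code; the function names and the toy labels are hypothetical and chosen only for illustration. Scores here are on a 0 to 1 scale, whereas the figures in the abstract appear to be on a 0 to 100 scale.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's support in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["positive"] * 2 + ["negative"] * 8
y_pred = ["positive", "negative"] + ["negative"] * 8

print(f"macro F1:    {macro_f1(y_true, y_pred):.3f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

On imbalanced labels like these, the weighted average is pulled toward the majority class, so it comes out higher than the macro average. This is one reason the two numbers in the abstract should not be compared directly.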
