Fugu-MT Paper Translation (Abstract): Multimodal Carotid Risk Stratification with Large Vision-Language Models: Benchmarking, Fine-Tuning, and Clinical Insights

Paper summary: Multimodal Carotid Risk Stratification with Large Vision-Language Models: Benchmarking, Fine-Tuning, and Clinical Insights

arxiv url: http://arxiv.org/abs/2510.02922v1
Date: Fri, 03 Oct 2025 11:48:12 GMT
Status: Translation complete
Last updated in system: 2025-10-06 16:35:52.371381
Title: Multimodal Carotid Risk Stratification with Large Vision-Language Models: Benchmarking, Fine-Tuning, and Clinical Insights
Authors: Daphne Tsolissou, Theofanis Ganitidis, Konstantinos Mitsis, Stergios Christodoulidis, Maria Vakalopoulou, Konstantina Nikita
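Abstract summary: This study investigates the potential of state-of-the-art and recent large vision-language models (LVLMs) for multimodal carotid plaque assessment. It proposes a framework that simulates realistic diagnostic scenarios through interview-style question sequences. The experiments show that, however powerful they are, not all LVLMs can accurately identify imaging modality and anatomy.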
Reference score (independently computed attention score): 3.5469990240092373
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reliable risk assessment for carotid atheromatous disease remains a major clinical challenge, as it requires integrating diverse clinical and imaging information in a manner that is transparent and interpretable to clinicians. This study investigates the potential of state-of-the-art and recent large vision-language models (LVLMs) for multimodal carotid plaque assessment by integrating ultrasound imaging (USI) with structured clinical, demographic, laboratory, and protein biomarker data. A framework that simulates realistic diagnostic scenarios through interview-style question sequences is proposed, comparing a range of open-source LVLMs, including both general-purpose and medically tuned models. Zero-shot experiments reveal that even if they are very powerful, not all LVLMs can accurately identify imaging modality and anatomy, while all of them perform poorly in accurate risk classification. To address this limitation, LLaVa-NeXT-Vicuna is adapted to the ultrasound domain using low-rank adaptation (LoRA), resulting in substantial improvements in stroke risk stratification. The integration of multimodal tabular data in the form of text further enhances specificity and balanced accuracy, yielding competitive performance compared to prior convolutional neural network (CNN) baselines trained on the same dataset. Our findings highlight both the promise and limitations of LVLMs in ultrasound-based cardiovascular risk prediction, underscoring the importance of multimodal integration, model calibration, and domain adaptation for clinical translation.

Related papers