Fugu-MT paper translation (summary): ENSAM: an efficient foundation model for interactive segmentation of 3D medical images

Paper summary: ENSAM: an efficient foundation model for interactive segmentation of 3D medical images

arxiv url: http://arxiv.org/abs/2509.15874v1
Date: Fri, 19 Sep 2025 11:20:22 GMT
Status: translation complete
Last updated (in system): 2025-09-22 18:18:11.139495
Title: ENSAM: an efficient foundation model for interactive segmentation of 3D medical images
Authors: Elias Stenhede, Agnar Martin Bjørnstad, Arian Ranjbar,
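Abstract summary: ENSAM is a promptable model for universal 3D medical image segmentation. ENSAM is designed to achieve good performance under limited data and computational budgets. ENSAM was evaluated on a hidden test set of multimodal 3D medical images.
Reference score (attention score computed by the site): 0.0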
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present ENSAM (Equivariant, Normalized, Segment Anything Model), a lightweight and promptable model for universal 3D medical image segmentation. ENSAM combines a SegResNet-based encoder with a prompt encoder and mask decoder in a U-Net-style architecture, using latent cross-attention, relative positional encoding, normalized attention, and the Muon optimizer for training. ENSAM is designed to achieve good performance under limited data and computational budgets, and is trained from scratch on under 5,000 volumes from multiple modalities (CT, MRI, PET, ultrasound, microscopy) on a single 32 GB GPU in 6 hours. As part of the CVPR 2025 Foundation Models for Interactive 3D Biomedical Image Segmentation Challenge, ENSAM was evaluated on a hidden test set with multimodal 3D medical images, obtaining a DSC AUC of 2.404, NSD AUC of 2.266, final DSC of 0.627, and final NSD of 0.597, outperforming two previously published baseline models (VISTA3D, SAM-Med3D) and matching the third (SegVol), surpassing its performance in final DSC but trailing behind in the other three metrics. In the coreset track of the challenge, ENSAM ranks 5th of 10 overall and best among the approaches not utilizing pretrained weights. Ablation studies confirm that our use of relative positional encodings and the Muon optimizer each substantially speed up convergence and improve segmentation quality.
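The abstract names the Muon optimizer but gives no details. As a hedged illustration only, the sketch below shows a simplified Muon-style step for a single 2D weight matrix: momentum is accumulated, then approximately orthogonalized with a quintic Newton-Schulz iteration so its singular values are pushed toward roughly 1. The coefficients (3.4445, -4.7750, 2.0315) and 5 iterations follow the publicly released Muon reference; the plain (non-Nesterov) momentum, the learning rate, and the helper names `newton_schulz_orthogonalize` and `muon_step` are assumptions, and nothing here is taken from the ENSAM code.

```python
import numpy as np

# Quintic Newton-Schulz coefficients from the public Muon reference.
# Assumption: ENSAM uses the standard recipe; the abstract does not say.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately map every singular value of a 2D matrix to about 1."""
    a, b, c = NS_COEFFS
    x = g / (np.linalg.norm(g) + eps)  # Frobenius scaling keeps all singular values <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # iterate on the smaller Gram matrix
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        # Applies p(s) = a*s + b*s^3 + c*s^5 to each singular value s
        x = a * x + (b * gram + c * (gram @ gram)) @ x
    return x.T if transposed else x


def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One hypothetical, simplified Muon-style update for a 2D weight matrix."""
    momentum_buf = beta * momentum_buf + grad           # plain heavy-ball momentum
    update = newton_schulz_orthogonalize(momentum_buf)  # orthogonalized direction
    return weight - lr * update, momentum_buf
```

The iteration uses only matrix products, which keeps it cheap on a GPU, and it only approximates orthogonalization: singular values land near 1, not exactly at 1. In the public Muon recipe, convolution kernels are reshaped to 2D and non-matrix parameters such as biases and norm gains are left to AdamW.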
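The results are reported as DSC (Dice similarity coefficient) and NSD scores, each both as an AUC and as a final value, which suggests scores are tracked across successive interaction rounds. A minimal sketch of DSC and a trapezoidal AUC follows, assuming one DSC per round and unit spacing between rounds. The challenge's exact protocol (number of clicks, x-axis definition, handling of empty masks) is not given in this summary, so this is not the official evaluation code. NSD is omitted because it additionally needs surface extraction and a distance tolerance.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient: 2|P & G| / (|P| + |G|) for binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:  # assumption: two empty masks count as a perfect match
        return 1.0
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def dsc_auc(per_round_dsc) -> float:
    """Trapezoidal area under the DSC-vs-round curve (unit spacing assumed)."""
    s = list(per_round_dsc)
    return sum((s[i] + s[i + 1]) / 2.0 for i in range(len(s) - 1))
```

For example, `dsc_auc([0.4, 0.5, 0.6])` returns 1.0 up to floating-point rounding under these assumptions; with more rounds, the AUC grows with the number of intervals, which is why it can exceed 1 while each DSC stays in [0, 1].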

Related papers