Fugu-MT Paper Translation (Summary): Concept-to-Pixel: Prompt-Free Universal Medical Image Segmentation

Paper summary: Concept-to-Pixel: Prompt-Free Universal Medical Image Segmentation

arXiv URL: http://arxiv.org/abs/2603.17746v1
Date: Wed, 18 Mar 2026 14:13:22 GMT
Status: Translation complete
Last updated (in system): 2026-03-19 18:32:57.740034
Title: Concept-to-Pixel: Prompt-Free Universal Medical Image Segmentation
Authors: Haoyun Chen, Fenghe Tang, Wenxin Ma, Shaohua Kevin Zhou
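Abstract summary: Concept-to-Pixel (C2P) is a novel prompt-free universal segmentation framework. C2P separates anatomical knowledge into two components: geometric and semantic representations.
Reference score (attention score computed by the site): 2.5026850988034797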
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Universal medical image segmentation seeks to use a single foundational model to handle diverse tasks across multiple imaging modalities. However, existing approaches often rely heavily on manual visual prompts or retrieved reference images, which limits their automation and robustness. In addition, naive joint training across modalities often fails to address large domain shifts. To address these limitations, we propose Concept-to-Pixel (C2P), a novel prompt-free universal segmentation framework. C2P explicitly separates anatomical knowledge into two components: Geometric and Semantic representations. It leverages Multimodal Large Language Models (MLLMs) to distill abstract, high-level medical concepts into learnable Semantic Tokens and introduces explicitly supervised Geometric Tokens to enforce universal physical and structural constraints. These disentangled tokens interact deeply with image features to generate input-specific dynamic kernels for precise mask prediction. Furthermore, we introduce a Geometry-Aware Inference Consensus mechanism, which utilizes the model's predicted geometric constraints to assess prediction reliability and suppress outliers. Extensive experiments and analysis on a unified benchmark comprising eight diverse datasets across seven modalities demonstrate the significant superiority of our jointly trained approach, compared to universal- or single-model approaches. Remarkably, our unified model demonstrates strong generalization, achieving impressive results not only on zero-shot tasks involving unseen cases but also in cross-modal transfers across similar tasks. Code is available at: https://github.com/Yundi218/Concept-to-Pixel
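The abstract states that the disentangled Semantic and Geometric Tokens interact with image features to generate input-specific dynamic kernels for mask prediction. The plain-Python sketch below illustrates that general idea only; it is not the authors' architecture. It assumes one Semantic Token and one Geometric Token per target, a single round of single-head cross-attention as the token-image interaction, a hypothetical linear projection `proj` that maps the concatenated tokens to a kernel, and a 1x1 kernel applied as a per-pixel dot product. The MLLM distillation step is omitted (token values are simply given), and all function names are hypothetical; the released code may differ entirely.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: each inner list is one output row


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def softmax(xs: Vector) -> Vector:
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(token: Vector, pixels: List[Vector]) -> Vector:
    """Single-head cross-attention: the token reads from every pixel feature (residual update)."""
    scale = 1.0 / math.sqrt(len(token))
    weights = softmax([dot(token, p) * scale for p in pixels])
    context = [sum(w * p[c] for w, p in zip(weights, pixels)) for c in range(len(token))]
    return [t + c for t, c in zip(token, context)]


def dynamic_kernel(semantic: Vector, geometric: Vector,
                   pixels: List[Vector], proj: Matrix) -> Vector:
    """Hypothetical generator: refine both tokens on this image, then project to a C-dim kernel."""
    s = attend(semantic, pixels)
    g = attend(geometric, pixels)
    return matvec(proj, s + g)  # s + g is list concatenation, so proj must be C x 2C


def predict_mask(kernel: Vector, pixels: List[Vector], threshold: float = 0.0) -> List[int]:
    """A 1x1 dynamic convolution is a per-pixel dot product; logits above threshold are foreground."""
    return [1 if dot(kernel, p) > threshold else 0 for p in pixels]
```

Because the tokens are refined by attending to the current image before projection, the same Semantic and Geometric Tokens yield a different kernel for a different input, which is what distinguishes a dynamic kernel from a fixed per-class classifier weight.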
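The Geometry-Aware Inference Consensus mechanism is described only at a high level in the abstract: the model's predicted geometric constraints are used to assess prediction reliability and suppress outliers. One way such a check could work is sketched below under clearly stated assumptions, not as the paper's procedure: several candidate masks exist for the same target (for instance from flipped or otherwise augmented copies of the input), the geometric branch outputs a single expected foreground-area fraction, a candidate counts as an outlier when its measured area deviates from that value by more than a hypothetical `tolerance`, and the surviving candidates are fused by per-pixel majority vote. The names `area_fraction` and `consensus` are illustrative only.

```python
from typing import List, Sequence

Mask = List[int]  # flattened binary mask, 1 = foreground


def area_fraction(mask: Sequence[int]) -> float:
    """Measured geometric property of a candidate: fraction of foreground pixels."""
    return sum(mask) / len(mask)


def consensus(candidates: List[Mask], predicted_area: float, tolerance: float = 0.1) -> Mask:
    """Hypothetical geometry-aware consensus over candidate masks of equal length.

    Each candidate is scored by how far its measured foreground fraction deviates from
    the area predicted by the (assumed) geometric branch. Candidates outside the tolerance
    are treated as outliers and dropped before a per-pixel majority vote.
    """
    if not candidates:
        raise ValueError("need at least one candidate mask")
    errors = [abs(area_fraction(m) - predicted_area) for m in candidates]
    kept = [m for m, e in zip(candidates, errors) if e <= tolerance]
    if not kept:
        # No candidate satisfies the constraint: fall back to the most consistent one.
        best = min(range(len(candidates)), key=errors.__getitem__)
        return list(candidates[best])
    n = len(kept[0])
    # Strict majority vote; ties (possible with an even number of survivors) go to background.
    return [1 if 2 * sum(m[i] for m in kept) > len(kept) else 0 for i in range(n)]
```

For example, with candidates `[1, 1, 0, 0]`, `[1, 1, 1, 1]` and `[1, 1, 1, 1]`, a predicted area of 0.5 and a tolerance of 0.1, only the first candidate survives, so the result is `[1, 1, 0, 0]`; a plain majority vote over all three would instead return `[1, 1, 1, 1]`. The geometric check is what lets one constraint-consistent prediction override two inconsistent ones.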


Related papers