Fugu-MT Paper Translation (Abstract): Medical Referring Image Segmentation via Next-Token Mask Prediction

Paper overview: Medical Referring Image Segmentation via Next-Token Mask Prediction

arxiv url: http://arxiv.org/abs/2511.05044v1
Date: Fri, 07 Nov 2025 07:29:19 GMT
Status: Translation complete
Last updated in system: 2025-11-10 21:00:44.704141
Title: Medical Referring Image Segmentation via Next-Token Mask Prediction
Title (reference translation): Medical Referring Image Segmentation via Next-Token Mask Prediction
Authors: Xinyu Chen, Yiran Wang, Gaoyang Pang, Jiafu Hao, Chentao Yue, Luping Zhou, Yonghui Li
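Abstract summary: Medical Referring Image Segmentation (MRIS) segments target regions in medical images based on natural language descriptions. NTP-MRISeg is a novel framework that reformulates MRIS as an autoregressive next-token prediction task over a unified multimodal sequence of tokenized image, text, and mask representations.
Reference score (independently computed attention score): 40.827152909794336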
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Medical Referring Image Segmentation (MRIS) involves segmenting target regions in medical images based on natural language descriptions. While achieving promising results, recent approaches usually involve complex design of multimodal fusion or multi-stage decoders. In this work, we propose NTP-MRISeg, a novel framework that reformulates MRIS as an autoregressive next-token prediction task over a unified multimodal sequence of tokenized image, text, and mask representations. This formulation streamlines model design by eliminating the need for modality-specific fusion and external segmentation models, and supports a unified architecture for end-to-end training. It also enables the use of pretrained tokenizers from emerging large-scale multimodal models, enhancing generalization and adaptability. More importantly, to address challenges under this formulation, such as exposure bias, long-tail token distributions, and fine-grained lesion edges, we propose three novel strategies: (1) a Next-k Token Prediction (NkTP) scheme to reduce cumulative prediction errors, (2) Token-level Contrastive Learning (TCL) to enhance boundary sensitivity and mitigate long-tail distribution effects, and (3) a memory-based Hard Error Token (HET) optimization strategy that emphasizes difficult tokens during training. Extensive experiments on the QaTa-COV19 and MosMedData+ datasets demonstrate that NTP-MRISeg achieves new state-of-the-art performance, offering a streamlined and effective alternative to traditional MRIS pipelines.
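Abstract (reference translation): Medical Referring Image Segmentation (MRIS) involves segmenting target regions in medical images based on natural language descriptions. While achieving promising results, recent approaches usually involve complex designs for multimodal fusion or multi-stage decoders. In this work, we propose NTP-MRISeg, a novel framework that reformulates MRIS as an autoregressive next-token prediction task over a unified multimodal sequence of tokenized image, text, and mask representations. This formulation removes the need for modality-specific fusion and external segmentation models, streamlines model design, and supports a unified architecture for end-to-end training. It also enables the use of pretrained tokenizers from emerging large-scale multimodal models, improving generalization and adaptability. Furthermore, we propose (1) a Next-k Token Prediction (NkTP) scheme that reduces cumulative prediction errors, (2) Token-level Contrastive Learning (TCL) that enhances boundary sensitivity and mitigates long-tail distribution effects, and (3) a memory-based Hard Error Token (HET) optimization strategy that emphasizes difficult tokens during training. Extensive experiments on the QaTa-COV19 and MosMedData+ datasets show that NTP-MRISeg achieves new state-of-the-art performance and offers a streamlined and effective alternative to traditional MRIS pipelines.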
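
The abstract describes a single sequence of image, text, and mask tokens trained with next-token prediction, plus a Next-k Token Prediction (NkTP) scheme. The following is a minimal sketch of how such a sequence and its next-k targets could be laid out. It is not the authors' implementation: the function names, the disjoint vocabulary offsets, the `PAD` ignore label, and the choice to supervise only mask tokens are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

PAD = -1  # hypothetical "no target" label; not taken from the paper


def build_sequence(
    image_tokens: Sequence[int],
    text_tokens: Sequence[int],
    mask_tokens: Sequence[int],
    image_vocab: int,
    text_vocab: int,
) -> Tuple[List[int], int]:
    """Concatenate image, text, and mask tokens into one sequence.

    Assumption: each modality has its own tokenizer, so ids are shifted
    into disjoint ranges. Returns the sequence and the index at which
    the mask tokens start.
    """
    text_offset = image_vocab
    mask_offset = image_vocab + text_vocab
    seq = (
        list(image_tokens)
        + [t + text_offset for t in text_tokens]
        + [m + mask_offset for m in mask_tokens]
    )
    mask_start = len(image_tokens) + len(text_tokens)
    return seq, mask_start


def next_k_targets(seq: Sequence[int], mask_start: int, k: int) -> List[List[int]]:
    """For every position i, list the tokens at i+1 .. i+k as targets.

    Assumption: only mask tokens are supervised, so any target that falls
    before mask_start or past the end of the sequence is set to PAD.
    With k = 1 this reduces to ordinary next-token prediction.
    """
    targets = []
    for i in range(len(seq)):
        row = []
        for step in range(1, k + 1):
            t = i + step
            row.append(seq[t] if mask_start <= t < len(seq) else PAD)
        targets.append(row)
    return targets


if __name__ == "__main__":
    seq, mask_start = build_sequence([3, 1], [0, 2], [1, 0, 1], image_vocab=4, text_vocab=3)
    print(seq)
    print(next_k_targets(seq, mask_start, k=2))
```

In this layout, each position carries up to k future mask tokens as labels, which is one plausible way to read "reduce cumulative prediction errors": the model is asked to look several steps ahead rather than only one.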

Related papers