Fugu-MT Paper Translation (Overview): MedSyn2: Flexible Control of 3D CT Generation via Text and Semantically-Defined Segmentation Prompts

Paper Overview: MedSyn2: Flexible Control of 3D CT Generation via Text and Semantically-Defined Segmentation Prompts

arxiv url: http://arxiv.org/abs/2606.00967v2
Date: Wed, 03 Jun 2026 17:10:03 GMT
Status: Translation complete
Last updated in system: 2026-06-04 17:40:41.585784
Title: MedSyn2: Flexible Control of 3D CT Generation via Text and Semantically-Defined Segmentation Prompts
Authors: Weicheng Dai, Chenyu Wang, Andy Li, Shantanu Ghosh, Afrooz Zandifar, Christina LeBedis, Kayhan Batmanghelich
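Abstract summary: This paper proposes a flexible framework for controllable volumetric image generation that supports input from radiology reports and segmentation prompts. The approach lets users provide a segmentation of a specific anatomy or abnormality without requiring full-organ annotations.
Reference score (site-computed attention score): 10.292505344385413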
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative models for volumetric medical images have found many applications in medical imaging, ranging from data augmentation to serving as priors for inverse problems. For these applications, generating high-resolution 3D images with strong controllability is essential but remains highly challenging. Existing approaches typically control generation either through radiology reports used as text prompts or through full image segmentation. While text-based prompting is flexible, it provides limited spatial control over the location, shape, and boundary of abnormalities. In contrast, segmentation-based methods receive precise spatial guidance but are restrictive in requiring full-organ annotations. In this work, we propose a flexible multimodal framework for controllable volumetric image generation that supports input from radiology reports and segmentation prompts (both optional). Our approach allows users to provide segmentation of a specific anatomy or abnormality without requiring full-organ annotations. The semantic meaning of the segmentation mask is specified through an accompanying text description, resulting in a highly flexible and scalable conditioning mechanism. We develop a memory-efficient architecture based on a modified diffusion transformer that jointly processes image and segmentation tokens. The model further incorporates gated attention to effectively attend to long radiology reports. Experiments demonstrate that our method achieves state-of-the-art perceptual and semantic scores (e.g., 24% relative improvement in mean FID), generates high-resolution anatomically consistent CT volumes, and improves data efficiency when used for data augmentation. Radiologists' evaluation further confirms strong alignment between generated and real medical images.
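The abstract names three mechanisms: segmentation tokens whose meaning comes from an accompanying text description, joint processing of image and segmentation tokens in a modified diffusion transformer, and gated attention over long radiology reports. Below is a minimal, hypothetical sketch of how a single block could combine them, written in plain Python with no learned weights. The function names (patchify, attention, block), the choice to add a label embedding to every segmentation token, and the Flamingo-style tanh gate are all assumptions for illustration, not the authors' implementation. The abstract does not say which gating form is used, and the memory-efficient design and the diffusion denoising loop are not shown.

```python
import math
from typing import List, Optional, Tuple

Token = List[float]


def patchify(volume: List[List[List[float]]], p: int) -> List[Token]:
    """Split a D x H x W volume (nested lists, each side divisible by p)
    into non-overlapping p*p*p patches, each flattened into one token."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    tokens = []
    for z in range(0, D, p):
        for y in range(0, H, p):
            for x in range(0, W, p):
                tokens.append([volume[z + dz][y + dy][x + dx]
                               for dz in range(p) for dy in range(p) for dx in range(p)])
    return tokens


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Token], keys: List[Token], values: List[Token]) -> List[Token]:
    """Single-head scaled dot-product attention with identity projections."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def block(image_tokens: List[Token],
          seg_tokens: Optional[List[Token]] = None,
          seg_label_emb: Optional[Token] = None,
          report_tokens: Optional[List[Token]] = None,
          gate_alpha: float = 0.0) -> Tuple[List[Token], List[Token]]:
    """Hypothetical block: joint self-attention over image + segmentation
    tokens, then tanh-gated cross-attention to report tokens.
    The segmentation prompt and the report are both optional."""
    seq = [list(t) for t in image_tokens]
    n_img = len(seq)
    if seg_tokens is not None:
        if seg_label_emb is None:
            raise ValueError("a segmentation prompt needs a text label embedding")
        # Assumption: the mask's meaning is injected by adding the label's
        # text embedding to every segmentation token.
        seq += [[a + b for a, b in zip(t, seg_label_emb)] for t in seg_tokens]
    # Image and segmentation tokens attend to each other in one sequence.
    attn = attention(seq, seq, seq)
    seq = [[a + b for a, b in zip(x, y)] for x, y in zip(seq, attn)]
    if report_tokens:
        gate = math.tanh(gate_alpha)  # gate == 0 disables the report branch
        cross = attention(seq, report_tokens, report_tokens)
        seq = [[a + gate * b for a, b in zip(x, c)] for x, c in zip(seq, cross)]
    return seq[:n_img], seq[n_img:]


# Toy example: a 4x4x4 "CT" volume, a mask covering one 2x2x2 corner,
# and made-up embeddings for the mask label and two report tokens.
volume = [[[float(z + y + x) for x in range(4)] for y in range(4)] for z in range(4)]
mask = [[[1.0 if max(z, y, x) < 2 else 0.0 for x in range(4)] for y in range(4)] for z in range(4)]
img, seg = patchify(volume, 2), patchify(mask, 2)  # 8 tokens of dimension 8 each
label = [0.1] * 8                                   # hypothetical label embedding
report = [[0.5] * 8, [-0.5] * 8]                    # hypothetical report embeddings
img_out, seg_out = block(img, seg, label, report, gate_alpha=0.0)
```

Because tanh(0) = 0, starting gate_alpha at zero makes the report branch a no-op at initialization, so a model of this kind can learn to use long reports gradually. Omitting seg_tokens or report_tokens mirrors the abstract's point that both conditions are optional.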
