Fugu-MT paper translation (abstract): Layout-Guided Controllable Pathology Image Generation with In-Context Diffusion Transformers

Paper summary: Layout-Guided Controllable Pathology Image Generation with In-Context Diffusion Transformers

arxiv url: http://arxiv.org/abs/2603.13386v1
Date: Wed, 11 Mar 2026 06:14:11 GMT
Status: translation complete
Last updated in system: 2026-03-17 16:19:35.15048
Title: Layout-Guided Controllable Pathology Image Generation with In-Context Diffusion Transformers
Title (reference translation): Layout-Based Controllable Image Generation Using In-Context Diffusion Transformer
Authors: Yuntao Shou, Xiangyong Cao, Qian Zhao, Deyu Meng
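Abstract summary: Controllable pathology image synthesis requires reliable regulation of spatial layout, tissue morphology, and semantic detail. In-Context Diffusion Transformer (IC-DiT) is a layout-aware generative model that incorporates spatial layouts, textual descriptions, and visual embeddings into a unified diffusion transformer. IC-DiT achieves higher fidelity, stronger spatial controllability, and better diagnostic consistency than existing methods.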
Reference score (independently computed attention score): 57.54843029965778
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Controllable pathology image synthesis requires reliable regulation of spatial layout, tissue morphology, and semantic detail. However, existing text-guided diffusion models offer only coarse global control and lack the ability to enforce fine-grained structural constraints. Progress is further limited by the absence of large datasets that pair patch-level spatial layouts with detailed diagnostic descriptions, since generating such annotations for gigapixel whole-slide images is prohibitively time-consuming for human experts. To overcome these challenges, we first develop a scalable multi-agent LVLM annotation framework that integrates image description, diagnostic step extraction, and automatic quality judgment into a coordinated pipeline, and we evaluate the reliability of the system through a human verification process. This framework enables efficient construction of fine-grained and clinically aligned supervision at scale. Building on the curated data, we propose In-Context Diffusion Transformer (IC-DiT), a layout-aware generative model that incorporates spatial layouts, textual descriptions, and visual embeddings into a unified diffusion transformer. Through hierarchical multimodal attention, IC-DiT maintains global semantic coherence while accurately preserving structural and morphological details. Extensive experiments on five histopathology datasets show that IC-DiT achieves higher fidelity, stronger spatial controllability, and better diagnostic consistency than existing methods. In addition, the generated images serve as effective data augmentation resources for downstream tasks such as cancer classification and survival analysis.
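Abstract (reference translation): Controllable pathology image synthesis requires reliable regulation of spatial layout, tissue morphology, and semantic detail. However, existing text-guided diffusion models offer only coarse global control and cannot enforce fine-grained structural constraints. Progress is further limited by the absence of large datasets that pair patch-level spatial layouts with detailed diagnostic descriptions. To overcome these challenges, we first develop a scalable multi-agent LVLM annotation framework that integrates image description, diagnostic step extraction, and automatic quality judgment into a coordinated pipeline, and we evaluate the reliability of the system through a human verification process. This framework enables efficient construction of fine-grained, clinically aligned supervision at scale. In-Context Diffusion Transformer (IC-DiT) is a layout-aware generative model that incorporates spatial layouts, textual descriptions, and visual embeddings into a unified diffusion transformer. Through hierarchical multimodal attention, IC-DiT maintains global semantic coherence while accurately preserving structural and morphological details. Extensive experiments on five histopathology datasets show that IC-DiT achieves higher fidelity, stronger spatial controllability, and better diagnostic consistency than existing methods. In addition, the generated images serve as effective data augmentation resources for downstream tasks such as cancer classification and survival analysis.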

Related papers