Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control
- URL: http://arxiv.org/abs/2512.21058v1
- Date: Wed, 24 Dec 2025 08:52:08 GMT
- Title: Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control
- Authors: Minghao Han, YiChen Liu, Yizhou Liu, Zizhi Chen, Jingqun Tang, Xuecheng Wu, Dingkang Yang, Lihua Zhang
- Abstract summary: We introduce UniPath, a semantics-driven pathology image generation framework. UniPath implements Multi-Stream Control: a Raw-Text stream; a High-Level Semantics stream that uses learnable queries to a frozen pathology MLLM. On the data front, we curate a 2.65M image-text corpus and a finely annotated, high-quality 68K subset to alleviate data scarcity. Experiments demonstrate UniPath's SOTA performance, including a Patho-FID of 80.9 (51% better than the second-best) and fine-grained semantic control achieving 98.7% of the real-
- Score: 45.749134892112714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In computational pathology, understanding and generation have evolved along disparate paths: advanced understanding models already exhibit diagnostic-level competence, whereas generative models largely simulate pixels. Progress remains hindered by three coupled factors: the scarcity of large, high-quality image-text corpora; the lack of precise, fine-grained semantic control, which forces reliance on non-semantic cues; and terminological heterogeneity, where diverse phrasings for the same diagnostic concept impede reliable text conditioning. We introduce UniPath, a semantics-driven pathology image generation framework that leverages mature diagnostic understanding to enable controllable generation. UniPath implements Multi-Stream Control: a Raw-Text stream; a High-Level Semantics stream that uses learnable queries to a frozen pathology MLLM to distill paraphrase-robust Diagnostic Semantic Tokens and to expand prompts into diagnosis-aware attribute bundles; and a Prototype stream that affords component-level morphological control via a prototype bank. On the data front, we curate a 2.65M image-text corpus and a finely annotated, high-quality 68K subset to alleviate data scarcity. For a comprehensive assessment, we establish a four-tier evaluation hierarchy tailored to pathology. Extensive experiments demonstrate UniPath's SOTA performance, including a Patho-FID of 80.9 (51% better than the second-best) and fine-grained semantic control achieving 98.7% of the real-image. The meticulously curated datasets, complete source code, and pre-trained model weights developed in this study will be made openly accessible to the public.
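A note on reading the "51% better than the second-best" figure: the abstract states neither the runner-up's score nor how "better" is computed. The sketch below assumes the common convention for lower-is-better metrics such as FID, namely a relative reduction against the baseline. The helper names `relative_improvement` and `implied_baseline` are hypothetical and are not taken from the paper.

```python
def relative_improvement(ours: float, baseline: float) -> float:
    """Fractional reduction of a lower-is-better score relative to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - ours) / baseline


def implied_baseline(ours: float, improvement: float) -> float:
    """Invert relative_improvement: the baseline that would yield `improvement`."""
    if not 0 <= improvement < 1:
        raise ValueError("improvement must be in [0, 1)")
    return ours / (1.0 - improvement)


# Numbers from the abstract: Patho-FID 80.9, reported as 51% better.
# ASSUMPTION: "better" means relative reduction of a lower-is-better score.
runner_up = implied_baseline(80.9, 0.51)
print(f"implied runner-up Patho-FID: {runner_up:.1f}")
```

Under that assumption the runner-up's Patho-FID would be roughly 165. If the authors define the improvement differently, the implied value changes, so the paper's result tables remain the authoritative source.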
Related papers
- A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
- Multi-Aspect Knowledge-Enhanced Medical Vision-Language Pretraining with Multi-Agent Data Generation [13.362188283113788]
Vision-language pretraining has emerged as a powerful paradigm in medical image analysis. We propose a novel framework integrating a Multi-Agent data GENeration (MAGEN) system and Ontology-based Multi-Aspect Knowledge-Enhanced (O-MAKE) pretraining.
arXiv Detail & Related papers (2025-12-03T04:55:54Z)
- MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities [89.81463562506637]
We introduce MedROV, the first Real-time Open Vocabulary detection model for medical imaging. By leveraging contrastive learning and cross-modal representations, MedROV effectively detects both known and novel structures.
arXiv Detail & Related papers (2025-11-25T18:59:53Z)
- Graph Conditioned Diffusion for Controllable Histopathology Image Generation [26.102552837222103]
We propose graph-based object-level representations for Graph-Conditioned-Diffusion. Our approach generates graph nodes corresponding to each major structure in the image, encapsulating their individual features and relationships. We evaluate this approach using a real-world histopathology use case, demonstrating that our generated data can reliably substitute for annotated patient data in downstream segmentation tasks.
arXiv Detail & Related papers (2025-10-08T15:26:08Z)
- PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions [38.32128533564591]
Public datasets lack paired text and mask data for the same histopathology images. We propose PathDiff, a diffusion framework that effectively learns from unpaired mask-text data. PathDiff allows precise control over structural and contextual features, generating high-quality, semantically accurate images.
arXiv Detail & Related papers (2025-06-30T00:31:03Z)
- PathSegDiff: Pathology Segmentation using Diffusion model representations [63.20694440934692]
We propose PathSegDiff, a novel approach for histopathology image segmentation that leverages Latent Diffusion Models (LDMs) as pre-trained feature extractors. Our method utilizes a pathology-specific LDM, guided by a self-supervised encoder, to extract rich semantic information from H&E stained histopathology images. Our experiments demonstrate significant improvements over traditional methods on the BCSS and GlaS datasets.
arXiv Detail & Related papers (2025-04-09T14:58:21Z)
- RL4Med-DDPO: Reinforcement Learning for Controlled Guidance Towards Diverse Medical Image Generation using Vision-Language Foundation Models [0.7165255458140439]
Vision-Language Foundation Models (VLFM) have shown a tremendous increase in performance in terms of generating high-resolution, photorealistic natural images. We propose a multi-stage architecture where a pre-trained VLFM provides a cursory semantic understanding, while a reinforcement learning algorithm refines the alignment through an iterative process. The reward signal is designed to align the semantic information of the text with synthesized images.
arXiv Detail & Related papers (2025-03-20T01:51:05Z)
- Diverse Image Generation with Diffusion Models and Cross Class Label Learning for Polyp Classification [4.747649393635696]
We develop a novel model, PathoPolyp-Diff, that generates text-controlled synthetic images with diverse characteristics. We introduce cross-class label learning to make the model learn features from other classes, reducing the burdensome task of data annotation.
arXiv Detail & Related papers (2025-02-08T04:26:20Z)
- Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z)
- PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.