Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
- URL: http://arxiv.org/abs/2601.17027v1
- Date: Sat, 17 Jan 2026 14:18:36 GMT
- Title: Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
- Authors: Honglin Lin, Chonghan Qin, Zheng Liu, Qizhi Pei, Yu Li, Zhanping Zhong, Xin Gao, Yanfeng Wang, Conghui He, Lijun Wu
- Abstract summary: We study scientific image synthesis across generation paradigms, evaluation, and downstream use. We introduce SciGenBench, which evaluates generated images based on information utility and logical validity. We show that fine-tuning Large Multimodal Models on rigorously verified synthetic scientific images yields consistent reasoning gains.
- Score: 57.83550091882176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While synthetic data has proven effective for improving scientific reasoning in the text domain, multimodal reasoning remains constrained by the difficulty of synthesizing scientifically rigorous images. Existing Text-to-Image (T2I) models often produce outputs that are visually plausible yet scientifically incorrect, resulting in a persistent visual-logic divergence that limits their value for downstream reasoning. Motivated by recent advances in next-generation T2I models, we conduct a systematic study of scientific image synthesis across generation paradigms, evaluation, and downstream use. We analyze both direct pixel-based generation and programmatic synthesis, and propose ImgCoder, a logic-driven framework that follows an explicit "understand - plan - code" workflow to improve structural precision. To rigorously assess scientific correctness, we introduce SciGenBench, which evaluates generated images based on information utility and logical validity. Our evaluation reveals systematic failure modes in pixel-based models and highlights a fundamental expressiveness-precision trade-off. Finally, we show that fine-tuning Large Multimodal Models (LMMs) on rigorously verified synthetic scientific images yields consistent reasoning gains, with potential scaling trends analogous to the text domain, validating high-fidelity scientific synthesis as a viable path to unlocking massive multimodal reasoning capabilities.
Related papers
- SynMind: Reducing Semantic Hallucination in fMRI-Based Image Reconstruction [52.34513874272676]
We argue that existing methods rely too heavily on entangled visual embeddings over explicit semantic identity. We parse fMRI signals into rich, sentence-level semantic descriptions that mirror the hierarchical and compositional nature of human visual understanding. We propose SynMind, a framework that integrates these explicit semantic encodings with visual priors to condition a pretrained diffusion model.
arXiv Detail & Related papers (2026-01-25T14:31:23Z)
- SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning [54.390403684665834]
Deciphering how visual stimuli are transformed into cortical responses is a fundamental challenge in computational neuroscience. We propose SynBrain, a generative framework that simulates the transformation from visual semantics to neural responses in a probabilistic and biologically interpretable manner. Experimental results demonstrate that SynBrain surpasses state-of-the-art methods in subject-specific visual-to-fMRI encoding performance.
arXiv Detail & Related papers (2025-08-14T03:01:05Z)
- SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model [21.81341169834812]
SridBench is the first benchmark for scientific figure generation. It comprises 1,120 instances from leading scientific papers across 13 natural and computer science disciplines. Results reveal that even top-tier models like GPT-4o-image lag behind human performance.
arXiv Detail & Related papers (2025-05-28T08:51:01Z)
- Bi-modality medical images synthesis by a bi-directional discrete process matching method [2.7309692684728617]
We propose a novel flow-based model, namely bi-directional Discrete Process Matching (Bi-DPM), to accomplish the bi-modality image synthesis tasks. Bi-DPM outperforms other state-of-the-art flow-based methods for bi-modality image synthesis, delivering higher image quality with accurate anatomical regions.
arXiv Detail & Related papers (2024-09-06T01:54:35Z)
- Enhancing Medical Imaging with GANs Synthesizing Realistic Images from Limited Data [3.7304751266416747]
We introduce an innovative method for synthesizing medical images using generative adversarial networks (GANs).
Our proposed GANs method demonstrates the capability to produce realistic synthetic images even when trained on a limited quantity of real medical image data.
arXiv Detail & Related papers (2024-05-22T23:32:24Z)
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis [22.11487736315616]
Rectified flow is a recent generative model formulation that connects data and noise in a straight line.
We improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales.
We present a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities.
arXiv Detail & Related papers (2024-03-05T18:45:39Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework, Generalizable Model-based Neural Radiance Fields, to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
- Identity-Aware CycleGAN for Face Photo-Sketch Synthesis and Recognition [61.87842307164351]
We first propose an Identity-Aware CycleGAN (IACycleGAN) model that applies a new perceptual loss to supervise the image generation network.
It improves CycleGAN on photo-sketch synthesis by paying more attention to the synthesis of key facial regions, such as eyes and nose.
We develop a mutual optimization procedure between the synthesis model and the recognition model, which iteratively synthesizes better images by IACycleGAN.
arXiv Detail & Related papers (2021-03-30T01:30:08Z)