Rescind: Countering Image Misconduct in Biomedical Publications with Vision-Language and State-Space Modeling
- URL: http://arxiv.org/abs/2601.08040v1
- Date: Mon, 12 Jan 2026 22:13:58 GMT
- Title: Rescind: Countering Image Misconduct in Biomedical Publications with Vision-Language and State-Space Modeling
- Authors: Soumyaroop Nandi, Prem Natarajan
- Abstract summary: We present the first vision-language guided framework for both generating and detecting biomedical image forgeries. By combining diffusion-based synthesis with vision-language prompting, our method enables realistic and semantically controlled manipulations. Integscan achieves state-of-the-art performance in both detection and localization, establishing a strong foundation for automated scientific integrity analysis.
- Score: 8.024142807011378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scientific image manipulation in biomedical publications poses a growing threat to research integrity and reproducibility. Unlike natural image forensics, biomedical forgery detection is uniquely challenging due to domain-specific artifacts, complex textures, and unstructured figure layouts. We present the first vision-language guided framework for both generating and detecting biomedical image forgeries. By combining diffusion-based synthesis with vision-language prompting, our method enables realistic and semantically controlled manipulations, including duplication, splicing, and region removal, across diverse biomedical modalities. We introduce Rescind, a large-scale benchmark featuring fine-grained annotations and modality-specific splits, and propose Integscan, a structured state-space modeling framework that integrates attention-enhanced visual encoding with prompt-conditioned semantic alignment for precise forgery localization. To ensure semantic fidelity, we incorporate a vision-language model-based verification loop that filters generated forgeries based on consistency with intended prompts. Extensive experiments on Rescind and existing benchmarks demonstrate that Integscan achieves state-of-the-art performance in both detection and localization, establishing a strong foundation for automated scientific integrity analysis.
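The abstract's verification loop can be illustrated with a minimal sketch. This is not the paper's implementation: the scoring function below is a hypothetical stand-in for a vision-language model that judges whether each generated forgery matches its intended editing prompt, and all names and data structures are assumptions for illustration only.

```python
# Minimal sketch of a prompt-consistency verification loop that filters
# generated forgeries. `vlm_consistency_score` is a stub; in the paper's
# setting a vision-language model would produce this score.

def vlm_consistency_score(image, prompt):
    """Stub: return a score in [0, 1] for how well `image` matches `prompt`.
    Toy heuristic: count prompt keywords present in the image's
    (hypothetical) edit metadata."""
    words = prompt.split()
    hits = sum(1 for word in words if word in image.get("edits", []))
    return hits / max(len(words), 1)

def filter_forgeries(candidates, threshold=0.5):
    """Keep generated forgeries whose consistency score meets the threshold;
    the rest are rejected and could be regenerated."""
    accepted, rejected = [], []
    for image, prompt in candidates:
        score = vlm_consistency_score(image, prompt)
        (accepted if score >= threshold else rejected).append((image, prompt, score))
    return accepted, rejected

candidates = [
    ({"edits": ["duplicate", "band"]}, "duplicate band"),  # edit matches prompt
    ({"edits": ["blur"]}, "remove lesion region"),         # edit diverges from prompt
]
accepted, rejected = filter_forgeries(candidates)
```

The design point is simply that generation and verification are decoupled: any forgery whose realized edit drifts from the intended prompt is dropped before it enters the benchmark.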
Related papers
- SynMind: Reducing Semantic Hallucination in fMRI-Based Image Reconstruction [52.34513874272676]
We argue that existing methods rely too heavily on entangled visual embeddings over explicit semantic identity. We parse fMRI signals into rich, sentence-level semantic descriptions that mirror the hierarchical and compositional nature of human visual understanding. We propose SynMind, a framework that integrates these explicit semantic encodings with visual priors to condition a pretrained diffusion model.
arXiv Detail & Related papers (2026-01-25T14:31:23Z)
- Plasticine: A Traceable Diffusion Model for Medical Image Translation [79.39689106440389]
We propose Plasticine, to the best of our knowledge, the first end-to-end image-to-image translation framework explicitly designed with traceability as a core objective. Our method combines intensity translation and spatial transformation within a denoising diffusion framework. This design enables the generation of synthetic images with interpretable intensity transitions and spatially coherent deformations, supporting pixel-wise traceability throughout the translation process.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
- A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
- Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation [61.350584471060756]
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images. We propose Self-Supervised Anatomical Consistency Learning (SS-ACL) to align generated reports with corresponding anatomical regions. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy.
arXiv Detail & Related papers (2025-09-30T08:59:06Z)
- RL4Med-DDPO: Reinforcement Learning for Controlled Guidance Towards Diverse Medical Image Generation using Vision-Language Foundation Models [0.7165255458140439]
Vision-Language Foundation Models (VLFM) have shown tremendous performance gains in generating high-resolution, photorealistic natural images. We propose a multi-stage architecture where a pre-trained VLFM provides a cursory semantic understanding, while a reinforcement learning algorithm refines the alignment through an iterative process. The reward signal is designed to align the semantic information of the text with synthesized images.
arXiv Detail & Related papers (2025-03-20T01:51:05Z)
- VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback [1.5839621757142595]
We propose a novel framework designed to enhance the semantic alignment and localization accuracy of AI-generated medical reports. By comparing features between the original and generated images, we introduce a dual-scoring system. This approach significantly outperforms existing methods, achieving state-of-the-art results in pathology localization and text-to-image alignment.
arXiv Detail & Related papers (2025-01-29T16:02:16Z)
- A Multimodal Approach Combining Structural and Cross-domain Textual Guidance for Weakly Supervised OCT Segmentation [12.948027961485536]
We propose a novel Weakly Supervised Semantic Segmentation (WSSS) approach that integrates structural guidance with text-driven strategies to generate high-quality pseudo labels.
Our method achieves state-of-the-art performance, highlighting its potential to improve diagnostic accuracy and efficiency in medical imaging.
arXiv Detail & Related papers (2024-11-19T16:20:27Z)
- Anatomical Structure-Guided Medical Vision-Language Pre-training [21.68719061251635]
We propose an Anatomical Structure-Guided (ASG) framework for learning medical visual representations.
For anatomical regions, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists.
For findings and existence, we regard them as image tags, applying an image-tag recognition decoder to associate image features with their respective tags within each sample.
arXiv Detail & Related papers (2024-03-14T11:29:47Z)
- VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics [0.0]
Visual attribution in medical imaging seeks to make evident the diagnostically-relevant components of a medical image.
Here we present a novel generative visual attribution technique that leverages latent diffusion models in combination with domain-specific large language models.
The resulting system also exhibits a range of latent capabilities including zero-shot localized disease induction.
arXiv Detail & Related papers (2024-01-02T19:51:49Z)
- ScoreNet: Learning Non-Uniform Attention and Augmentation for Transformer-Based Histopathological Image Classification [11.680355561258427]
High-resolution images hinder progress in digital pathology.
Patch-based processing often incorporates multiple instance learning (MIL) to aggregate local patch-level representations, yielding image-level predictions.
This paper proposes a transformer-based architecture specifically tailored for histological image classification.
It combines fine-grained local attention with a coarse global attention mechanism to learn meaningful representations of high-resolution images at an efficient computational cost.
arXiv Detail & Related papers (2022-02-15T16:55:09Z)
- Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance the discriminability of the deep embedding to encourage clustering of feature domains belonging to the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
- Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation [84.7571086566595]
We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape.
The proposed method outperforms state-of-the-art segmentation methods on the public RETOUCH dataset, whose images were captured with different acquisition procedures.
arXiv Detail & Related papers (2020-03-31T11:50:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.