The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
- URL: http://arxiv.org/abs/2511.20614v1
- Date: Tue, 25 Nov 2025 18:40:25 GMT
- Title: The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
- Authors: Ziheng Ouyang, Yiren Song, Yaoli Liu, Shihao Zhu, Qibin Hou, Ming-Ming Cheng, Mike Zheng Shou,
- Abstract summary: ImageCritic can be integrated into an agent framework to automatically detect inconsistencies and correct them with multi-round and local editing. In experiments, ImageCritic can effectively resolve detail-related issues in various customized generation scenarios, providing significant improvements over existing methods.
- Score: 105.31858867473845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous works have explored various customized generation tasks given a reference image, but they still face limitations in generating consistent fine-grained details. In this paper, we aim to solve the inconsistency problem of generated images through a reference-guided post-editing approach and present ImageCritic. We first construct a dataset of reference-degraded-target triplets obtained via VLM-based selection and explicit degradation, which effectively simulates the common inaccuracies or inconsistencies observed in existing generation models. Furthermore, building on a thorough examination of the model's attention mechanisms and intrinsic representations, we devise an attention alignment loss and a detail encoder to precisely rectify inconsistencies. ImageCritic can be integrated into an agent framework to automatically detect inconsistencies and correct them with multi-round and local editing in complex scenarios. Extensive experiments demonstrate that ImageCritic effectively resolves detail-related issues in various customized generation scenarios, providing significant improvements over existing methods.
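The abstract does not spell out the exact form of the attention alignment loss, so the following is only a minimal, illustrative sketch of what a reference-guided attention alignment term might look like; the tensor shapes, the L1 distance, and the optional region mask are assumptions rather than the paper's formulation.

```python
import torch
import torch.nn.functional as F

def attention_alignment_loss(attn_gen, attn_ref, region_mask=None):
    """Illustrative loss aligning attention maps of the generated (degraded)
    branch with those of the reference branch.

    attn_gen, attn_ref: (B, heads, Q, K) attention probabilities.
    region_mask: optional (B, Q) float mask selecting query positions,
                 e.g. the region flagged as inconsistent.
    This formulation is an assumption, not ImageCritic's actual loss.
    """
    if region_mask is None:
        return F.l1_loss(attn_gen, attn_ref)
    mask = region_mask[:, None, :, None]                        # (B, 1, Q, 1)
    diff = (attn_gen - attn_ref).abs() * mask
    denom = mask.sum() * attn_gen.size(1) * attn_gen.size(-1)   # masked element count
    return diff.sum() / denom.clamp(min=1.0)

# Toy usage with random maps standing in for cross-attention probabilities.
B, H, Q, K = 2, 8, 64, 77
a_gen = torch.softmax(torch.randn(B, H, Q, K), dim=-1)
a_ref = torch.softmax(torch.randn(B, H, Q, K), dim=-1)
loss = attention_alignment_loss(a_gen, a_ref)
```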
Related papers
- Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing [38.240269144736224]
In-context image generation and editing (ICGE) enables users to specify visual concepts through interleaved image-text prompts. Re-Align bridges the gap between understanding and generation through structured reasoning-guided alignment.
arXiv Detail & Related papers (2026-01-08T17:13:00Z) - OmniRefiner: Reinforcement-Guided Local Diffusion Refinement [10.329465965964571]
VAE-based latent compression discards subtle texture information, causing identity- and attribute-specific cues to vanish. We introduce OmniRefiner, a detail-aware refinement framework that performs two consecutive stages of reference-driven correction. Experiments demonstrate that OmniRefiner significantly improves reference alignment and fine-grained detail preservation.
arXiv Detail & Related papers (2025-11-25T06:57:49Z) - UniREditBench: A Unified Reasoning-based Image Editing Benchmark [52.54256348710893]
This work proposes UniREditBench, a unified benchmark for reasoning-based image editing evaluation. It comprises 2,700 meticulously curated samples, covering both real- and game-world scenarios across 8 primary dimensions and 18 sub-dimensions. We fine-tune Bagel on this dataset and develop UniREdit-Bagel, demonstrating substantial improvements in both in-domain and out-of-distribution settings.
arXiv Detail & Related papers (2025-11-03T07:24:57Z) - Leveraging Hierarchical Image-Text Misalignment for Universal Fake Image Detection [58.927873049646024]
We show that fake images cannot be aligned with their corresponding captions as well as real images can. We propose ITEM, a simple yet effective method that leverages image-text misalignment in a joint visual-language space as a discriminative clue.
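The summary only states the intuition, that generated images align worse with their captions than real images do; the sketch below illustrates that baseline intuition with an off-the-shelf CLIP model through Hugging Face transformers. ITEM's hierarchical misalignment features are not described here, so treat this purely as the simple alignment score the idea builds on.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Off-the-shelf CLIP used only to illustrate the alignment-score intuition;
# ITEM's hierarchical misalignment features are more involved than this.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def alignment_score(image: Image.Image, caption: str) -> float:
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Higher = better image-text alignment; the premise is that fake
    # (generated) images tend to score lower than real ones.
    return out.logits_per_image.item()
```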
arXiv Detail & Related papers (2025-11-01T06:51:14Z) - EditInfinity: Image Editing with Binary-Quantized Generative Models [64.05135380710749]
We investigate the parameter-efficient adaptation of binary-quantized generative models for image editing. Specifically, we propose EditInfinity, which adapts Infinity, a binary-quantized generative model, for image editing. It features an efficient yet effective image inversion mechanism that integrates text prompting rectification and image style preservation.
arXiv Detail & Related papers (2025-10-23T05:06:24Z) - G4Seg: Generation for Inexact Segmentation Refinement with Diffusion Models [38.44872934965588]
This paper considers the problem of utilizing a large-scale text-to-image model to tackle the Inexact Segmentation (IS) task. We exploit the pattern discrepancies between original images and mask-conditional generated images to facilitate coarse-to-fine segmentation refinement.
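Since only the high-level idea is given, here is a heavily simplified toy sketch of using the discrepancy between an original image and its mask-conditional generation to refine a coarse mask; the thresholding rule and the assumption that low discrepancy marks correct foreground pixels are illustrative choices, not G4Seg's actual procedure.

```python
import torch

def refine_mask(original, generated, coarse_mask, thresh=0.15):
    """Toy coarse-to-fine mask refinement from image discrepancies.

    original, generated: (3, H, W) tensors in [0, 1]; `generated` is assumed
    to come from a mask-conditional generative model (not shown here).
    coarse_mask: (H, W) binary tensor, the inexact segmentation to refine.
    """
    # Per-pixel discrepancy between the real image and its generation.
    disc = (original - generated).abs().mean(dim=0)    # (H, W)
    consistent = disc < thresh
    # Keep foreground only where the generation matched the original,
    # i.e. where the mask-conditioned model could faithfully reproduce it.
    return (coarse_mask.bool() & consistent).float()
```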
arXiv Detail & Related papers (2025-06-02T11:05:28Z) - From Controlled Scenarios to Real-World: Cross-Domain Degradation Pattern Matching for All-in-One Image Restoration [2.997052569698842]
All-in-One Image Restoration (AiOIR) aims to restore images affected by multiple degradation patterns via a single model with unified parameters. The UDAIR framework is proposed to effectively achieve AiOIR by transferring knowledge learned in the source domain to the target domain. Experimental results on 10 open-source datasets demonstrate that UDAIR achieves new state-of-the-art performance on the AiOIR task.
arXiv Detail & Related papers (2025-05-28T12:22:00Z) - Instilling Multi-round Thinking to Text-guided Image Generation [72.2032630115201]
Single-round generation often overlooks crucial details, particularly in the realm of fine-grained changes like shoes or sleeves.
We introduce a new self-supervised regularization, i.e., multi-round regularization, which is compatible with existing methods.
It builds upon the observation that the modification order generally should not affect the final result.
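As a rough illustration of that observation, the sketch below turns order-invariance into a penalty: apply two edit instructions in both orders and penalize the difference between the results. The editing model `edit_fn` is a generic stand-in, and the MSE penalty is an assumed choice rather than the paper's exact regularizer.

```python
import torch.nn.functional as F

def order_invariance_loss(edit_fn, image, instr_a, instr_b):
    """Self-supervised regularizer penalizing order-dependent edits.

    edit_fn(image, instruction) -> edited image tensor; any differentiable
    text-guided editing model can stand in here.
    """
    ab = edit_fn(edit_fn(image, instr_a), instr_b)   # apply a, then b
    ba = edit_fn(edit_fn(image, instr_b), instr_a)   # apply b, then a
    return F.mse_loss(ab, ba)
```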
arXiv Detail & Related papers (2024-01-16T16:19:58Z) - DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations [35.458709912618176]
Deep learning has led to huge progress in complex image classification tasks like ImageNet, but also to unexpected failure modes, e.g. via spurious features.
For safety-critical tasks, the black-box nature of these decisions is problematic, and explanations, or at least methods that make decisions plausible, are urgently needed.
We address these problems by generating images that optimize a classifier-derived objective using a framework for guided image generation.
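The guided-generation idea can be sketched as optimizing a latent so that the decoded image maximizes a classifier-derived objective. DiG-IN does this with diffusion guidance; the plain gradient-ascent loop below over a generic differentiable generator is a simplified stand-in, with `generator` and `classifier` as assumed callables.

```python
import torch

def optimize_for_class(generator, classifier, z, target_class,
                       steps=50, lr=0.05):
    """Adjust latent z so generator(z) maximizes one classifier logit.

    generator(z) -> image tensor, classifier(image) -> logits; both must be
    differentiable for the gradient-based update to work.
    """
    z = z.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        img = generator(z)
        loss = -classifier(img)[:, target_class].sum()   # ascend target logit
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(z).detach()
```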
arXiv Detail & Related papers (2023-11-29T17:35:29Z) - Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space of arbitrary pre-trained generative adversarial networks.
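Once such a direction is found, applying it is straightforward; the sketch below assumes a generic `generator` callable and a precomputed `direction` vector, which is where the proposed discovery framework would come in.

```python
import torch

def edit_along_direction(generator, z, direction, alpha=3.0):
    """Move latent codes z along a discovered direction and regenerate.

    z: (B, D) latents, direction: (D,) vector; alpha controls edit strength.
    """
    d = direction / direction.norm()
    return generator(z + alpha * d)
```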
arXiv Detail & Related papers (2020-11-24T02:18:08Z)