The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
- URL: http://arxiv.org/abs/2511.20614v1
- Date: Tue, 25 Nov 2025 18:40:25 GMT
- Title: The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
- Authors: Ziheng Ouyang, Yiren Song, Yaoli Liu, Shihao Zhu, Qibin Hou, Ming-Ming Cheng, Mike Zheng Shou,
- Abstract summary: ImageCritic can be integrated into an agent framework to automatically detect inconsistencies and correct them with multi-round and local editing. In experiments, ImageCritic can effectively resolve detail-related issues in various customized generation scenarios, providing significant improvements over existing methods.
- Score: 105.31858867473845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous works have explored various customized generation tasks given a reference image, but they still face limitations in generating consistent fine-grained details. In this paper, we aim to solve the inconsistency problem of generated images through a reference-guided post-editing approach and present ImageCritic. We first construct a dataset of reference-degraded-target triplets obtained via VLM-based selection and explicit degradation, which effectively simulates the common inaccuracies or inconsistencies observed in existing generation models. Furthermore, building on a thorough examination of the model's attention mechanisms and intrinsic representations, we devise an attention alignment loss and a detail encoder to precisely rectify inconsistencies. ImageCritic can be integrated into an agent framework to automatically detect inconsistencies and correct them with multi-round and local editing in complex scenarios. Extensive experiments demonstrate that ImageCritic effectively resolves detail-related issues in various customized generation scenarios, providing significant improvements over existing methods.
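The abstract does not spell out the exact form of the attention alignment loss, so the following is only a minimal, illustrative sketch of what a reference-guided attention alignment term might look like; the tensor shapes, the L1 distance, and the optional region mask are assumptions rather than the paper's formulation.

```python
import torch
import torch.nn.functional as F

def attention_alignment_loss(attn_gen, attn_ref, region_mask=None):
    """Illustrative loss aligning attention maps of the generated (degraded)
    branch with those of the reference branch.

    attn_gen, attn_ref: (B, heads, Q, K) attention probabilities.
    region_mask: optional (B, Q) float mask selecting query positions,
                 e.g. the region flagged as inconsistent.
    This formulation is an assumption, not ImageCritic's actual loss.
    """
    if region_mask is None:
        return F.l1_loss(attn_gen, attn_ref)
    mask = region_mask[:, None, :, None]                        # (B, 1, Q, 1)
    diff = (attn_gen - attn_ref).abs() * mask
    denom = mask.sum() * attn_gen.size(1) * attn_gen.size(-1)   # masked element count
    return diff.sum() / denom.clamp(min=1.0)

# Toy usage with random maps standing in for cross-attention probabilities.
B, H, Q, K = 2, 8, 64, 77
a_gen = torch.softmax(torch.randn(B, H, Q, K), dim=-1)
a_ref = torch.softmax(torch.randn(B, H, Q, K), dim=-1)
loss = attention_alignment_loss(a_gen, a_ref)
```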
Related papers
- Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing [38.240269144736224]
In-context image generation and editing (ICGE) enables users to specify visual concepts through interleaved image-text prompts. Re-Align bridges the gap between understanding and generation through structured reasoning-guided alignment.
arXiv Detail & Related papers (2026-01-08T17:13:00Z) - OmniRefiner: Reinforcement-Guided Local Diffusion Refinement [10.329465965964571]
VAE-based latent compression discards subtle texture information, causing identity- and attribute-specific cues to vanish. We introduce OmniRefiner, a detail-aware refinement framework that performs two consecutive stages of reference-driven correction. Experiments demonstrate that OmniRefiner significantly improves reference alignment and fine-grained detail preservation.
arXiv Detail & Related papers (2025-11-25T06:57:49Z) - UniREditBench: A Unified Reasoning-based Image Editing Benchmark [52.54256348710893]
This work proposes UniREditBench, a unified benchmark for reasoning-based image editing evaluation. It comprises 2,700 meticulously curated samples, covering both real- and game-world scenarios across 8 primary dimensions and 18 sub-dimensions. We fine-tune Bagel on this dataset and develop UniREdit-Bagel, demonstrating substantial improvements in both in-domain and out-of-distribution settings.
arXiv Detail & Related papers (2025-11-03T07:24:57Z) - Leveraging Hierarchical Image-Text Misalignment for Universal Fake Image Detection [58.927873049646024]
We show that fake images cannot be aligned with their corresponding captions as well as real images can. We propose ITEM, a simple yet effective method that leverages image-text misalignment in a joint visual-language space as a discriminative clue.
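The summary only states the intuition, that generated images align worse with their captions than real images do; the sketch below illustrates that baseline intuition with an off-the-shelf CLIP model through Hugging Face transformers. ITEM's hierarchical misalignment features are not described here, so treat this purely as the simple alignment score the idea builds on.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Off-the-shelf CLIP used only to illustrate the alignment-score intuition;
# ITEM's hierarchical misalignment features are more involved than this.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def alignment_score(image: Image.Image, caption: str) -> float:
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Higher = better image-text alignment; the premise is that fake
    # (generated) images tend to score lower than real ones.
    return out.logits_per_image.item()
```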
arXiv Detail & Related papers (2025-11-01T06:51:14Z) - EditInfinity: Image Editing with Binary-Quantized Generative Models [64.05135380710749]
We investigate the parameter-efficient adaptation of binary-quantized generative models for image editing. Specifically, we propose EditInfinity, which adapts Infinity, a binary-quantized generative model, for image editing. It features an efficient yet effective image inversion mechanism that integrates text prompting rectification and image style preservation.
arXiv Detail & Related papers (2025-10-23T05:06:24Z) - G4Seg: Generation for Inexact Segmentation Refinement with Diffusion Models [38.44872934965588]
This paper considers the problem of utilizing a large-scale text-to-image model to tackle the Inexact Segmentation (IS) task. We exploit the pattern discrepancies between original images and mask-conditional generated images to facilitate coarse-to-fine segmentation refinement.
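Since only the high-level idea is given, here is a heavily simplified toy sketch of using the discrepancy between an original image and its mask-conditional generation to refine a coarse mask; the thresholding rule and the assumption that low discrepancy marks correct foreground pixels are illustrative choices, not G4Seg's actual procedure.

```python
import torch

def refine_mask(original, generated, coarse_mask, thresh=0.15):
    """Toy coarse-to-fine mask refinement from image discrepancies.

    original, generated: (3, H, W) tensors in [0, 1]; `generated` is assumed
    to come from a mask-conditional generative model (not shown here).
    coarse_mask: (H, W) binary tensor, the inexact segmentation to refine.
    """
    # Per-pixel discrepancy between the real image and its generation.
    disc = (original - generated).abs().mean(dim=0)    # (H, W)
    consistent = disc < thresh
    # Keep foreground only where the generation matched the original,
    # i.e. where the mask-conditioned model could faithfully reproduce it.
    return (coarse_mask.bool() & consistent).float()
```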
arXiv Detail & Related papers (2025-06-02T11:05:28Z) - From Controlled Scenarios to Real-World: Cross-Domain Degradation Pattern Matching for All-in-One Image Restoration [2.997052569698842]
All-in-One Image Restoration (AiOIR) aims to restore images affected by multiple degradation patterns via a single model with unified parameters. The UDAIR framework is proposed to effectively achieve AiOIR by transferring knowledge learned in the source domain to the target domain. Experimental results on 10 open-source datasets demonstrate that UDAIR achieves new state-of-the-art performance on the AiOIR task.
arXiv Detail & Related papers (2025-05-28T12:22:00Z) - Instilling Multi-round Thinking to Text-guided Image Generation [72.2032630115201]
Single-round generation often overlooks crucial details, particularly in the realm of fine-grained changes like shoes or sleeves.
We introduce a new self-supervised regularization, i.e., multi-round regularization, which is compatible with existing methods.
It builds upon the observation that the modification order generally should not affect the final result.
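As a rough illustration of that observation, the sketch below turns order-invariance into a penalty: apply two edit instructions in both orders and penalize the difference between the results. The editing model `edit_fn` is a generic stand-in, and the MSE penalty is an assumed choice rather than the paper's exact regularizer.

```python
import torch.nn.functional as F

def order_invariance_loss(edit_fn, image, instr_a, instr_b):
    """Self-supervised regularizer penalizing order-dependent edits.

    edit_fn(image, instruction) -> edited image tensor; any differentiable
    text-guided editing model can stand in here.
    """
    ab = edit_fn(edit_fn(image, instr_a), instr_b)   # apply a, then b
    ba = edit_fn(edit_fn(image, instr_b), instr_a)   # apply b, then a
    return F.mse_loss(ab, ba)
```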
arXiv Detail & Related papers (2024-01-16T16:19:58Z) - DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations [35.458709912618176]
Deep learning has led to huge progress in complex image classification tasks like ImageNet, but also to unexpected failure modes, e.g. via spurious features.
For safety-critical tasks, the black-box nature of these decisions is problematic, and explanations, or at least methods that make decisions plausible, are urgently needed.
We address these problems by generating images that optimize a classifier-derived objective using a framework for guided image generation.
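The guided-generation idea can be sketched as optimizing a latent so that the decoded image maximizes a classifier-derived objective. DiG-IN does this with diffusion guidance; the plain gradient-ascent loop below over a generic differentiable generator is a simplified stand-in, with `generator` and `classifier` as assumed callables.

```python
import torch

def optimize_for_class(generator, classifier, z, target_class,
                       steps=50, lr=0.05):
    """Adjust latent z so generator(z) maximizes one classifier logit.

    generator(z) -> image tensor, classifier(image) -> logits; both must be
    differentiable for the gradient-based update to work.
    """
    z = z.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        img = generator(z)
        loss = -classifier(img)[:, target_class].sum()   # ascend target logit
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(z).detach()
```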
arXiv Detail & Related papers (2023-11-29T17:35:29Z) - Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space of arbitrary pre-trained generative adversarial networks.
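Once such a direction is found, applying it is straightforward; the sketch below assumes a generic `generator` callable and a precomputed `direction` vector, which is where the proposed discovery framework would come in.

```python
import torch

def edit_along_direction(generator, z, direction, alpha=3.0):
    """Move latent codes z along a discovered direction and regenerate.

    z: (B, D) latents, direction: (D,) vector; alpha controls edit strength.
    """
    d = direction / direction.norm()
    return generator(z + alpha * d)
```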
arXiv Detail & Related papers (2020-11-24T02:18:08Z)