SpotEdit: Evaluating Visually-Guided Image Editing Methods
- URL: http://arxiv.org/abs/2508.18159v2
- Date: Fri, 26 Sep 2025 19:05:06 GMT
- Title: SpotEdit: Evaluating Visually-Guided Image Editing Methods
- Authors: Sara Ghazanfari, Wei-An Lin, Haitong Tian, Ersin Yumer
- Abstract summary: SpotEdit is a comprehensive benchmark designed to assess visually-guided image editing methods. Our benchmark includes a dedicated component on hallucination, highlighting how leading models, such as GPT-4o, often hallucinate the existence of a visual cue and erroneously perform the editing task.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visually-guided image editing, where edits are conditioned on both visual cues and textual prompts, has emerged as a powerful paradigm for fine-grained, controllable content generation. Although recent generative models have shown remarkable capabilities, existing evaluations remain simple and insufficiently representative of real-world editing challenges. We present SpotEdit, a comprehensive benchmark designed to systematically assess visually-guided image editing methods across diverse diffusion, autoregressive, and hybrid generative models, uncovering substantial performance disparities. To address a critical yet underexplored challenge, our benchmark includes a dedicated component on hallucination, highlighting how leading models, such as GPT-4o, often hallucinate the existence of a visual cue and erroneously perform the editing task. Our code and benchmark are publicly released at https://github.com/SaraGhazanfari/SpotEdit.
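As a concrete illustration of the hallucination component described in the abstract, the minimal sketch below shows one way a cue-absence check could be scored: when the referenced visual cue does not appear in the target image, a faithful editor should return the input unchanged, and any substantive edit counts as a hallucination. The sample fields and the `run_editor` and `image_similarity` callables are illustrative assumptions, not SpotEdit's actual interface.
```python
# Hypothetical sketch in the spirit of SpotEdit's hallucination component.
# All names (EditSample, run_editor, image_similarity) are illustrative
# placeholders, not the benchmark's actual API.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EditSample:
    cue_image: str      # path to the image supplying the visual cue
    target_image: str   # path to the image to be edited
    prompt: str         # textual editing instruction
    cue_present: bool   # False for hallucination-probe samples

def hallucination_rate(
    samples: List[EditSample],
    run_editor: Callable[[str, str, str], str],
    image_similarity: Callable[[str, str], float],
    threshold: float = 0.95,
) -> float:
    """Fraction of cue-absent samples where the model edited anyway."""
    probes = [s for s in samples if not s.cue_present]
    hallucinated = 0
    for s in probes:
        output = run_editor(s.cue_image, s.target_image, s.prompt)
        # If the output diverges from the unedited target, the model
        # acted on a visual cue that was never there.
        if image_similarity(output, s.target_image) < threshold:
            hallucinated += 1
    return hallucinated / max(len(probes), 1)
```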
Related papers
- WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark [72.07273056097722]
We introduce WorldEdit, a dataset specifically designed to enable world-driven image editing. WorldEdit consists of high-quality editing samples, guided by paraphrased instructions that align with real-world causal logic. Our results show that the proposed dataset and methods significantly narrow the gap with GPT-4o and Nano-Banana.
arXiv Detail & Related papers (2026-02-06T13:42:30Z) - Charts Are Not Images: On the Challenges of Scientific Chart Editing [66.38730113476677]
FigEdit is a benchmark for scientific figure editing comprising over 30,000 samples. Our benchmark demonstrates the profound limitations of pixel-level manipulation. By releasing FigEdit, we aim to enable systematic progress in structure-aware figure editing.
arXiv Detail & Related papers (2025-11-30T06:13:48Z) - Visual Autoregressive Modeling for Instruction-Guided Image Editing [97.04821896251681]
We present a visual autoregressive framework that reframes image editing as a next-scale prediction problem. VarEdit generates multi-scale target features to achieve precise edits. It completes a $512\times512$ edit in 1.2 seconds, making it 2.2$\times$ faster than the similarly sized UltraEdit.
arXiv Detail & Related papers (2025-08-21T17:59:32Z) - EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits [22.762414256693265]
We introduce EditInspector, a novel benchmark for evaluation of text-guided image edits. We leverage EditInspector to evaluate the performance of state-of-the-art (SoTA) vision and language models in assessing edits. Our findings indicate that current models struggle to evaluate edits comprehensively and frequently hallucinate when describing the changes.
arXiv Detail & Related papers (2025-06-11T17:58:25Z) - Image Editing As Programs with Diffusion Models [69.05164729625052]
We introduce Image Editing As Programs (IEAP), a unified image editing framework built upon the Diffusion Transformer (DiT) architecture. IEAP approaches instructional editing through a reductionist lens, decomposing complex editing instructions into sequences of atomic operations. Our framework delivers superior accuracy and semantic fidelity, particularly for complex, multi-step instructions.
arXiv Detail & Related papers (2025-06-04T16:57:24Z) - Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions [20.617718631292696]
We develop a novel paradigm for instruction-driven image editing that leverages widely available and enormous text-image pairs. Our approach introduces a multi-scale learnable region to localize and guide the editing process. By treating the alignment between images and their textual descriptions as supervision and learning to generate task-specific editing regions, our method achieves high-fidelity, precise, and instruction-consistent image editing.
arXiv Detail & Related papers (2025-05-25T22:40:59Z) - GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing [60.66800567924348]
We introduce a new benchmark designed to evaluate text-guided image editing models. The benchmark includes over 1000 high-quality editing examples across 20 diverse content categories. We conduct a large-scale study comparing GPT-Image-1 against several state-of-the-art editing models.
arXiv Detail & Related papers (2025-05-16T17:55:54Z) - SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow [8.850778795270351]
SPICE is a training-free workflow that accepts arbitrary resolutions and aspect ratios, accurately follows user requirements, and improves image quality consistently. SPICE outperforms state-of-the-art baselines on a challenging realistic image-editing dataset.
arXiv Detail & Related papers (2025-04-13T19:13:04Z) - Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance [15.130419159003816]
We present a versatile image editing framework capable of executing both rigid and non-rigid edits.
We leverage a dual-path injection scheme to handle diverse editing scenarios.
We introduce an integrated self-attention mechanism for fusion of appearance and structural information.
arXiv Detail & Related papers (2024-01-04T08:21:30Z) - AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing [24.9487669818162]
We propose a spatio-temporal guided adaptive editing algorithm, AdapEdit, which realizes adaptive image editing.
Our approach has a significant advantage in preserving model priors and does not require model training, fine-tuning, extra data, or optimization.
We present results over a wide variety of raw images and editing instructions, demonstrating competitive performance and showing that our method significantly outperforms previous approaches.
arXiv Detail & Related papers (2023-12-13T09:45:58Z) - Emu Edit: Precise Image Editing via Recognition and Generation Tasks [62.95717180730946]
We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing.
We train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and Computer Vision tasks.
We show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples.
arXiv Detail & Related papers (2023-11-16T18:55:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.