WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing
- URL: http://arxiv.org/abs/2512.00387v2
- Date: Mon, 08 Dec 2025 15:05:52 GMT
- Title: WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing
- Authors: Kaihang Pan, Weile Chen, Haiyi Qiu, Qifan Yu, Wendong Bu, Zehan Wang, Yun Zhu, Juncheng Li, Siliang Tang
- Abstract summary: WiseEdit is a knowledge-intensive benchmark for comprehensive evaluation of cognition- and creativity-informed image editing. WiseEdit decomposes image editing into three cascaded steps, each corresponding to a task that poses a challenge for models to complete. Ultimately, WiseEdit comprises 1,220 test cases, objectively revealing the limitations of SoTA image editing models.
- Score: 39.431195153927334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent image editing models boast next-level intelligent capabilities, facilitating cognition- and creativity-informed image editing. Yet, existing benchmarks provide too narrow a scope for evaluation, failing to holistically assess these advanced abilities. To address this, we introduce WiseEdit, a knowledge-intensive benchmark for comprehensive evaluation of cognition- and creativity-informed image editing, featuring deep task depth and broad knowledge breadth. Drawing an analogy to human cognitive creation, WiseEdit decomposes image editing into three cascaded steps, i.e., Awareness, Interpretation, and Imagination, each corresponding to a task that poses a challenge for models to complete at the specific step. It also encompasses complex tasks, where none of the three steps can be finished easily. Furthermore, WiseEdit incorporates three fundamental types of knowledge: Declarative, Procedural, and Metacognitive knowledge. Ultimately, WiseEdit comprises 1,220 test cases, objectively revealing the limitations of SoTA image editing models in knowledge-based cognitive reasoning and creative composition capabilities. The benchmark, evaluation code, and the generated images of each model will be made publicly available soon. Project Page: https://qnancy.github.io/wiseedit_project_page/.
Related papers
- PhotoAgent: Agentic Photo Editing with Exploratory Visual Aesthetic Planning [26.368648607025676]
PhotoAgent is a system that advances image editing through explicit aesthetic planning. It reasons over user aesthetic intent, plans multi-step editing actions via tree search, and iteratively refines results through closed-loop execution. In experiments, PhotoAgent consistently improves both instruction adherence and visual quality compared with baseline methods.
Date: 2026-02-26T09:46:06Z
- EditThinker: Unlocking Iterative Reasoning for Any Image Editor [72.28251670314451]
We propose a deliberative editing framework that lets image editors 'think' while they edit. We train a single MLLM, EditThinker, to act as the reasoning engine of this framework. We employ reinforcement learning to align EditThinker's thinking with its editing, thereby generating more targeted instruction improvements.
Date: 2025-12-05T18:58:09Z
- I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models [78.62380562116135]
Existing image editing benchmarks suffer from limited task scopes, insufficient evaluation dimensions, and heavy reliance on manual annotations. We propose I2I-Bench, a comprehensive benchmark for image-to-image editing models, which features 10 task categories across both single-image and multi-image editing tasks. Using I2I-Bench, we benchmark numerous mainstream image editing models, investigating the gaps and trade-offs between editing models across various dimensions.
Date: 2025-12-04T10:44:07Z
- SpotEdit: Evaluating Visually-Guided Image Editing Methods [3.5066378196008636]
SpotEdit is a comprehensive benchmark designed to assess visually-guided image editing methods. Our benchmark includes a dedicated component on hallucination, highlighting how leading models, such as GPT-4o, often hallucinate the existence of a visual cue and erroneously perform the editing task.
Date: 2025-08-25T16:08:57Z
- KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models [88.58758610679762]
We introduce KRIS-Bench (Knowledge-based Reasoning in Image-editing Systems Benchmark), a diagnostic benchmark designed to assess models through a cognitively informed lens. We categorize editing tasks across three foundational knowledge types: Factual, Conceptual, and Procedural. To support fine-grained evaluation, we propose a protocol that incorporates a novel Knowledge Plausibility metric, enhanced by knowledge hints and calibrated through human studies.
Date: 2025-05-22T14:08:59Z
- CompBench: Benchmarking Complex Instruction-guided Image Editing [63.347846732450364]
CompBench is a large-scale benchmark for complex instruction-guided image editing. We propose an MLLM-human collaborative framework with tailored task pipelines. We propose an instruction decoupling strategy that disentangles editing intents into four key dimensions.
Date: 2025-05-18T02:30:52Z
- Learning Action and Reasoning-Centric Image Editing from Videos and Simulations [45.637947364341436]
The AURORA dataset is a collection of high-quality training data, human-annotated and curated from videos and simulation engines.
We evaluate an AURORA-finetuned model on a new expert-curated benchmark covering 8 diverse editing tasks.
Our model significantly outperforms previous editing models as judged by human raters.
Date: 2024-07-03T19:36:33Z
- Responsible Visual Editing [53.45295657891099]
We formulate a new task, responsible visual editing, which entails modifying specific concepts within an image to render it more responsible while minimizing changes.
To mitigate the negative implications of harmful images on research, we create a transparent and public dataset, AltBear, which expresses harmful information using teddy bears instead of humans.
We find that the AltBear dataset corresponds well to the harmful content found in real images, offering a consistent experimental evaluation.
Date: 2024-04-08T14:56:26Z
- Emu Edit: Precise Image Editing via Recognition and Generation Tasks [62.95717180730946]
We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing.
We train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and Computer Vision tasks.
We show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples.
Date: 2023-11-16T18:55:58Z
This list is automatically generated from the titles and abstracts of the papers on this site.