RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
- URL: http://arxiv.org/abs/2506.03448v1
- Date: Tue, 03 Jun 2025 23:20:24 GMT
- Title: RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
- Authors: Bimsara Pathiraja, Maitreya Patel, Shivam Singh, Yezhou Yang, Chitta Baral,
- Abstract summary: We introduce RefEdit -- an instruction-based editing model trained on our scalable synthetic data generation pipeline.<n>Our RefEdit, trained on only 20,000 editing triplets, outperforms the Flux/SD3 model-based baselines trained on millions of data.
- Score: 56.9437856499838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite recent advances in inversion and instruction-based image editing, existing approaches primarily excel at editing single, prominent objects but significantly struggle when applied to complex scenes containing multiple entities. To quantify this gap, we first introduce RefEdit-Bench, a rigorous real-world benchmark rooted in RefCOCO, where even baselines trained on millions of samples perform poorly. To overcome this limitation, we introduce RefEdit -- an instruction-based editing model trained on our scalable synthetic data generation pipeline. Our RefEdit, trained on only 20,000 editing triplets, outperforms the Flux/SD3 model-based baselines trained on millions of data. Extensive evaluations across various benchmarks demonstrate that our model not only excels in referring expression tasks but also enhances performance on traditional benchmarks, achieving state-of-the-art results comparable to closed-source methods. We release data \& checkpoint for reproducibility.
Related papers
- DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models [26.762431651154607]
We propose DualEdit, an editor that modifies both textual and visual modalities at their respective key layers.<n>We evaluate DualEdit across multiple VLM backbones and benchmark datasets, demonstrating its superiority over state-of-the-art VLM editing baselines.
arXiv Detail & Related papers (2025-06-16T16:04:16Z) - FineEdit: Unlock Instruction-Based Text Editing for LLMs [9.795246551841586]
FineEdit is a specialized editing model explicitly trained for accurate, context-aware text modifications.<n>FineEdit outperforms state-of-the-art models on single-turn edits, up to 30% over Llama-3.2-3B, and exceeding Mistral-7B-OpenOrca performance by over 40% on direct editing tasks.
arXiv Detail & Related papers (2025-02-19T01:41:44Z) - The Mirage of Model Editing: Revisiting Evaluation in the Wild [70.17413507444704]
We introduce QAEdit, a new benchmark aligned with widely used question answering (QA) datasets, and WILD, a task-agnostic evaluation framework.<n>Our single editing experiments show that current editing methods perform substantially worse than previously reported.
arXiv Detail & Related papers (2025-02-16T15:57:55Z) - ComprehendEdit: A Comprehensive Dataset and Evaluation Framework for Multimodal Knowledge Editing [27.034072044001736]
Large multimodal language models (MLLMs) have revolutionized natural language processing and visual understanding.<n>Current knowledge editing evaluations are limited in scope and potentially biased.<n>We introduce ComprehendEdit, a comprehensive benchmark comprising eight diverse tasks from multiple datasets.
arXiv Detail & Related papers (2024-12-17T11:41:49Z) - Consecutive Batch Model Editing with HooK Layers [59.673084839708224]
CoachHooK is a model editing method that simultaneously supports sequential and batch editing.
It is memory-friendly as it only needs a small amount of it to store several hook layers whose size remains unchanged over time.
arXiv Detail & Related papers (2024-03-08T14:07:44Z) - The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse [58.0132400208411]
Even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks.
benchmarking Large Language Models after each edit is impractically time-consuming and resource-intensive.
We have utilized GPT-3.5 to develop a new dataset, HardEdit, based on hard cases.
arXiv Detail & Related papers (2024-02-15T01:50:38Z) - EditEval: An Instruction-Based Benchmark for Text Improvements [73.5918084416016]
This work presents EditEval: An instruction-based, benchmark and evaluation suite for automatic evaluation of editing capabilities.
We evaluate several pre-trained models, which shows that InstructGPT and PEER perform the best, but that most baselines fall below the supervised SOTA.
Our analysis shows that commonly used metrics for editing tasks do not always correlate well, and that optimization for prompts with the highest performance does not necessarily entail the strongest robustness to different models.
arXiv Detail & Related papers (2022-09-27T12:26:05Z) - Value Retrieval with Arbitrary Queries for Form-like Documents [50.5532781148902]
We propose value retrieval with arbitrary queries for form-like documents.
Our method predicts target value for an arbitrary query based on the understanding of layout and semantics of a form.
We propose a simple document language modeling (simpleDLM) strategy to improve document understanding on large-scale model pre-training.
arXiv Detail & Related papers (2021-12-15T01:12:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.