Related papers: UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

URL: http://arxiv.org/abs/2407.05282v1
Date: Sun, 7 Jul 2024 06:50:22 GMT
Title: UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
Authors: Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang,
Abstract summary: This paper presents UltraEdit, a large-scale (approximately 4 million editing samples) automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2Pix and MagicBrush, and provide a systematic approach to producing massive and high-quality image editing samples.
Score: 43.222251591410455
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2Pix and MagicBrush, and provide a systematic approach to producing massive and high-quality image editing samples. UltraEdit offers several distinct advantages: 1) It features a broader range of editing instructions by leveraging the creativity of large language models (LLMs) alongside in-context editing examples from human raters; 2) Its data sources are based on real images, including photographs and artworks, which provide greater diversity and reduced bias compared to datasets solely generated by text-to-image models; 3) It also supports region-based editing, enhanced by high-quality, automatically produced region annotations. Our experiments show that canonical diffusion-based editing baselines trained on UltraEdit set new records on MagicBrush and Emu-Edit benchmarks. Our analysis further confirms the crucial role of real image anchors and region-based editing data. The dataset, code, and models can be found in https://ultra-editing.github.io.

Related papers

EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning [58.53074381801114]
We introduce EditVerse, a unified framework for image and video generation and editing within a single model.<n>By representing all modalities, i.e. text, image, and video, as a unified token sequence, EditVerse leverages self-attention to achieve robust in-context learning.<n>We present EditVerseBench, the first benchmark for instruction-based video editing covering diverse tasks and resolutions.
arXiv Detail & Related papers (2025-09-24T17:59:30Z)
Towards Efficient Exemplar Based Image Editing with Multimodal VLMs [11.830273909934688]
In this work, we tackle the task of transferring an edit from an exemplar pair to a content image(s) by leveraging text-to-image diffusion models and multimodal VLMs.<n>Our end-to-end pipeline is optimization-free, but our experiments demonstrate that it still outperforms baselines on multiple types of edits while being 4x faster.
arXiv Detail & Related papers (2025-06-25T06:20:36Z)
Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions [20.617718631292696]
We develop a novel paradigm for instruction-driven image editing that leverages widely available and enormous text-image pairs.<n>Our approach introduces a multi-scale learnable region to localize and guide the editing process.<n>By treating the alignment between images and their textual descriptions as supervision and learning to generate task-specific editing regions, our method achieves high-fidelity, precise, and instruction-consistent image editing.
arXiv Detail & Related papers (2025-05-25T22:40:59Z)
Step1X-Edit: A Practical Framework for General Image Editing [64.07202539610576]
We release a state-of-the-art image editing model, called Step1X-Edit. It can provide comparable performance against the closed-source models like GPT-4o and Gemini2 Flash. For evaluation, we develop the GEdit-Bench, a novel benchmark rooted in real-world user instructions.
arXiv Detail & Related papers (2025-04-24T17:25:12Z)
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea [88.79769371584491]
We present AnyEdit, a comprehensive multi-modal instruction editing dataset. We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results. Experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models.
arXiv Detail & Related papers (2024-11-24T07:02:56Z)
ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models [11.830273909934688]
Modern Text-to-Image (T2I) Diffusion models have revolutionized image editing by enabling the generation of high-quality images. We propose ReEdit, a modular and efficient end-to-end framework that captures edits in both text and image modalities. Our results demonstrate that ReEdit consistently outperforms contemporary approaches both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-11-06T15:19:24Z)
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing [53.00272278754867]
SEED-Data-Edit is a hybrid dataset for instruction-guided image editing. High-quality editing data produced by an automated pipeline. Real-world scenario data collected from the internet. High-precision multi-turn editing data annotated by humans.
arXiv Detail & Related papers (2024-05-07T04:55:47Z)
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years. We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing. Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z)
SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing [94.31103255204933]
We propose a unified model for open-domain image editing focusing on color and tone adjustment of open-domain images. Our model learns a unified editing space that is more semantic, intuitive, and easy to manipulate. We show that by inverting image pairs into latent codes of the learned editing space, our model can be leveraged for various downstream editing tasks.
arXiv Detail & Related papers (2021-11-30T23:53:32Z)
EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high quality, high precision semantic image editing. We show that EditGAN can manipulate images with an unprecedented level of detail and freedom. We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.