Related papers: SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing

SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing

URL: http://arxiv.org/abs/2405.04007v1
Date: Tue, 7 May 2024 04:55:47 GMT
Title: SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing
Authors: Yuying Ge, Sijie Zhao, Chen Li, Yixiao Ge, Ying Shan,
Abstract summary: SEED-Data-Edit is a hybrid dataset for instruction-guided image editing. High-quality editing data produced by an automated pipeline. Real-world scenario data collected from the internet. High-precision multi-turn editing data annotated by humans.
Score: 53.00272278754867
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In this technical report, we introduce SEED-Data-Edit: a unique hybrid dataset for instruction-guided image editing, which aims to facilitate image manipulation using open-form language. SEED-Data-Edit is composed of three distinct types of data: (1) High-quality editing data produced by an automated pipeline, ensuring a substantial volume of diverse image editing pairs. (2) Real-world scenario data collected from the internet, which captures the intricacies of user intentions for promoting the practical application of image editing in the real world. (3) High-precision multi-turn editing data annotated by humans, which involves multiple rounds of edits for simulating iterative editing processes. The combination of these diverse data sources makes SEED-Data-Edit a comprehensive and versatile dataset for training language-guided image editing model. We fine-tune a pretrained Multimodal Large Language Model (MLLM) that unifies comprehension and generation with SEED-Data-Edit. The instruction tuned model demonstrates promising results, indicating the potential and effectiveness of SEED-Data-Edit in advancing the field of instructional image editing. The datasets are released in https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit.

Related papers

MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks [46.87912659985628]
MultiEdit is a comprehensive dataset featuring over 107K high-quality image editing samples.<n>It encompasses 6 challenging editing tasks through a diverse collection of 18 non-style-transfer editing types and 38 style transfer operations.<n>We employ a novel dataset construction pipeline that utilizes two multi-modal large language models (MLLMs) to generate visual-adaptive editing instructions.
arXiv Detail & Related papers (2025-09-18T05:33:38Z)
ImgEdit: A Unified Image Editing Dataset and Benchmark [14.185771939071149]
We introduce ImgEdit, a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs.<n>ImgEdit surpasses existing datasets in both task novelty and data quality.<n>For comprehensive evaluation, we introduce ImgEdit-Bench, a benchmark designed to evaluate image editing performance.
arXiv Detail & Related papers (2025-05-26T17:53:33Z)
Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions [20.617718631292696]
We develop a novel paradigm for instruction-driven image editing that leverages widely available and enormous text-image pairs.<n>Our approach introduces a multi-scale learnable region to localize and guide the editing process.<n>By treating the alignment between images and their textual descriptions as supervision and learning to generate task-specific editing regions, our method achieves high-fidelity, precise, and instruction-consistent image editing.
arXiv Detail & Related papers (2025-05-25T22:40:59Z)
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea [88.79769371584491]
We present AnyEdit, a comprehensive multi-modal instruction editing dataset. We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results. Experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models.
arXiv Detail & Related papers (2024-11-24T07:02:56Z)
ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models [11.830273909934688]
Modern Text-to-Image (T2I) Diffusion models have revolutionized image editing by enabling the generation of high-quality images. We propose ReEdit, a modular and efficient end-to-end framework that captures edits in both text and image modalities. Our results demonstrate that ReEdit consistently outperforms contemporary approaches both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-11-06T15:19:24Z)
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale [43.222251591410455]
This paper presents UltraEdit, a large-scale (approximately 4 million editing samples) automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2Pix and MagicBrush, and provide a systematic approach to producing massive and high-quality image editing samples.
arXiv Detail & Related papers (2024-07-07T06:50:22Z)
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models [91.22477798288003]
This paper introduces SmartEdit, a novel approach to instruction-based image editing. It exploits Multimodal Large Language Models (MLLMs) to enhance their understanding and reasoning capabilities. We show that a small amount of complex instruction editing data can effectively stimulate SmartEdit's editing capabilities for more complex instructions.
arXiv Detail & Related papers (2023-12-11T17:54:11Z)
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance [0.0]
LEDITS is a combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance. This approach achieves versatile edits, both subtle and extensive as well as alterations in composition and style, while requiring no optimization nor extensions to the architecture.
arXiv Detail & Related papers (2023-07-02T09:11:09Z)
iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing. It generates images conditioned on a source image and a textual edit prompt. It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
InstructPix2Pix: Learning to Follow Image Editing Instructions [103.77092910685764]
We propose a method for editing images from human instructions. given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. We show compelling editing results for a diverse collection of input images and written instructions.
arXiv Detail & Related papers (2022-11-17T18:58:43Z)
EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high quality, high precision semantic image editing. We show that EditGAN can manipulate images with an unprecedented level of detail and freedom. We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.