AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
- URL: http://arxiv.org/abs/2411.15738v1
- Date: Sun, 24 Nov 2024 07:02:56 GMT
- Title: AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
- Authors: Qifan Yu, Wei Chow, Zhongqi Yue, Kaihang Pan, Yang Wu, Xiaoyang Wan, Juncheng Li, Siliang Tang, Hanwang Zhang, Yueting Zhuang
- Abstract summary: We present AnyEdit, a comprehensive multi-modal instruction editing dataset.
We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results.
Experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models.
- Score: 88.79769371584491
- Abstract: Instruction-based image editing aims to modify specific image elements with natural language instructions. However, current models in this domain often struggle to accurately execute complex user instructions, as they are trained on low-quality data with limited editing types. We present AnyEdit, a comprehensive multi-modal instruction editing dataset, comprising 2.5 million high-quality editing pairs spanning over 20 editing types and five domains. We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results. Using the dataset, we further train a novel AnyEdit Stable Diffusion with task-aware routing and learnable task embedding for unified image editing. Comprehensive experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models. This presents prospects for developing instruction-driven image editing models that support human creativity.
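The abstract describes routing the model by edit type via learnable task embeddings. As a rough illustration only, the sketch below shows a generic mixture-of-experts-style router driven by a per-task embedding; the sizes, names, and router form are hypothetical and are not taken from the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_TASKS, NUM_EXPERTS, DIM = 4, 3, 8  # hypothetical toy sizes

task_embed = rng.normal(size=(NUM_TASKS, DIM))  # learnable: one row per edit type
router_w = rng.normal(size=(DIM, NUM_EXPERTS))  # maps task embedding -> expert logits
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def routed_forward(features, task_id):
    """Blend expert outputs using weights predicted from the task embedding."""
    weights = softmax(task_embed[task_id] @ router_w)    # (NUM_EXPERTS,), sums to 1
    outputs = np.stack([features @ w for w in experts])  # (NUM_EXPERTS, DIM)
    return weights @ outputs                             # (DIM,)

y = routed_forward(rng.normal(size=DIM), task_id=1)
```

During training, the task embeddings and router would be updated by backpropagation so that each edit type learns which expert branches to favour.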
Related papers
- ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models [11.830273909934688]
Modern Text-to-Image (T2I) Diffusion models have revolutionized image editing by enabling the generation of high-quality images.
We propose ReEdit, a modular and efficient end-to-end framework that captures edits in both text and image modalities.
Our results demonstrate that ReEdit consistently outperforms contemporary approaches both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-11-06T15:19:24Z)
- UltraEdit: Instruction-based Fine-Grained Image Editing at Scale [43.222251591410455]
This paper presents UltraEdit, a large-scale (approximately 4 million editing samples) automatically generated dataset for instruction-based image editing.
Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2Pix and MagicBrush, and provide a systematic approach to producing massive and high-quality image editing samples.
arXiv Detail & Related papers (2024-07-07T06:50:22Z)
- SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing [53.00272278754867]
SEED-Data-Edit is a hybrid dataset for instruction-guided image editing.
It combines three sources: high-quality editing data produced by an automated pipeline, real-world scenario data collected from the internet, and high-precision multi-turn editing data annotated by humans.
arXiv Detail & Related papers (2024-05-07T04:55:47Z)
- Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training [61.984277261016146]
We propose a CustomNeRF model that unifies a text description or a reference image as the editing prompt.
To tackle the first challenge, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground region editing and full-image editing.
For the second challenge, we also design a class-guided regularization that exploits class priors within the generation model to alleviate the inconsistency problem.
arXiv Detail & Related papers (2023-12-04T06:25:06Z)
- Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
Because the influence of each loss function can be adjusted, the editing solution is flexible and can be tuned to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
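The weighted multi-objective optimisation described in this entry can be sketched in a toy form. The two quadratic "losses" below are hypothetical stand-ins for real edit objectives (e.g. a text-alignment loss and a pose loss), not the paper's actual terms; the point is only that the user-set weights determine which objective dominates the result.

```python
import numpy as np

# Hypothetical targets for two editing objectives (toy 2-D "latent" space).
text_target = np.array([1.0, 0.0])
pose_target = np.array([0.0, 1.0])

def optimise_edit(latent, w_text=1.0, w_pose=1.0, steps=300, lr=0.05):
    """Gradient descent on w_text * L_text + w_pose * L_pose,
    where each L is a squared distance to its target."""
    z = latent.astype(float).copy()
    for _ in range(steps):
        grad = 2 * w_text * (z - text_target) + 2 * w_pose * (z - pose_target)
        z -= lr * grad
    return z

# Raising w_text pulls the result toward the text target:
# the minimiser is the weight-averaged target, here [0.75, 0.25].
z = optimise_edit(np.zeros(2), w_text=3.0, w_pose=1.0)
```

With quadratic losses the optimum is simply the weighted average of the targets, which makes the effect of re-weighting easy to verify by hand.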
arXiv Detail & Related papers (2023-11-28T15:31:11Z)
- Emu Edit: Precise Image Editing via Recognition and Generation Tasks [62.95717180730946]
We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing.
We train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and computer vision tasks.
We show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples.
arXiv Detail & Related papers (2023-11-16T18:55:58Z)
- Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
arXiv Detail & Related papers (2023-10-18T17:59:02Z)
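The per-pair search described in the OIR entry above, where a metric picks the best inversion step for each editing pair, can be sketched generically. The score function below is a made-up trade-off for illustration, not the paper's search metric.

```python
# Toy sketch: score each candidate number of inversion steps and keep the best.

def best_inversion_step(candidate_steps, score_fn):
    """Return the candidate step with the lowest (best) score."""
    return min(candidate_steps, key=score_fn)

# Hypothetical metric: penalise distance from a sweet spot at 60% of the
# schedule, standing in for balancing editability against fidelity.
def toy_score(step, total=50):
    return abs(step / total - 0.6)

step = best_inversion_step(range(0, 51, 10), toy_score)  # picks 30 here
```

In a real pipeline the score would be computed from the actual edited and reconstructed images for each candidate step, so different editing pairs can land on different optimal steps.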
This list is automatically generated from the titles and abstracts of the papers in this site.