MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
- URL: http://arxiv.org/abs/2306.10012v3
- Date: Wed, 15 May 2024 18:20:28 GMT
- Title: MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
- Authors: Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, Yu Su
- Abstract summary: We introduce MagicBrush, the first large-scale, manually annotated dataset for instruction-guided real image editing.
We show that InstructPix2Pix fine-tuned on MagicBrush produces much better images according to human evaluation.
- Score: 48.204992417461575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-guided image editing is widely needed in daily life, ranging from personal use to professional applications such as Photoshop. However, existing methods are either zero-shot or trained on automatically synthesized datasets that contain a high volume of noise, so they still require substantial manual tuning to produce desirable outcomes in practice. To address this issue, we introduce MagicBrush (https://osu-nlp-group.github.io/MagicBrush/), the first large-scale, manually annotated dataset for instruction-guided real image editing that covers diverse scenarios: single-turn, multi-turn, mask-provided, and mask-free editing. MagicBrush comprises over 10K manually annotated triplets (source image, instruction, target image), which supports training large-scale text-guided image editing models. We fine-tune InstructPix2Pix on MagicBrush and show that the new model produces much better images according to human evaluation. We further conduct extensive experiments to evaluate current image editing baselines across multiple dimensions, including quantitative, qualitative, and human evaluations. The results reveal the challenging nature of our dataset and the gap between current baselines and real-world editing needs.
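The abstract describes MagicBrush as over 10K (source image, instruction, target image) triplets used to fine-tune InstructPix2Pix. The sketch below shows how such triplets might be loaded and fed to an InstructPix2Pix-style pipeline; it assumes the dataset is hosted on the Hugging Face Hub as osunlp/MagicBrush with fields named source_img, instruction, and target_img, and it uses the public baseline InstructPix2Pix checkpoint as a stand-in for a MagicBrush fine-tune. These identifiers are assumptions, not taken from the abstract.

```python
# Minimal sketch (not from the paper): load MagicBrush-style triplets and apply an
# InstructPix2Pix-style pipeline to one of them. The Hub ids and the dataset field
# names below are assumptions.
import torch
from datasets import load_dataset
from diffusers import StableDiffusionInstructPix2PixPipeline

# Assumed dataset identifier and field names (source_img, instruction, target_img).
ds = load_dataset("osunlp/MagicBrush", split="train")
example = ds[0]
source = example["source_img"]        # original image (PIL)
instruction = example["instruction"]  # natural-language edit instruction

# Baseline checkpoint; a model fine-tuned on MagicBrush would be loaded the same way.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

edited = pipe(
    instruction,
    image=source.convert("RGB"),
    num_inference_steps=20,
    image_guidance_scale=1.5,
).images[0]
edited.save("edited.png")
```

Fine-tuning itself would iterate over the full training split and supervise the diffusion model to reconstruct target_img from source_img and instruction, following the InstructPix2Pix training setup.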
Related papers
- AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea [88.79769371584491]
We present AnyEdit, a comprehensive multi-modal instruction editing dataset.
We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results.
Experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models.
arXiv Detail & Related papers (2024-11-24T07:02:56Z)
- OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision [32.33777277141083]
We present OmniEdit, an omnipotent editor that handles seven different image editing tasks with any aspect ratio seamlessly.
OmniEdit is trained with supervision from seven different specialist models to ensure task coverage.
We provide images with different aspect ratios to ensure that our model can handle any image in the wild.
arXiv Detail & Related papers (2024-11-11T18:21:43Z)
- Multi-Reward as Condition for Instruction-based Image Editing [32.77114231615961]
We propose to address the training data quality issue with multi-perspective reward data instead of refining the ground-truth image quality.
Experiments indicate that our multi-reward conditioned model outperforms its no-reward counterpart on two popular editing pipelines.
arXiv Detail & Related papers (2024-11-06T05:02:29Z)
- UltraEdit: Instruction-based Fine-Grained Image Editing at Scale [43.222251591410455]
This paper presents UltraEdit, a large-scale (approximately 4 million editing samples) automatically generated dataset for instruction-based image editing.
Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2Pix and MagicBrush, and provide a systematic approach to producing massive and high-quality image editing samples.
arXiv Detail & Related papers (2024-07-07T06:50:22Z)
- Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation [58.09421301921607]
We construct the first large-scale dataset for subject-driven image editing and generation.
Our dataset is five times the size of the previous largest dataset, yet its construction cost is tens of thousands of GPU hours lower.
arXiv Detail & Related papers (2024-06-13T16:40:39Z)
- EditWorld: Simulating World Dynamics for Instruction-Following Image Editing [68.6224340373457]
Diffusion models have significantly improved the performance of image editing.
We introduce world-instructed image editing, which defines and categorizes the instructions grounded by various world scenarios.
Our method significantly outperforms existing editing methods in this new task.
arXiv Detail & Related papers (2024-05-23T16:54:17Z)
- DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images [55.546024767130994]
We propose a novel model to enhance the text-based control of an image editor by explicitly reasoning about which parts of the image to alter or preserve.
It relies on word alignments between a description of the original source image and the instruction describing the needed updates, as well as on the input image.
It is evaluated on a subset of the Bison dataset and a self-defined dataset dubbed Dream.
arXiv Detail & Related papers (2024-04-27T22:45:47Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, and qualitatively when editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.