X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning
- URL: http://arxiv.org/abs/2508.07607v2
- Date: Sun, 09 Nov 2025 01:42:53 GMT
- Title: X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning
- Authors: Jian Ma, Xujie Zhu, Zihao Pan, Qirong Peng, Xu Guo, Chen Chen, Haonan Lu
- Abstract summary: X2Edit dataset is a comprehensive dataset covering 14 diverse editing tasks. We construct 3.7 million high-quality data with balanced categories. Experiments demonstrate that the model's editing performance is competitive among many excellent models.
- Score: 14.005947068255546
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing open-source datasets for arbitrary-instruction image editing remain suboptimal, while a plug-and-play editing module compatible with community-prevalent generative models is notably absent. In this paper, we first introduce the X2Edit Dataset, a comprehensive dataset covering 14 diverse editing tasks, including subject-driven generation. We utilize industry-leading unified image generation models and expert models to construct the data. Meanwhile, we design reasonable editing instructions with the VLM and implement various scoring mechanisms to filter the data. As a result, we construct 3.7 million high-quality data samples with balanced categories. Second, to integrate seamlessly with community image generation models, we design task-aware MoE-LoRA training based on FLUX.1, with only 8% of the parameters of the full model. To further improve the final performance, we utilize the internal representations of the diffusion model and define positive/negative samples based on image editing types to introduce contrastive learning. Extensive experiments demonstrate that the model's editing performance is competitive among many excellent models. Additionally, the constructed dataset exhibits substantial advantages over existing open-source datasets. The open-source code, checkpoints, and datasets for X2Edit can be found at the following link: https://github.com/OPPO-Mente-Lab/X2Edit.
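The abstract names task-aware MoE-LoRA but does not spell out the routing. As a rough illustration only, the general idea of a frozen linear layer plus a gated mixture of low-rank (LoRA) updates can be sketched as below. Everything specific here is an assumption, not the paper's design: the class name `MoELoRALinear`, the per-task gate table, the dense softmax gating, and all sizes are hypothetical, and plain Python lists stand in for tensors.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MoELoRALinear:
    """Frozen linear layer plus a task-gated mixture of LoRA experts.

    Hypothetical sketch: the gate is a table with one row of expert
    logits per editing task, which is an assumption for illustration.
    """

    def __init__(self, d_in, d_out, rank, num_experts, num_tasks, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

        self.base = rand(d_out, d_in)  # stands in for a frozen pretrained weight
        self.down = [rand(rank, d_in) for _ in range(num_experts)]  # LoRA "A"
        # LoRA "B" starts at zero, so the layer initially equals the base layer.
        self.up = [[[0.0] * rank for _ in range(d_out)] for _ in range(num_experts)]
        self.gate = rand(num_tasks, num_experts)  # per-task expert logits (assumed)

    def forward(self, x, task_id):
        weights = softmax(self.gate[task_id])  # mixing weights for this task
        out = matvec(self.base, x)
        for w, down, up in zip(weights, self.down, self.up):
            delta = matvec(up, matvec(down, x))  # low-rank update: B (A x)
            out = [o + w * d for o, d in zip(out, delta)]
        return out


layer = MoELoRALinear(d_in=4, d_out=3, rank=2, num_experts=3, num_tasks=14)
x = [1.0, -2.0, 0.5, 3.0]
y = layer.forward(x, task_id=5)
```

Only the small `down`, `up`, and `gate` tables would be trained in such a setup, which is the usual reason adapter-style methods add few parameters relative to the base model.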
Related papers
- Learning an Image Editing Model without Image Editing Pairs [83.03646586929638]
Recent image editing models have achieved impressive results while following natural language editing instructions. They rely on supervised fine-tuning with large datasets of input-target pairs. Current workarounds use synthetic training pairs that leverage the zero-shot capabilities of existing models. We present a new training paradigm that eliminates the need for paired data entirely.
arXiv Detail & Related papers (2025-10-16T17:59:57Z)
- Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing [53.197392152109636]
We introduce Draw-In-Mind (DIM), a dataset consisting of two complementary subsets: (i) DIM-T2I, containing 14M long-context image-text pairs to enhance complex instruction comprehension; and (ii) DIM-Edit, consisting of 233K chain-of-thought imaginations generated by GPT-4o, serving as explicit design blueprints for image edits. DIM-4.6B-T2I/Edit achieves competitive performance on the ImgEdit and GEdit-Bench benchmarks, outperforming much larger models such as UniWorld-V1 and Step1X-Edit.
arXiv Detail & Related papers (2025-09-02T06:06:52Z)
- ImgEdit: A Unified Image Editing Dataset and Benchmark [14.185771939071149]
We introduce ImgEdit, a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs. ImgEdit surpasses existing datasets in both task novelty and data quality. For comprehensive evaluation, we introduce ImgEdit-Bench, a benchmark designed to evaluate image editing performance.
arXiv Detail & Related papers (2025-05-26T17:53:33Z)
- Step1X-Edit: A Practical Framework for General Image Editing [64.07202539610576]
We release a state-of-the-art image editing model, called Step1X-Edit. It can provide comparable performance against the closed-source models like GPT-4o and Gemini2 Flash. For evaluation, we develop the GEdit-Bench, a novel benchmark rooted in real-world user instructions.
arXiv Detail & Related papers (2025-04-24T17:25:12Z)
- DreamOmni: Unified Image Generation and Editing [51.45871494724542]
We introduce DreamOmni, a unified model for image generation and editing. For training, DreamOmni jointly trains T2I generation and downstream tasks. This collaboration significantly boosts editing performance.
arXiv Detail & Related papers (2024-12-22T17:17:28Z)
- AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea [88.79769371584491]
We present AnyEdit, a comprehensive multi-modal instruction editing dataset. We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results. Experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models.
arXiv Detail & Related papers (2024-11-24T07:02:56Z)
- SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing [53.00272278754867]
SEED-Data-Edit is a hybrid dataset for instruction-guided image editing.
High-quality editing data produced by an automated pipeline.
Real-world scenario data collected from the internet.
High-precision multi-turn editing data annotated by humans.
arXiv Detail & Related papers (2024-05-07T04:55:47Z)