Fugu-MT Paper Translation (Abstract): ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework

Paper overview: ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework

arxiv url: http://arxiv.org/abs/2603.20644v1
Date: Sat, 21 Mar 2026 04:39:19 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.013085
Title: ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework
Title (reference translation): ScaleEdit-12M: Scaling Open-Source Image Editing Data via a Multi-Agent Framework
Authors: Guanzhou Chen, Erfei Cui, Changyao Tian, Danni Yang, Ganlin Yang, Yu Qiao, Hongsheng Li, Gen Luo, Hongjie Zhang,
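Abstract summary: We propose ScaleEditor, a hierarchical multi-agent framework for end-to-end construction of large-scale, high-quality image editing datasets. The pipeline consists of three key components: source image expansion with world-knowledge infusion, adaptive multi-agent editing instruction-image synthesis, and a task-aware data quality verification mechanism. Using ScaleEditor, we curate ScaleEdit-12M, the largest open-source image editing dataset to date.
Reference score (independently computed attention measure): 58.443783258153786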
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Instruction-based image editing has emerged as a key capability for unified multimodal models (UMMs), yet constructing large-scale, diverse, and high-quality editing datasets without costly proprietary APIs remains challenging. Previous image editing datasets either rely on closed-source models for annotation, which prevents cost-effective scaling, or employ fixed synthetic editing pipelines, which suffer from limited quality and generalizability. To address these challenges, we propose ScaleEditor, a fully open-source hierarchical multi-agent framework for end-to-end construction of large-scale, high-quality image editing datasets. Our pipeline consists of three key components: source image expansion with world-knowledge infusion, adaptive multi-agent editing instruction-image synthesis, and a task-aware data quality verification mechanism. Using ScaleEditor, we curate ScaleEdit-12M, the largest open-source image editing dataset to date, spanning 23 task families across diverse real and synthetic domains. Fine-tuning UniWorld-V1 and Bagel on ScaleEdit yields consistent gains, improving performance by up to 10.4% on ImgEdit and 35.1% on GEdit for general editing benchmarks and by up to 150.0% on RISE and 26.5% on KRIS-Bench for knowledge-infused benchmarks. These results demonstrate that open-source, agentic pipelines can approach commercial-grade data quality while retaining cost-effectiveness and scalability. Both the framework and dataset will be open-sourced.
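Abstract (reference translation): Instruction-based image editing has emerged as a key capability of unified multimodal models (UMMs), but constructing large-scale, diverse, high-quality editing datasets without costly proprietary APIs remains difficult. Previous image editing datasets rely on closed-source models for annotation, which prevents cost-effective scaling. To address these challenges, we propose ScaleEditor, an open-source hierarchical multi-agent framework for end-to-end construction of large-scale, high-quality image editing datasets. The pipeline consists of three key components: source image expansion with world-knowledge infusion, adaptive multi-agent editing instruction-image synthesis, and a task-aware data quality verification mechanism. Using ScaleEditor, we curate ScaleEdit-12M, the largest open-source image editing dataset to date. Fine-tuning UniWorld-V1 and Bagel on ScaleEdit improves performance by up to 10.4% on ImgEdit, 35.1% on GEdit, 150.0% on RISE, and 26.5% on KRIS-Bench. These results show that open-source agentic pipelines can approach commercial-grade data quality while retaining cost-effectiveness and scalability. Both the framework and the dataset will be open-sourced.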
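
Illustrative sketch (not from the paper): the abstract names three pipeline components but gives no implementation detail. The Python below only shows how an expand, synthesize, verify flow of that general shape could be wired together. Every name here (`EditSample`, `expand_sources`, `synthesize`, `verify`, `build_dataset`) is a hypothetical stand-in, images are represented by plain strings, and none of it reflects ScaleEditor's actual agents or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EditSample:
    """One candidate training triple; strings stand in for real image data."""
    source_image: str
    task: str
    instruction: str = ""
    edited_image: str = ""


def expand_sources(seeds: List[str], tasks: List[str]) -> List[EditSample]:
    # Stage 1 (assumed): pair every seed image with every candidate task.
    return [EditSample(source_image=s, task=t) for s in seeds for t in tasks]


def synthesize(sample: EditSample) -> EditSample:
    # Stage 2 (assumed): placeholder for instruction and edited-image generation.
    sample.instruction = f"apply {sample.task} to {sample.source_image}"
    sample.edited_image = f"{sample.source_image}+{sample.task}"
    return sample


def verify(sample: EditSample,
           checks: Dict[str, Callable[[EditSample], bool]]) -> bool:
    # Stage 3 (assumed): run the check registered for this task, if any.
    check = checks.get(sample.task, lambda s: True)
    return check(sample)


def build_dataset(seeds: List[str], tasks: List[str],
                  checks: Dict[str, Callable[[EditSample], bool]]) -> List[EditSample]:
    candidates = expand_sources(seeds, tasks)
    return [s for s in map(synthesize, candidates) if verify(s, checks)]


if __name__ == "__main__":
    # Toy run: the "remove" check rejects samples built from img1.
    checks = {"remove": lambda s: s.source_image != "img1"}
    for sample in build_dataset(["img0", "img1"], ["recolor", "remove"], checks):
        print(sample.instruction)
```

Keying the verification step by task is one simple way to read "task-aware" verification; the paper may implement it quite differently.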
