Fugu-MT Paper Translation (Abstract): Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

Paper Abstract: Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

arxiv url: http://arxiv.org/abs/2605.21487v2
Date: Fri, 22 May 2026 09:11:59 GMT
Status: Translation complete
System update date: 2026-05-25 14:44:53.759739
Title: Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning
Title (reference translation): Uni-Edit: Intelligent Editing Is a General Task for Unified Model Tuning
Authors: Dian Zheng, Manyuan Zhang, Hongyu Li, Hongbo Liu, Kai Zou, Kaituo Feng, Hongsheng Li
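Abstract summary: We propose Uni-Edit, an intelligent image editing task that serves as the first general task for tuning Unified Multimodal Models. Unlike complex mixed pipelines, Uni-Edit improves performance across all three abilities at once using only one task, one training stage, and one dataset. We show that tuning solely on Uni-Edit achieves comprehensive enhancements across all three capabilities without any auxiliary operations.
Reference score (independently computed attention metric): 43.870883813242166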
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Currently, enhancing Unified Multimodal Models (UMMs) with image understanding, generation, and editing capabilities mainly relies on mixed multi-task training. Due to inherent task conflicts, such a strategy requires complex multi-stage pipelines, massive data mixing, and balancing tricks, merely resulting in a performance trade-off rather than true mutual reinforcement. To break this paradigm, we propose Uni-Edit, an intelligent image editing task that serves as the first general task for UMM tuning. Unlike complex mixed pipelines, Uni-Edit improves performance across all three abilities at once using only one task, one training stage, and one dataset. Specifically, we first identify image editing as an inherently ideal general task, as it naturally demands both visual understanding and generation. However, existing editing data relies on simplistic instructions that severely underutilize a model's understanding capacity. To address this, we introduce the first automated and scalable data synthesis pipeline for intelligent editing, transforming diverse VQA data into complex and effective editing instructions with embedded questions and nested logic. This yields Uni-Edit-148k, pairing diverse reasoning-intensive instructions with high-quality edited images. Extensive experiments on BAGEL and Janus-Pro demonstrate that tuning solely on Uni-Edit achieves comprehensive enhancements across all three capabilities without any auxiliary operations.
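Abstract (reference translation): At present, strengthening Unified Multimodal Models (UMMs) that have image understanding, generation, and editing capabilities depends mainly on mixed multi-task training. Because of inherent task conflicts, this strategy needs complex multi-stage pipelines, large-scale data mixing, and balancing tricks, and it yields only a performance trade-off rather than true mutual reinforcement. To break this paradigm, we propose Uni-Edit, an intelligent image editing task that serves as the first general task for UMM tuning. Unlike complex mixed pipelines, Uni-Edit improves performance across all three abilities at once using one task, one training stage, and one dataset. Specifically, we first identify image editing as an inherently ideal general task, because it naturally demands both visual understanding and generation. However, existing editing data relies on simplistic instructions that severely underutilize a model's understanding capacity. To address this, we introduce the first automated and scalable data synthesis pipeline for intelligent editing, which transforms diverse VQA data into complex and effective editing instructions with embedded questions and nested logic. This yields Uni-Edit-148k, which pairs diverse reasoning-intensive instructions with high-quality edited images. Extensive experiments on BAGEL and Janus-Pro show that tuning solely on Uni-Edit achieves comprehensive enhancements across all three capabilities without any auxiliary operations.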

Related Papers