Fugu-MT Paper Translation (Summary): Omni IIE Bench: Benchmarking the Practical Capabilities of Image Editing Models

Paper summary: Omni IIE Bench: Benchmarking the Practical Capabilities of Image Editing Models

arxiv url: http://arxiv.org/abs/2603.16944v1
Date: Mon, 16 Mar 2026 08:07:06 GMT
Status: Translation complete
Last updated in system: 2026-03-19 18:32:57.286116
Title: Omni IIE Bench: Benchmarking the Practical Capabilities of Image Editing Models
Authors: Yujia Yang, Yuanxiang Wang, Zhenyu Guan, Tiankun Yang, Chenxi Bao, Haopeng Jin, Jinwen Luo, Xinyu Zuo, Lisheng Duan, Haijin Liang, Jin Ma, Xinming Wang, Ruiwen Tao, Hongzhu Yi,
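Abstract summary: Omni IIE Bench is a benchmark designed to diagnose the editing consistency of IIE models in practical application scenarios. We perform a comprehensive evaluation of 8 mainstream IIE models using Omni IIE Bench. Our analysis quantifies, for the first time, the performance gap that appears when transitioning from low-semantic-scale to high-semantic-scale tasks.
Reference score (independently computed attention score): 12.603176617170504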
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While Instruction-based Image Editing (IIE) has achieved significant progress, existing benchmarks pursue task breadth via mixed evaluations. This paradigm obscures a failure mode that is critical in professional applications: the inconsistent performance of models across tasks of varying semantic scales. To address this gap, we introduce Omni IIE Bench, a high-quality, human-annotated benchmark specifically designed to diagnose the editing consistency of IIE models in practical application scenarios. Omni IIE Bench features an innovative dual-track diagnostic design: (1) Single-turn Consistency, comprising shared-context task pairs of attribute modification and entity replacement; and (2) Multi-turn Coordination, involving continuous dialogue tasks that traverse semantic scales. The benchmark is constructed via an exceptionally rigorous multi-stage human filtering process, incorporating a quality standard enforced by computer vision graduate students and an industry relevance review conducted by professional designers. We perform a comprehensive evaluation of 8 mainstream IIE models using Omni IIE Bench. Our analysis quantifies, for the first time, a prevalent performance gap: nearly all models exhibit a significant performance degradation when transitioning from low-semantic-scale to high-semantic-scale tasks. Omni IIE Bench provides critical diagnostic tools and insights for the development of next-generation IIE models that are more reliable and stable.

Related papers