Fugu-MT paper translation (summary): TECCI: Tricky Edits of Collected and Curated Images

Paper summary: TECCI: Tricky Edits of Collected and Curated Images

arxiv url: http://arxiv.org/abs/2606.01213v1
Date: Sun, 31 May 2026 13:03:52 GMT
Status: Translation complete
System update date: 2026-06-02 21:34:29.388299
Title: TECCI: Tricky Edits of Collected and Curated Images
Authors: Aishwarya Agrawal, Roy Hirsch, Yasumasa Onoe, Sherry Ben, Jason Baldridge,
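Abstract summary: TECCI consists of a completely new set of images that we are releasing. These images and categories were curated intentionally to target weaknesses of existing methods. The edit instructions in TECCI are automatically generated by Gemini and cover 5 edit types per source image. We conduct human evaluations of five leading image editing models on TECCI. Humans judge outputs along three dimensions: 1) instruction following, 2) minimality of the edits, and 3) visual quality.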
Reference score (in-house computed attention score): 18.15891884619145
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite tremendous recent progress, current text-guided image editing methods still struggle with many aspects of editing, including instruction following, minimally editing the source image, and ensuring high visual quality. These problems are especially apparent when the requested edit is challenging, such as edits involving position, motion, viewpoint, or scale, and creative edits. To systematically test generative image editors, we propose a novel image editing benchmark: TECCI (Tricky Edits of Collected and Curated Images). TECCI consists of a completely new set of images that we are releasing. The images in TECCI span 7 image categories. Both the images and the categories were curated intentionally to target weaknesses of existing methods. The edit instructions in TECCI are automatically generated by Gemini and cover 5 edit types per source image. We also curated a set of 530 images for which we manually wrote challenging edit instructions. Overall, TECCI contains 7550 pairs of images and edit instructions. We conduct human evaluations of five leading image editing models on TECCI. Humans judge outputs along three dimensions: 1) instruction following, 2) minimality of the edits, and 3) visual quality. To scale up the evaluation, we also build an auto-rater using Gemini that achieves 74.7% accuracy in matching human evaluations. Our evaluations reveal that: 1) none of the models exceeds a 22% overall success rate, demonstrating the challenging nature of TECCI; 2) Nano Banana Pro is the best-performing model overall; 3) models perform significantly better at instruction following than at minimal editing and visual quality; 4) models struggle with editing architecture and nature images, which require a strong understanding of spatial layout and intricate visual details; and 5) reasoning and creative edits are the most difficult, whereas color and appearance edits are the easiest.
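The abstract reports per-dimension results, an overall success rate, and the auto-rater's accuracy in matching human judgments. The Python sketch below shows one plausible way such numbers could be computed; it is not the authors' evaluation code. It assumes binary pass/fail judgments per dimension and that an output counts as an overall success only when it passes all three dimensions. The `Judgment` class, `success_rates`, and `agreement_accuracy` are hypothetical names, and the toy data is invented for illustration, not taken from TECCI.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """Hypothetical pass/fail ratings for one edited output on the three dimensions."""
    instruction_following: bool
    minimal_edit: bool
    visual_quality: bool

    def overall_success(self) -> bool:
        # Assumption: overall success requires passing all three dimensions.
        return self.instruction_following and self.minimal_edit and self.visual_quality


def success_rates(judgments: list[Judgment]) -> dict[str, float]:
    """Fraction of outputs passing each dimension, plus the overall success rate."""
    n = len(judgments)
    if n == 0:
        raise ValueError("no judgments to score")
    return {
        "instruction_following": sum(j.instruction_following for j in judgments) / n,
        "minimal_edit": sum(j.minimal_edit for j in judgments) / n,
        "visual_quality": sum(j.visual_quality for j in judgments) / n,
        "overall": sum(j.overall_success() for j in judgments) / n,
    }


def agreement_accuracy(human: list[bool], auto: list[bool]) -> float:
    """Fraction of items where the auto-rater's verdict equals the human verdict."""
    if not human or len(human) != len(auto):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == a for h, a in zip(human, auto)) / len(human)


# Invented toy data, for illustration only.
judgments = [
    Judgment(True, True, True),
    Judgment(True, False, True),
    Judgment(True, True, False),
    Judgment(False, False, True),
]
rates = success_rates(judgments)
print(rates["overall"])                # 0.25
print(rates["instruction_following"])  # 0.75
print(agreement_accuracy([True, False, True, True], [True, True, True, False]))  # 0.5
```

Under the all-three-must-pass assumption, the overall rate can never exceed the lowest per-dimension rate, which is why a model can score well on instruction following yet still post a low overall success rate.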
