Fugu-MT Paper Translation (Abstract): CoEditor++: Instruction-based Visual Editing via Cognitive Reasoning

Paper overview: CoEditor++: Instruction-based Visual Editing via Cognitive Reasoning

arxiv url: http://arxiv.org/abs/2603.05518v1
Date: Sat, 31 Jan 2026 12:20:46 GMT
Status: Translation complete
Last updated in system: 2026-03-15 16:38:22.368112
Title: CoEditor++: Instruction-based Visual Editing via Cognitive Reasoning
Title (reference translation): CoEditor++: Instruction-Based Visual Editing via Cognitive Reasoning
Authors: Minheng Ni, Yutao Fan, Zhengyuan Yang, Yeli Shen, Yuxiang Wei, Yaowen Zhang, Lijuan Wang, Lei Zhang, Wangmeng Zuo
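Abstract summary: CoEditor++ is a training-free framework that decomposes editing into "what to edit" and "how to edit". We show that CoEditor++ achieves state-of-the-art performance on both general editing and responsible editing tasks. Our findings suggest the potential of cognitive-centric image editing.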
Reference score (independently computed attention score): 98.98349220451216
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent advances in large multimodal models (LMMs) have enabled instruction-based image editing, allowing users to modify visual content via natural language descriptions. However, existing approaches often struggle with high-level semantic reasoning and visual consistency, particularly under ambiguous or complex instructions. To address these challenges, we propose CoEditor++, a cognitively structured, training-free framework that decomposes editing into "what to edit" and "how to edit" through two cognitive stages with a reflective self-selection mechanism, enabling robust, fine-grained, and interpretable editing. Built entirely from open-sourced components, CoEditor++ requires no additional training or fine-tuning, ensuring transparency and cross-domain applicability. We evaluate CoEditor++ on SmartEdit, a widely used benchmark for general editing, and AltBear, a privacy- and compliance-oriented benchmark. Experimental results show that CoEditor++ achieves state-of-the-art performance in both general editing and responsible editing tasks compared with open-sourced models that require training on specialized editing datasets, while maintaining significantly higher visual consistency. When compared with closed-source models such as Nano Banana Pro or GPT-4o, CoEditor++ preserves comparable instruction following while still substantially outperforming them in visual consistency. Extensive ablation studies confirm that the effectiveness of CoEditor++ stems from its structured cognitive design rather than any specific model component. Our findings suggest the potential of cognitive-centric instruction-based image editing.
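Abstract (reference translation): Recent progress in large multimodal models (LMMs) has made instruction-based image editing possible. Existing approaches, however, often struggle with high-level semantic reasoning and visual consistency, especially when instructions are ambiguous or complex. To address these issues, we propose CoEditor++, which decomposes editing into "what to edit" and "how to edit" through two cognitive stages, enabling robust, fine-grained, and interpretable editing. CoEditor++ is built entirely from open-source components and needs no additional training or fine-tuning, which ensures transparency and cross-domain applicability. We evaluate CoEditor++ on SmartEdit, a widely used benchmark for general editing, and on AltBear, a privacy- and compliance-oriented benchmark. The experiments show that CoEditor++ achieves state-of-the-art performance on both general and responsible editing tasks compared with open-source models that require training on specialized editing datasets, while maintaining significantly higher visual consistency. Compared with closed-source models such as Nano Banana Pro or GPT-4o, CoEditor++ retains comparable instruction following while substantially outperforming them in visual consistency. Extensive ablation studies confirm that the effectiveness of CoEditor++ comes from its structured cognitive design rather than from any specific model component. These results suggest the potential of cognitive-centric image editing.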

Related papers