Fugu-MT Paper Translation (Abstract): CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

Paper overview: CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

arxiv url: http://arxiv.org/abs/2508.06937v2
Date: Sun, 26 Oct 2025 07:19:20 GMT
Status: Translation complete
Last updated in system: 2025-10-28 17:41:21.74192
Title: CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing
Authors: Weiyan Xie, Han Gao, Didan Deng, Kaican Li, April Hua Liu, Yongxiang Huang, Nevin L. Zhang
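Abstract summary: CannyEdit is a novel training-free framework for regional image editing. It applies structural guidance from a Canny ControlNet only to the unedited regions, preserving the original image's details. CannyEdit offers exceptional flexibility, operating effectively with rough masks or even single-point hints in object addition tasks.
Reference score (attention metric computed by this site): 10.535939265557895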
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Recent advances in text-to-image (T2I) models have enabled training-free regional image editing by leveraging the generative priors of foundation models. However, existing methods struggle to balance text adherence in edited regions, context fidelity in unedited areas, and seamless integration of edits. We introduce CannyEdit, a novel training-free framework that addresses this trilemma through two key innovations. First, Selective Canny Control applies structural guidance from a Canny ControlNet only to the unedited regions, preserving the original image's details while allowing for precise, text-driven changes in the specified editable area. Second, Dual-Prompt Guidance utilizes both a local prompt for the specific edit and a global prompt for overall scene coherence. Through this synergistic approach, these components enable controllable local editing for object addition, replacement, and removal, achieving a superior trade-off among text adherence, context fidelity, and editing seamlessness compared to current region-based methods. Beyond this, CannyEdit offers exceptional flexibility: it operates effectively with rough masks or even single-point hints in addition tasks. Furthermore, the framework can seamlessly integrate with vision-language models in a training-free manner for complex instruction-based editing that requires planning and reasoning. Our extensive evaluations demonstrate CannyEdit's strong performance against leading instruction-based editors in complex object addition scenarios.
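The idea behind Selective Canny Control, as the abstract describes it, is that structural guidance acts only outside the editable region. The following is a minimal sketch of that masking step, not the authors' implementation: it assumes the ControlNet output can be represented as a spatial feature array, and the function name `apply_selective_control` and the parameters `control_residual` and `edit_mask` are hypothetical names chosen for illustration.

```python
import numpy as np


def apply_selective_control(control_residual, edit_mask):
    """Zero out control features inside the editable region.

    control_residual: (H, W, C) array standing in for ControlNet features
                      (an assumption; the real tensor layout may differ).
    edit_mask:        (H, W) boolean array, True where edits are allowed.
    """
    # Guidance is kept only where the image should stay unedited.
    keep = np.logical_not(edit_mask).astype(control_residual.dtype)
    # Broadcast the (H, W) mask across the channel axis.
    return control_residual * keep[..., None]


# Toy example: a 4x4 feature map with a 2x2 editable patch in the middle.
residual = np.ones((4, 4, 2), dtype=np.float32)
edit_mask = np.zeros((4, 4), dtype=bool)
edit_mask[1:3, 1:3] = True
masked = apply_selective_control(residual, edit_mask)
```

In this toy setup the features inside the 2x2 patch are zeroed, so a text prompt would be free to drive generation there, while the surrounding features are passed through unchanged. A rough mask only changes which pixels are set to True in `edit_mask`; how the paper handles single-point hints is not covered by this sketch.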


Related papers