Fugu-MT Paper Translation (Abstract): DiffUHaul: A Training-Free Method for Object Dragging in Images

Paper abstract: DiffUHaul: A Training-Free Method for Object Dragging in Images

arxiv url: http://arxiv.org/abs/2406.01594v1
Date: Mon, 3 Jun 2024 17:59:53 GMT
Status: Translation complete
System update date: 2024-06-05 21:41:25.355175
Title: DiffUHaul: A Training-Free Method for Object Dragging in Images
Title (reference translation): DiffUHaul: A Training-Free Method for Dragging Objects in Images
Authors: Omri Avrahami, Rinon Gal, Gal Chechik, Ohad Fried, Dani Lischinski, Arash Vahdat, Weili Nie
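Abstract summary: We propose a training-free method for the object dragging task, dubbed DiffUHaul. First, we apply attention masking in each denoising step to make the generation more disentangled across different objects. In the early denoising steps, we interpolate the attention features between the source and target images to smoothly fuse the new layout with the original appearance.
Reference score (independently computed attention score): 78.93531472479202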
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Text-to-image diffusion models have proven effective for solving many image editing tasks. However, the seemingly straightforward task of seamlessly relocating objects within a scene remains surprisingly challenging. Existing methods addressing this problem often struggle to function reliably in real-world scenarios due to lacking spatial reasoning. In this work, we propose a training-free method, dubbed DiffUHaul, that harnesses the spatial understanding of a localized text-to-image model, for the object dragging task. Blindly manipulating layout inputs of the localized model tends to cause low editing performance due to the intrinsic entanglement of object representation in the model. To this end, we first apply attention masking in each denoising step to make the generation more disentangled across different objects and adopt the self-attention sharing mechanism to preserve the high-level object appearance. Furthermore, we propose a new diffusion anchoring technique: in the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance; in the later denoising steps, we pass the localized features from the source images to the interpolated images to retain fine-grained object details. To adapt DiffUHaul to real-image editing, we apply a DDPM self-attention bucketing that can better reconstruct real images with the localized model. Finally, we introduce an automated evaluation pipeline for this task and showcase the efficacy of our method. Our results are reinforced through a user preference study.
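Abstract (reference translation): Text-to-image diffusion models have proven effective for solving many image editing tasks. However, the seemingly simple task of seamlessly relocating objects within a scene remains surprisingly difficult. Existing methods that address this problem often struggle to work reliably in real-world scenarios because they lack spatial reasoning. In this work, we propose a training-free method for the object dragging task, dubbed DiffUHaul, that leverages the spatial understanding of a localized text-to-image model. Blindly manipulating the layout inputs of the localized model tends to degrade editing performance because of the intrinsic entanglement of object representations in the model. To this end, we first apply attention masking in each denoising step to make the generation more disentangled across different objects, and we adopt a self-attention sharing mechanism to preserve the high-level object appearance. Furthermore, we propose a new diffusion anchoring technique: in the early denoising steps, we interpolate the attention features between the source and target images to smoothly fuse the new layout with the original appearance; in the later steps, we pass the localized features from the source images to the interpolated images to retain fine-grained object details. To adapt DiffUHaul to real-image editing, we apply DDPM self-attention bucketing. Finally, we introduce an automated evaluation pipeline for this task and demonstrate the efficacy of our method. Our results are reinforced through a user preference study.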
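The two attention-level ideas named in the abstract (masking attention so that objects stay disentangled, and blending source and target attention features in the early denoising steps) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `masked_attention` and `blend_features`, the boolean mask layout, and the linear blending schedule are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with a boolean mask (hypothetical helper).

    mask[i, j] == False blocks query i from attending to key j.
    Every row of `mask` must contain at least one True entry.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Subtract the row-wise max for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def blend_features(src_feat, tgt_feat, step, num_early_steps):
    """Blend source and target features during the early steps only.

    The linear schedule is an assumption: step 0 returns the source
    features, and from `num_early_steps` onward the target features
    are returned unchanged.
    """
    if step >= num_early_steps:
        return tgt_feat
    alpha = step / num_early_steps
    return (1.0 - alpha) * src_feat + alpha * tgt_feat
```

With an identity mask each query attends only to its own key, so `masked_attention` returns `v` unchanged; this gives a quick sanity check of the masking logic.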

Related papers list