Fugu-MT Paper Translation (Summary): ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models

Paper summary: ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models

arxiv url: http://arxiv.org/abs/2409.15650v1
Date: Tue, 24 Sep 2024 01:25:19 GMT
Status: Translation complete
Last updated in system: 2024-09-26 11:32:56.000646
Title: ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models
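Title (reference translation): ImPoster: Text and Frequency Guidance for Subject-Driven Action Personalization Using Diffusion Models
Authors: Divya Kothandaraman, Kuldeep Kulkarni, Sumit Shekhar, Balaji Vasan Srinivasan, Dinesh Manocha
Abstract summary: We propose ImPoster, a novel algorithm that generates a target image of a 'source' subject performing a 'driving' action. Our approach is completely unsupervised and requires no access to additional annotations such as keypoints or pose.
Reference score (independently computed attention measure): 55.43801602995778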
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present ImPoster, a novel algorithm for generating a target image of a 'source' subject performing a 'driving' action. The inputs to our algorithm are a single pair of a source image with the subject that we wish to edit and a driving image with a subject of an arbitrary class performing the driving action, along with the text descriptions of the two images. Our approach is completely unsupervised and does not require any access to additional annotations like keypoints or pose. Our approach builds on a pretrained text-to-image latent diffusion model and learns the characteristics of the source and the driving image by finetuning the diffusion model for a small number of iterations. At inference time, ImPoster performs step-wise text prompting, i.e., it denoises by first moving in the direction of the image manifold corresponding to the driving image, followed by the direction of the image manifold corresponding to the text description of the desired target image. We propose a novel diffusion guidance formulation, image frequency guidance, to steer the generation towards the manifold of the source subject and the driving action at every step of the inference denoising. Our frequency guidance formulations are derived from the frequency domain properties of images. We extensively evaluate ImPoster on a diverse set of source-driving image pairs to demonstrate improvements over baselines. To the best of our knowledge, ImPoster is the first approach towards achieving both subject-driven and action-driven image personalization. Code and data are available at https://github.com/divyakraman/ImPosterDiffusion2024.
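Abstract (reference translation): We propose ImPoster, a novel algorithm that generates a target image of a 'source' subject performing a 'driving' action. The inputs to the algorithm are a single pair consisting of a source image containing the subject to be edited and a driving image in which a subject of an arbitrary class performs the driving action, together with text descriptions of the two images. Our approach is completely unsupervised and needs no access to additional annotations such as keypoints or pose. The method builds on a pretrained text-to-image latent diffusion model and learns the characteristics of the source and driving images by fine-tuning the diffusion model for a small number of iterations. At inference time, ImPoster performs step-wise text prompting: it first moves in the direction of the image manifold corresponding to the driving image, and then in the direction of the image manifold corresponding to the text description of the desired target image. We propose a new diffusion guidance formulation, image frequency guidance, which steers generation toward the manifold of the source subject and the driving action at every step of inference denoising. The frequency guidance formulations are derived from the frequency-domain properties of images. We evaluate ImPoster extensively on a diverse set of source-driving image pairs and demonstrate improvements over baselines. To the best of our knowledge, ImPoster is the first approach to achieve both subject-driven and action-driven image personalization. Code and data are available at https://github.com/divyakraman/ImPosterDiffusion2024.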
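
The step-wise text prompting described in the abstract can be pictured as a per-step prompt schedule: early denoising steps are conditioned on the driving-image description, later steps on the target description. The following is only a minimal sketch of that idea, not the authors' implementation. The function name `stepwise_prompts`, the `switch_step` parameter, and the example prompts are all hypothetical, and the abstract does not say where the switch happens or how the prompts are worded.

```python
from typing import List


def stepwise_prompts(driving_text: str, target_text: str,
                     num_steps: int, switch_step: int) -> List[str]:
    """Return the text prompt to condition on at each denoising step.

    Steps before `switch_step` use the driving-image description; the
    remaining steps use the target description. `switch_step` is a
    hypothetical knob for this sketch, not a value taken from the paper.
    """
    if not 0 <= switch_step <= num_steps:
        raise ValueError("switch_step must lie in [0, num_steps]")
    return [driving_text if step < switch_step else target_text
            for step in range(num_steps)]


# Illustrative prompts only; they are assumptions, not data from the paper.
schedule = stepwise_prompts(
    driving_text="a person doing a handstand",
    target_text="a teddy bear doing a handstand",
    num_steps=10,
    switch_step=4,
)
```

In an actual diffusion pipeline, each entry of `schedule` would be encoded and passed as the text condition for the corresponding denoising step. The image frequency guidance term from the abstract is not modeled here.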

Related papers