Fugu-MT Paper Translation (Abstract): Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention

Paper abstract: Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention

arxiv url: http://arxiv.org/abs/2312.03556v1
Date: Wed, 6 Dec 2023 15:39:03 GMT
Status: Translation complete
Last updated in system: 2023-12-07 14:38:01.519059
Title: Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention
Title (reference translation): Face Personalization with Diffusion Models by Parallel Visual Attention
Authors: Jianjin Xu, Saman Motamed, Praneetha Vaddamanu, Chen Henry Wu, Christian Haene, Jean-Charles Bazin, Fernando de la Torre
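Abstract summary: This paper proposes using Parallel Visual Attention (PVA) in conjunction with diffusion models to improve inpainting results. The added attention modules and the identity encoder are trained on CelebAHQ-IDI. Experiments show that PVA attains unparalleled identity resemblance in both face inpainting and face inpainting with language guidance.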
Reference score (independently computed attention level): 55.33017432880408
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Face inpainting is important in various applications, such as photo restoration, image editing, and virtual reality. Despite the significant advances in face generative models, ensuring that a person's unique facial identity is maintained during the inpainting process is still an elusive goal. Current state-of-the-art techniques, exemplified by MyStyle, necessitate resource-intensive fine-tuning and a substantial number of images for each new identity. Furthermore, existing methods often fall short in accommodating user-specified semantic attributes, such as beard or expression. To improve inpainting results, and reduce the computational complexity during inference, this paper proposes the use of Parallel Visual Attention (PVA) in conjunction with diffusion models. Specifically, we insert parallel attention matrices to each cross-attention module in the denoising network, which attends to features extracted from reference images by an identity encoder. We train the added attention modules and identity encoder on CelebAHQ-IDI, a dataset proposed for identity-preserving face inpainting. Experiments demonstrate that PVA attains unparalleled identity resemblance in both face inpainting and face inpainting with language guidance tasks, in comparison to various benchmarks, including MyStyle, Paint by Example, and Custom Diffusion. Our findings reveal that PVA ensures good identity preservation while offering effective language-controllability. Additionally, in contrast to Custom Diffusion, PVA requires just 40 fine-tuning steps for each new identity, which translates to a significant speed increase of over 20 times.
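Abstract (reference translation): Face inpainting matters in many applications, including photo restoration, image editing, and virtual reality. Although face generative models have advanced considerably, preserving a person's unique facial identity throughout the inpainting process remains an elusive goal. Current state-of-the-art techniques, exemplified by MyStyle, require resource-intensive fine-tuning and a substantial number of images for each new identity. Existing methods also often fall short in accommodating user-specified semantic attributes such as a beard or an expression. To improve inpainting results and reduce computational complexity during inference, this work combines Parallel Visual Attention (PVA) with diffusion models. Specifically, parallel attention matrices are inserted into each cross-attention module of the denoising network, where they attend to features that an identity encoder extracts from reference images. The added attention modules and the identity encoder are trained on CelebAHQ-IDI, a dataset proposed for identity-preserving face inpainting. Experiments show that PVA attains unparalleled identity resemblance in both face inpainting and face inpainting with language guidance, compared with benchmarks including MyStyle, Paint by Example, and Custom Diffusion. The results indicate that PVA provides good identity preservation together with effective language controllability. In addition, in contrast to Custom Diffusion, PVA needs only 40 fine-tuning steps for each new identity, which amounts to a speedup of more than 20 times.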
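
The parallel attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the identity branch reuses the query projection of the text cross-attention and that the two branch outputs are simply summed, neither of which the abstract specifies. The names `parallel_cross_attention`, `wk_id`, `wv_id`, and `scale`, along with the random stand-in features, are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def rand_matrix(rows, cols, rng):
    """Random matrix used as a stand-in for learned weights or features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def parallel_cross_attention(x, text_ctx, id_ctx, p, scale=1.0):
    """Text cross-attention plus a parallel branch over identity features.

    ASSUMPTION: both branches share the query projection and their outputs
    are summed; `scale` weights the identity branch (hypothetical knob).
    """
    q = matmul(x, p["wq"])
    # Existing branch: image tokens attend to text-prompt tokens.
    text_out = attention(q, matmul(text_ctx, p["wk"]), matmul(text_ctx, p["wv"]))
    # Added branch: its own key/value matrices attend to identity features.
    id_out = attention(q, matmul(id_ctx, p["wk_id"]), matmul(id_ctx, p["wv_id"]))
    return [[t + scale * i for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(text_out, id_out)]


rng = random.Random(0)
d = 4
params = {name: rand_matrix(d, d, rng)
          for name in ("wq", "wk", "wv", "wk_id", "wv_id")}
x = rand_matrix(6, d, rng)         # 6 image (latent) tokens
text_ctx = rand_matrix(3, d, rng)  # 3 text-prompt tokens
id_ctx = rand_matrix(2, d, rng)    # 2 identity tokens (stand-in for encoder output)
out = parallel_cross_attention(x, text_ctx, id_ctx, params)
print(len(out), len(out[0]))       # 6 4
```

In this sketch, setting `scale=0.0` recovers the plain text cross-attention. Only `wk_id` and `wv_id` would be new trainable weights under these assumptions, in line with the abstract's statement that the added attention modules and the identity encoder are what gets trained.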


List of related papers