Fugu-MT Paper Translation (Abstract): LatRef-Diff: Latent and Reference-Guided Diffusion for Facial Attribute Editing and Style Manipulation

Paper overview: LatRef-Diff: Latent and Reference-Guided Diffusion for Facial Attribute Editing and Style Manipulation

arxiv url: http://arxiv.org/abs/2604.21279v1
Date: Thu, 23 Apr 2026 04:47:40 GMT
Status: Translation complete
Last updated in system: 2026-04-24 14:40:06.308324
Title: LatRef-Diff: Latent and Reference-Guided Diffusion for Facial Attribute Editing and Style Manipulation
Authors: Wenmin Huang, Weiqi Luo, Xiaochun Cao, Jiwu Huang
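Abstract summary: Conditional GANs have shown progress but are limited by accuracy issues and training instability. We propose LatRef-Diff, a novel diffusion-based framework that addresses these limitations. We show that LatRef-Diff achieves state-of-the-art performance in both qualitative and quantitative evaluations.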
Reference score (independently computed attention score): 78.6161238980415
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Facial attribute editing and style manipulation are crucial for applications like virtual avatars and photo editing. However, achieving precise control over facial attributes without altering unrelated features is challenging due to the complexity of facial structures and the strong correlations between attributes. While conditional GANs have shown progress, they are limited by accuracy issues and training instability. Diffusion models, though promising, face challenges in style manipulation due to the limited expressiveness of semantic directions. In this paper, we propose LatRef-Diff, a novel diffusion-based framework that addresses these limitations. We replace the traditional semantic directions in diffusion models with style codes and propose two methods for generating them: latent and reference guidance. Based on these style codes, we design a style modulation module that integrates them into the target image, enabling both random and customized style manipulation. This module incorporates learnable vectors, cross-attention mechanisms, and a hierarchical design to improve accuracy and image quality. Additionally, to enhance training stability while eliminating the need for paired images (e.g., before and after editing), we propose a forward-backward consistency training strategy. This strategy first removes the target attribute approximately using image-specific semantic directions and then restores it via style modulation, guided by perceptual and classification losses. Extensive experiments on CelebA-HQ demonstrate that LatRef-Diff achieves state-of-the-art performance in both qualitative and quantitative evaluations. Ablation studies validate the effectiveness of our model's design choices.


Related papers