Fugu-MT paper translation (abstract): Diffusion-Based Makeup Transfer with Facial Region-Aware Makeup Features

Paper abstract: Diffusion-Based Makeup Transfer with Facial Region-Aware Makeup Features

arxiv url: http://arxiv.org/abs/2603.20012v2
Date: Wed, 25 Mar 2026 11:47:44 GMT
Status: Translation complete
Last updated in system: 2026-03-26 14:25:25.88328
Title: Diffusion-Based Makeup Transfer with Facial Region-Aware Makeup Features
Title (reference translation): Diffusion-based makeup transfer that accounts for facial regions
Authors: Zheng Gao, Debin Meng, Yunqi Miao, Zhensong Zhang, Songcen Xu, Ioannis Patras, Jifei Song
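Abstract summary: Facial Region-Aware Makeup features (FRAM) consists of two stages: (1) makeup CLIP fine-tuning and (2) identity and facial region-aware makeup injection. Specifically, learnable tokens query the makeup CLIP encoder to extract facial region-aware makeup features for makeup injection. Experimental results verify the superiority of the method's regional controllability and makeup transfer performance.
Reference score (independently computed attention score): 42.68385892478631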
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Current diffusion-based makeup transfer methods commonly use the makeup information encoded by off-the-shelf foundation models (e.g., CLIP) as a condition to preserve the makeup style of the reference image during generation. Although effective, these works have two main limitations: (1) foundation models pre-trained for generic tasks struggle to capture makeup styles; (2) the makeup features of the reference image are injected into the diffusion denoising model as a whole for global makeup transfer, overlooking facial region-aware makeup features (e.g., eyes, mouth) and limiting the regional controllability needed for region-specific makeup transfer. To address these limitations, we propose Facial Region-Aware Makeup features (FRAM), which has two stages: (1) makeup CLIP fine-tuning; (2) identity and facial region-aware makeup injection. For makeup CLIP fine-tuning, unlike prior works that use off-the-shelf CLIP, we synthesize annotated makeup style data using GPT-o3 and a text-driven image editing model, and then use the data to train a makeup CLIP encoder through self-supervised and image-text contrastive learning. For identity and facial region-aware makeup injection, we construct before-and-after makeup image pairs from the edited images in stage 1 and then use them to learn to inject the identity of the source image and the makeup of the reference image into the diffusion denoising model for makeup transfer. Specifically, we use learnable tokens to query the makeup CLIP encoder to extract facial region-aware makeup features for makeup injection, which is learned via an attention loss to enable regional control. For identity injection, we use a ControlNet Union to encode the source image and its 3D mesh simultaneously. The experimental results verify the superiority of our regional controllability and our makeup transfer performance. Code is available at https://github.com/zaczgao/Facial_Region-Aware_Makeup.
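The image-text contrastive learning mentioned in the abstract can be illustrated with a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss. This is not the authors' training code: the function names, the temperature of 0.07, and the toy 2-D embeddings are all assumptions chosen for illustration, and the self-supervised part of the objective is not shown.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss; pair i of image_embs and text_embs is a match."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: row i should put its mass on column i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: column j should put its mass on row j.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)

# Toy data (assumed): two images and their matching captions.
images = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]
swapped = [matched[1], matched[0]]
loss_matched = clip_contrastive_loss(images, matched)
loss_swapped = clip_contrastive_loss(images, swapped)
```

With correctly paired embeddings the loss is close to zero, while swapping the captions makes it much larger, which is the signal that pulls matching makeup images and descriptions together.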
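Abstract (reference translation): Current diffusion-based makeup transfer methods generally take makeup information encoded by an off-the-shelf foundation model (e.g., CLIP) as the condition for preserving the reference image's makeup style during generation. These methods are effective but have two limitations: (1) foundation models pre-trained for general-purpose tasks have difficulty capturing makeup styles, and (2) the makeup features of the reference image are injected into the diffusion denoising model as a whole for global makeup transfer, which overlooks facial region-aware makeup features (eyes, mouth, etc.) and limits regional controllability for region-specific makeup transfer. This work therefore proposes Facial Region-Aware Makeup features (FRAM), consisting of two stages: (1) makeup CLIP fine-tuning and (2) identity and facial region-aware makeup injection. For makeup CLIP fine-tuning, unlike prior work that relies on off-the-shelf CLIP, annotated makeup style data is synthesized with GPT-o3 and a text-driven image editing model, and that data is used to train a makeup CLIP encoder through self-supervised and image-text contrastive learning. For identity and facial region-aware makeup injection, before-and-after makeup image pairs are built from the edited images of stage 1 and used to learn to inject the source image's identity and the reference image's makeup into the diffusion denoising model for makeup transfer. Specifically, learnable tokens query the makeup CLIP encoder to extract facial region-aware makeup features for makeup injection. For identity injection, a ControlNet Union encodes the source image and its 3D mesh simultaneously. Experimental results verify the superiority of the method's regional controllability and makeup transfer performance. The code is available at https://github.com/zaczgao/Facial_Region-Aware_Makeup.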
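The idea of learnable tokens querying an encoder for region-aware features, supervised by an attention loss, can be sketched as single-head cross-attention. Everything below is an illustrative assumption rather than the paper's implementation: the function names, the omission of learned query/key/value projections, the toy patch features, and in particular the form of `attention_loss` (negative log of the attention mass that falls inside a region mask), since the abstract does not specify the loss.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def query_region_features(region_tokens, patch_feats):
    """Each region token attends over all patch features (keys = values = patches)."""
    dim = len(patch_feats[0])
    scale = 1.0 / math.sqrt(dim)
    # attn[r][p]: how much region token r attends to patch p.
    attn = [softmax([scale * sum(q * k for q, k in zip(tok, p)) for p in patch_feats])
            for tok in region_tokens]
    # Region feature = attention-weighted average of patch features.
    feats = [[sum(w * p[d] for w, p in zip(row, patch_feats)) for d in range(dim)]
             for row in attn]
    return feats, attn

def attention_loss(attn, region_masks, eps=1e-8):
    """Hypothetical loss: penalize attention mass outside each region's binary mask."""
    losses = []
    for row, mask in zip(attn, region_masks):
        inside = sum(w for w, m in zip(row, mask) if m)
        losses.append(-math.log(inside + eps))
    return sum(losses) / len(losses)

# Toy setup (assumed): 4 encoder patches of dimension 2, two region tokens.
patch_feats = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
region_tokens = [[1.0, 0.0], [0.0, 1.0]]        # e.g. "eyes" and "mouth" tokens
region_masks = [[1, 1, 0, 0], [0, 0, 1, 1]]     # patches belonging to each region
feats, attn = query_region_features(region_tokens, patch_feats)
loss_aligned = attention_loss(attn, region_masks)
loss_misaligned = attention_loss(attn, region_masks[::-1])
```

In this toy setup each token already attends mostly to its own region, so the loss against the correct masks is smaller than against the swapped masks. Minimizing such a loss during training would push each token toward its assigned facial region, which is one plausible way to obtain the regional control the abstract describes.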

Related papers