Fugu-MT Paper Translation (Abstract): All-in-One Slider for Attribute Manipulation in Diffusion Models

Paper summary: All-in-One Slider for Attribute Manipulation in Diffusion Models

arxiv url: http://arxiv.org/abs/2508.19195v1
Date: Tue, 26 Aug 2025 16:56:30 GMT
Status: Translation complete
Last updated in system: 2025-08-27 17:42:38.923821
Title: All-in-One Slider for Attribute Manipulation in Diffusion Models
Title (reference translation): All-in-One Slider for Attribute Manipulation in Diffusion Models
Authors: Weixin Ye, Hongguang Zhu, Wei Wang, Yahui Liu, Mengyu Wang,
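Abstract summary: We introduce the All-in-One Slider, a lightweight module that decomposes the text embedding space into sparse, semantically meaningful attribute directions. By recombining the learned directions, the All-in-One Slider supports zero-shot manipulation of unseen attributes. The method can be extended to integrate with an inversion framework to perform attribute manipulation on real images.
Reference score (independently computed attention score): 13.362768653792097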
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image (T2I) diffusion models have made significant strides in generating high-quality images. However, progressively manipulating certain attributes of generated images to meet the desired user expectations remains challenging, particularly for content with rich details, such as human faces. Some studies have attempted to address this by training slider modules. However, they follow a One-for-One manner, where an independent slider is trained for each attribute, requiring additional training whenever a new attribute is introduced. This not only results in parameter redundancy accumulated by sliders but also restricts the flexibility of practical applications and the scalability of attribute manipulation. To address this issue, we introduce the All-in-One Slider, a lightweight module that decomposes the text embedding space into sparse, semantically meaningful attribute directions. Once trained, it functions as a general-purpose slider, enabling interpretable and fine-grained continuous control over various attributes. Moreover, by recombining the learned directions, the All-in-One Slider supports zero-shot manipulation of unseen attributes (e.g., races and celebrities) and the composition of multiple attributes. Extensive experiments demonstrate that our method enables accurate and scalable attribute manipulation, achieving notable improvements compared to previous methods. Furthermore, our method can be extended to integrate with the inversion framework to perform attribute manipulation on real images, broadening its applicability to various real-world scenarios. The code and trained model will be released at: https://github.com/ywxsuperstar/KSAE-FaceSteer.
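Abstract (reference translation): Text-to-image (T2I) diffusion models have made great progress in generating high-quality images. However, progressively manipulating specific attributes of generated images to meet user expectations remains difficult, particularly for content with rich details such as human faces. Several studies have tried to address this by training slider modules. However, an independent slider is trained for each attribute, so additional training is required whenever a new attribute is introduced. This not only accumulates redundant parameters across sliders but also limits the flexibility of practical applications and the scalability of attribute manipulation. To address this problem, we introduce the All-in-One Slider, a lightweight module that decomposes the text embedding space into sparse, semantically meaningful attribute directions. Once trained, it functions as a general-purpose slider that enables interpretable, fine-grained, continuous control over various attributes. Furthermore, by recombining the learned directions, the All-in-One Slider supports zero-shot manipulation of unseen attributes (e.g., races and celebrities) and the composition of multiple attributes. Extensive experiments show that the proposed method enables accurate and scalable attribute manipulation, with notable improvements over previous methods. In addition, the method can be extended to integrate with an inversion framework to perform attribute manipulation on real images, broadening its applicability to various real-world scenarios. The code and trained model will be released at https://github.com/ywxsuperstar/KSAE-FaceSteer.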
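
Illustrative sketch (not the authors' implementation): the abstract describes continuous control by moving along attribute directions in the text embedding space, and composing several attributes by recombining directions. The abstract does not say how the directions are learned or applied, so the code below only illustrates the general idea with a plain linear shift. The function names `normalize` and `slide`, the toy 4-dimensional vectors, and the attribute names "age" and "smile" are all hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def slide(embedding, directions, strengths):
    """Shift an embedding along one or more attribute directions.

    embedding:  list of floats (a stand-in for a text embedding)
    directions: dict mapping attribute name -> direction vector
                (assumed to have been learned elsewhere)
    strengths:  dict mapping attribute name -> slider value;
                a value of 0 leaves that attribute untouched
    """
    edited = list(embedding)
    for name, alpha in strengths.items():
        unit = normalize(directions[name])
        edited = [e + alpha * u for e, u in zip(edited, unit)]
    return edited


# Toy data; real text embeddings have far more dimensions.
embedding = [0.5, -1.0, 0.25, 2.0]
directions = {
    "age": [1.0, 0.0, 0.0, 0.0],    # hypothetical direction
    "smile": [0.0, 3.0, 4.0, 0.0],  # hypothetical direction
}

# Continuous control: sweep a single slider value.
for alpha in (-1.0, 0.0, 1.0):
    print(alpha, slide(embedding, directions, {"age": alpha}))

# Composition: apply two attribute directions together.
combined = slide(embedding, directions, {"age": 2.0, "smile": 5.0})
print(combined)
```

In this sketch each direction is normalized before scaling, so a slider value has the same magnitude of effect regardless of how long the stored direction vector is. Whether the paper's module does anything comparable is not stated in the abstract.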


Related papers