Fugu-MT Paper Translation (Summary): LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders

Paper summary: LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders

arxiv url: http://arxiv.org/abs/2509.23639v1
Date: Sun, 28 Sep 2025 04:46:39 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.343441
Title: LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders
Authors: Boyu Han, Qianqian Xu, Shilong Bao, Zhiyong Yang, Kangli Zi, Qingming Huang
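Abstract summary: This paper proposes a novel lightweight approach for achieving fair text-to-image diffusion models (T2I DMs) by addressing the adverse effects of the text encoder. T2I DMs consist of multiple components, and the text encoder is the most fine-tunable front-end module. The method achieves SOTA debiasing at just 1/4 of the training burden, with almost no increase in sampling burden.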
Reference score (independently computed attention score): 84.39846443122853
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper explores a novel lightweight approach LightFair to achieve fair text-to-image diffusion models (T2I DMs) by addressing the adverse effects of the text encoder. Most existing methods either couple different parts of the diffusion model for full-parameter training or rely on auxiliary networks for correction. They incur heavy training or sampling burden and unsatisfactory performance. Since T2I DMs consist of multiple components, with the text encoder being the most fine-tunable and front-end module, this paper focuses on mitigating bias by fine-tuning text embeddings. To validate feasibility, we observe that the text encoder's neutral embedding output shows substantial skewness across image embeddings of various attributes in the CLIP space. More importantly, the noise prediction network further amplifies this imbalance. To finetune the text embedding, we propose a collaborative distance-constrained debiasing strategy that balances embedding distances to improve fairness without auxiliary references. However, mitigating bias can compromise the original generation quality. To address this, we introduce a two-stage text-guided sampling strategy to limit when the debiased text encoder intervenes. Extensive experiments demonstrate that LightFair is effective and efficient. Notably, on Stable Diffusion v1.5, our method achieves SOTA debiasing at just $1/4$ of the training burden, with virtually no increase in sampling burden. The code is available at https://github.com/boyuh/LightFair.
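The abstract names two mechanisms (balancing embedding distances in CLIP space, and limiting when the debiased encoder intervenes during sampling) but does not give the loss or the switching rule. The Python below is only a toy sketch under stated assumptions, not LightFair's method; the authors' code is at https://github.com/boyuh/LightFair. It assumes that "skewness" can be scored as the variance of cosine distances from a neutral text embedding to per-attribute image-embedding centroids, and it uses a squared-L2 drift penalty as a stand-in for a distance constraint that protects generation quality. The function names, the weight `lam`, and the `switch_step` rule (debiased embedding in early denoising steps, original afterwards) are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def imbalance(text_emb: Sequence[float], anchors: Sequence[Sequence[float]]) -> float:
    """Hypothetical skewness score: variance of the distances to each attribute anchor."""
    d = [cosine_distance(text_emb, a) for a in anchors]
    mean = sum(d) / len(d)
    return sum((x - mean) ** 2 for x in d) / len(d)


def loss(text_emb: Sequence[float], original: Sequence[float],
         anchors: Sequence[Sequence[float]], lam: float = 0.1) -> float:
    """Balance term plus a squared-L2 penalty keeping the embedding near the original (assumed form)."""
    drift = sum((a - b) ** 2 for a, b in zip(text_emb, original))
    return imbalance(text_emb, anchors) + lam * drift


def debias(original: Sequence[float], anchors: Sequence[Sequence[float]],
           lr: float = 0.5, steps: int = 200, lam: float = 0.1, eps: float = 1e-6) -> Vector:
    """Gradient descent on `loss` using central finite differences (fine for toy-sized vectors)."""
    emb = list(original)
    for _ in range(steps):
        grad = []
        for i in range(len(emb)):
            plus = emb.copy()
            plus[i] += eps
            minus = emb.copy()
            minus[i] -= eps
            diff = loss(plus, original, anchors, lam) - loss(minus, original, anchors, lam)
            grad.append(diff / (2 * eps))
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb


def embedding_for_step(step: int, switch_step: int, debiased, original):
    """Hypothetical two-stage rule: use the debiased embedding only before `switch_step`."""
    return debiased if step < switch_step else original


if __name__ == "__main__":
    anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy attribute centroids
    neutral = [0.9, 0.3, 0.4]  # toy neutral embedding, closer to the first anchor
    fixed = debias(neutral, anchors)
    print("imbalance before:", imbalance(neutral, anchors))
    print("imbalance after: ", imbalance(fixed, anchors))
```

The drift weight `lam` illustrates the trade-off the abstract describes: a larger value keeps the embedding closer to the original (less risk to generation quality) at the cost of a less balanced result.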

Related papers