Fugu-MT paper translation (abstract): Let Triggers Control: Frequency-Aware Dropout for Effective Token Control

Paper abstract: Let Triggers Control: Frequency-Aware Dropout for Effective Token Control

arxiv url: http://arxiv.org/abs/2603.27199v1
Date: Sat, 28 Mar 2026 08:55:54 GMT
Status: translation complete
Last updated in system: 2026-03-31 23:18:44.844362
Title: Let Triggers Control: Frequency-Aware Dropout for Effective Token Control
Title (reference translation): Trigger Control: Frequency-Aware Dropout for Effective Token Control
Authors: Junyoung Koh, Hoyeon Moon, Dongha Kim, Seungmin Lee, Sanghyun Park, Min Song
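Abstract summary: We propose Frequency-Aware Dropout (FAD) to improve controllability without adding new parameters. FAD consists of two key components: co-occurrence analysis and curriculum-inspired scheduling. The method provides a simple yet effective dropout strategy that improves controllability and personalization in text-to-image generation.
Reference score (independently computed attention level): 8.72880783870241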
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image models such as Stable Diffusion have achieved unprecedented levels of high-fidelity visual synthesis. As these models advance, personalization of generative models, commonly facilitated through Low-Rank Adaptation (LoRA) with a dedicated trigger token, has become a significant area of research. Previous works have naively assumed that fine-tuning with a single trigger token suffices to represent new concepts. However, this often results in poor controllability, where the trigger token alone fails to reliably evoke the intended concept. We attribute this issue to the frequent co-occurrence of the trigger token with the surrounding context during fine-tuning, which entangles their representations and compromises the token's semantic distinctiveness. To disentangle them, we propose Frequency-Aware Dropout (FAD), a novel regularization technique that improves prompt controllability without adding new parameters. FAD consists of two key components: co-occurrence analysis and curriculum-inspired scheduling. Qualitative and quantitative analyses across token-based diffusion models (SD 1.5 and SDXL) and natural-language-driven backbones (FLUX and Qwen-Image) demonstrate consistent gains in prompt fidelity, stylistic precision, and user-perceived quality. Our method provides a simple yet effective dropout strategy that enhances controllability and personalization in text-to-image generation. Notably, it achieves these improvements without introducing additional parameters or architectural modifications, making it readily applicable to existing models with minimal computational overhead.
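Abstract (reference translation): Text-to-image models such as Stable Diffusion have achieved unprecedented high-fidelity visual synthesis. As these models advance, personalization of generative models, commonly facilitated through Low-Rank Adaptation (LoRA), has become an important research area. Previous work fine-tuned with a single trigger token to represent new concepts. However, this often results in poor controllability, with the trigger token alone failing to reliably evoke the intended concept. The problem arises because the trigger token frequently co-occurs with the surrounding context during fine-tuning, which entangles their representations and undermines the token's semantic distinctiveness. To avoid this, we propose Frequency-Aware Dropout (FAD), a new regularization technique that improves prompt controllability without adding new parameters. FAD consists of two key components: co-occurrence analysis and curriculum-inspired scheduling. Qualitative and quantitative analyses of token-based diffusion models (SD 1.5 and SDXL) and natural-language-driven backbones (FLUX and Qwen-Image) show consistent gains in prompt fidelity, stylistic precision, and user-perceived quality. The method provides a simple yet effective dropout strategy that improves controllability and personalization in text-to-image generation. In particular, it achieves these improvements without introducing additional parameters or architectural changes, so it can be readily applied to existing models with minimal computational overhead.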
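
Illustrative sketch (not from the paper): the abstract names the two components of FAD, co-occurrence analysis and curriculum-inspired scheduling, but gives no formulas. The Python below is one plausible reading, offered only as a rough illustration and not as the authors' implementation. Everything in it is an assumption: whitespace tokenization, the placeholder trigger token `sks`, a drop probability equal to the co-occurrence rate times a scale, a linear ramp as the schedule, and all function names.

```python
import random
from collections import Counter


def cooccurrence_rates(captions, trigger):
    """For each non-trigger token, the fraction of trigger-containing
    captions in which it appears (assumed form of co-occurrence analysis)."""
    counts = Counter()
    n_trigger = 0
    for caption in captions:
        tokens = set(caption.split())  # assumption: whitespace tokenization
        if trigger in tokens:
            n_trigger += 1
            counts.update(tokens - {trigger})
    if n_trigger == 0:
        return {}
    return {tok: c / n_trigger for tok, c in counts.items()}


def dropout_scale(step, total_steps, max_scale=1.0):
    """Hypothetical curriculum: ramp the dropout strength linearly
    from 0 to max_scale over training."""
    if total_steps <= 0:
        return max_scale
    return max_scale * min(1.0, max(0.0, step / total_steps))


def frequency_aware_dropout(caption, trigger, rates, scale, rng):
    """Drop each context token with probability scale * co-occurrence rate.
    The trigger token itself is never dropped."""
    kept = []
    for tok in caption.split():
        if tok == trigger:
            kept.append(tok)
            continue
        p_drop = scale * rates.get(tok, 0.0)
        if rng.random() >= p_drop:
            kept.append(tok)
    return " ".join(kept)


captions = [
    "sks dog on grass",
    "sks dog in park",
    "sks dog on beach",
    "a cat on sofa",
]
rates = cooccurrence_rates(captions, "sks")
rng = random.Random(0)
for step in (0, 50, 100):
    scale = dropout_scale(step, total_steps=100)
    print(step, frequency_aware_dropout(captions[0], "sks", rates, scale, rng))
```

In this toy data, `dog` appears in every caption that contains the trigger, so it is the token dropped most aggressively as the scale rises, which leaves the trigger to carry the concept on its own. Whether the paper ramps the dropout up, down, or in some other shape is not stated in the abstract.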

Related papers