Fugu-MT Paper Translation (Summary): Compressing Image Style Training into a Single Model Forward

Paper summary: Compressing Image Style Training into a Single Model Forward

arxiv url: http://arxiv.org/abs/2606.13809v1
Date: Thu, 11 Jun 2026 18:21:26 GMT
Status: Translation complete
Last updated in system: 2026-06-15 16:00:42.572149
Title: Compressing Image Style Training into a Single Model Forward
Authors: Zhongjie Duan, Yingda Chen
Abstract summary: i2L (image-to-LoRA) is a framework that amortizes style LoRA training into a single forward pass. i2L improves style fidelity, prompt alignment, and perceptual quality over existing baselines.
Reference score (site-computed attention score): 3.579087003804642
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion-based style transfer must balance inference efficiency with stylization fidelity. Adapter-based methods are efficient, but they inject style as an external condition and can either weaken reference-specific appearance or copy reference semantics into the generated image. Optimization-based personalization methods such as LoRA internalize style more effectively, but require a separate training process for every new style. We introduce i2L (image-to-LoRA), a framework that amortizes style LoRA training into a single forward pass. Given one or more reference images, i2L predicts LoRA weights for a text-to-image model, enabling immediate style instantiation without per-style optimization. The architecture combines an image encoder, learnable LoRA queries, and compressed decoding heads that generate adapted matrices. Training on semantically diverse style pairs encourages the predictor to preserve appearance cues while suppressing reference-content copying. Experiments on Z-Image, FLUX.2, and Hidream-O1 show that i2L improves style fidelity, prompt alignment, and perceptual quality over existing baselines. Because i2L produces explicit LoRA weights, it also supports asymmetric classifier-free guidance, multi-reference style fusion, and composition with controllable-generation modules.
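The abstract names the predictor's parts (image encoder, learnable LoRA queries, compressed decoding heads) but gives no shapes or equations. The following is a minimal, hypothetical Python sketch of how such a predictor could map image tokens to the low-rank factors of one linear layer, then apply the standard LoRA update W' = W + (alpha/r)·B·A. Everything specific here is an assumption for illustration, not the paper's method: the class ToyLoRAPredictor, the single-head attention of queries over image tokens, the linear heads, the weighted-average rule in fuse_deltas for multi-reference fusion, and the random "image tokens" standing in for a real image encoder.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Gaussian-initialised matrix; stands in for learned or encoder-produced values."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ToyLoRAPredictor:
    """Hypothetical hypernetwork: image tokens -> LoRA factors (A, B) for one linear layer."""

    def __init__(self, feat_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        # One learnable query per LoRA rank slot (assumed design).
        self.queries = rand_matrix(rank, feat_dim, rng)
        # Small linear "decoding heads" producing rows of A and columns of B (assumed).
        self.head_a = rand_matrix(feat_dim, d_in, rng)
        self.head_b = rand_matrix(feat_dim, d_out, rng)

    def attend(self, tokens):
        """Single-head scaled dot-product attention of each query over the image tokens."""
        out = []
        for q in self.queries:
            scores = [sum(qi * ti for qi, ti in zip(q, tok)) / math.sqrt(len(q)) for tok in tokens]
            peak = max(scores)
            weights = [math.exp(s - peak) for s in scores]  # numerically stable softmax
            total = sum(weights)
            out.append([sum(w / total * tok[j] for w, tok in zip(weights, tokens)) for j in range(len(q))])
        return out  # rank x feat_dim

    def __call__(self, tokens):
        h = self.attend(tokens)
        a = matmul(h, self.head_a)  # A: rank x d_in
        b = [list(col) for col in zip(*matmul(h, self.head_b))]  # B: d_out x rank
        return a, b


def lora_delta(a, b, alpha):
    """Standard LoRA update Delta W = (alpha / r) * B @ A, with shape d_out x d_in."""
    scale = alpha / len(a)
    return [[scale * v for v in row] for row in matmul(b, a)]


def fuse_deltas(deltas, weights):
    """Assumed multi-reference rule: weighted average of per-reference deltas."""
    total = sum(weights)
    rows, cols = len(deltas[0]), len(deltas[0][0])
    return [
        [sum(w * d[i][j] for w, d in zip(weights, deltas)) / total for j in range(cols)]
        for i in range(rows)
    ]


def apply_delta(w, delta):
    """Merge a delta into base weights: W' = W + Delta W."""
    return [[x + y for x, y in zip(rw, rd)] for rw, rd in zip(w, delta)]


if __name__ == "__main__":
    rng = random.Random(42)
    d_in, d_out, feat_dim, rank = 6, 4, 8, 2
    predictor = ToyLoRAPredictor(feat_dim, d_in, d_out, rank)
    # Two "reference images", each encoded as 5 tokens of size feat_dim (random placeholders).
    refs = [rand_matrix(5, feat_dim, rng, scale=1.0) for _ in range(2)]
    deltas = [lora_delta(*predictor(tokens), alpha=4.0) for tokens in refs]
    fused = fuse_deltas(deltas, [0.5, 0.5])
    adapted = apply_delta(rand_matrix(d_out, d_in, rng), fused)
    print(len(adapted), len(adapted[0]))  # 4 6
```

Averaging in weight space is only one plausible reading of "multi-reference style fusion"; the averaged delta can exceed rank r, so a real system might instead fuse the predicted factors or the encoder features before decoding.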

Related papers