Fugu-MT Paper Translation (Abstract): Prompt2Effect: Training-Free Image-to-Video Model Specialization via LoRA Generation

Paper overview: Prompt2Effect: Training-Free Image-to-Video Model Specialization via LoRA Generation

arxiv url: http://arxiv.org/abs/2606.13971v1
Date: Thu, 11 Jun 2026 23:26:44 GMT
Status: Translation complete
System update date: 2026-06-15 16:00:42.68156
Title: Prompt2Effect: Training-Free Image-to-Video Model Specialization via LoRA Generation
Authors: Xiaomeng Yang, Yanyu Li, Gordon Guocheng Qian, Ivan Skorokhodov, Viacheslav Ivanov, Avalon Vinella, Xuan Zhang, Yanzhi Wang, Sergey Tulyakov, Anil Kag
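Abstract summary: Prompt2Effect is a weight-driven hypernetwork that amortizes per-effect training by directly synthesizing effect-specific LoRA weights in a single forward pass. Experiments show that Prompt2Effect achieves on-par or superior video quality and effect alignment compared with conventional LoRA fine-tuning.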
Reference score (independently computed attention score): 71.45435100897093
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Personalizing Image-to-Video (I2V) diffusion models with specific visual effects is increasingly demanded for high-end video generation. Current practice requires training a separate Low-Rank Adaptation (LoRA) module for each effect, incurring substantial data curation and iterative optimization costs that hinder interactive control. We present Prompt2Effect, a weight-driven hypernetwork that amortizes per-effect training by directly synthesizing effect-specific LoRA weights in a single forward pass. Unlike prior hypernetworks that regress adapter weights purely from semantics, Prompt2Effect is explicitly conditioned on the frozen base model weights, grounding weight prediction in the structural geometry of each layer. Furthermore, instead of predicting raw LoRA matrices, we introduce an SVD-canonicalized parameterization that resolves factorization ambiguity and stabilizes large-scale weight synthesis. Together, these design principles enable accurate and scalable LoRA prediction for high-dimensional I2V diffusion models. Extensive experiments demonstrate that Prompt2Effect achieves on-par or superior video quality and effect alignment compared to conventional LoRA fine-tuning, while reducing the computational cost from 56 GPU training hours to 3.3 seconds of hypernetwork inference. When used as initialization for subsequent fine-tuning, our predicted weights further improve final performance and accelerate optimization by approximately 10x.
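The factorization ambiguity mentioned in the abstract can be illustrated with a small NumPy sketch. A LoRA update is a product ΔW = B A, and any invertible r x r matrix M gives a different pair (B M, M⁻¹ A) with the same ΔW, so a network regressing raw B and A has no unique target. Rewriting the update as U diag(s) Vᵀ via a truncated SVD removes this freedom. The abstract does not give implementation details, so everything below is an assumption made for illustration: the function name `canonicalize_lora`, the QR-then-SVD route, and the sign convention are hypothetical, not the authors' method.

```python
import numpy as np

def canonicalize_lora(B, A):
    """Map LoRA factors (B, A) to SVD factors (U, s, Vt) with B @ A == U @ diag(s) @ Vt.

    Illustrative only: the sign convention below is an assumption,
    not something stated in the Prompt2Effect abstract.
    """
    r = B.shape[1]
    # Thin QR of each factor, so the SVD is taken on an r x r core
    # instead of the full d_out x d_in update.
    Qb, Rb = np.linalg.qr(B)       # B   = Qb @ Rb, Qb: (d_out, r)
    Qa, Ra = np.linalg.qr(A.T)     # A.T = Qa @ Ra, Qa: (d_in, r)
    Um, s, Vmt = np.linalg.svd(Rb @ Ra.T)
    U = Qb @ Um                    # left singular vectors of B @ A
    Vt = Vmt @ Qa.T                # right singular vectors of B @ A
    # Singular vectors are only defined up to a joint sign flip; fix it by
    # making the largest-magnitude entry of each column of U positive.
    idx = np.argmax(np.abs(U), axis=0)
    signs = np.sign(U[idx, np.arange(r)])
    signs[signs == 0] = 1.0
    return U * signs, s, Vt * signs[:, None]

# Two different factorizations of the same update.
rng = np.random.default_rng(0)
d_out, d_in, r = 32, 24, 4
B = rng.normal(size=(d_out, r))
A = rng.normal(size=(r, d_in))
M = rng.normal(size=(r, r)) + 3.0 * np.eye(r)   # an arbitrary invertible mixing matrix
B2, A2 = B @ M, np.linalg.inv(M) @ A            # B2 @ A2 equals B @ A up to rounding

U1, s1, Vt1 = canonicalize_lora(B, A)
U2, s2, Vt2 = canonicalize_lora(B2, A2)
same_factors = np.allclose(B, B2)               # raw factors differ
same_canonical = (np.allclose(s1, s2)
                  and np.allclose(U1, U2, atol=1e-6)
                  and np.allclose(Vt1, Vt2, atol=1e-6))
```

In this sketch the raw factors differ while the canonical factors coincide, which is the property that would give a weight-predicting network a single regression target per effect. The canonical form is unique only when the nonzero singular values are distinct, which holds generically for random matrices but is not guaranteed in general.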
