Fugu-MT Paper Translation (Overview): LibraGen: Playing a Balance Game in Subject-Driven Video Generation

Paper Overview: LibraGen: Playing a Balance Game in Subject-Driven Video Generation

arxiv url: http://arxiv.org/abs/2603.13506v1
Date: Fri, 13 Mar 2026 18:36:23 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.246959
Title: LibraGen: Playing a Balance Game in Subject-Driven Video Generation
Title (reference translation): LibraGen: A Balance Game in Subject-Driven Video Generation
Authors: Jiahao Zhu, Shanshan Lao, Lijie Liu, Gen Li, Tianhao Qi, Wei Han, Bingchuan Li, Fangfang Liu, Zhuowei Chen, Tianxiang Ma, Qian HE, Yi Zhou, Xiaohua Xie
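Abstract summary: This paper proposes a new framework that treats extending foundation models for S2V generation as a balance game between intrinsic VGFM strengths and S2V capability. It builds a hybrid pipeline that combines automated and manual data filtering to improve overall data quality. Experimental results show that LibraGen outperforms both open-source and commercial S2V models while using only thousand-scale training data.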
Reference score (site-computed attention score): 49.4880360924921
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: With the advancement of video generation foundation models (VGFMs), customized generation, particularly subject-to-video (S2V), has attracted growing attention. However, a key challenge lies in balancing the intrinsic priors of a VGFM, such as motion coherence, visual aesthetics, and prompt alignment, with its newly derived S2V capability. Existing methods often neglect this balance by enhancing one aspect at the expense of others. To address this, we propose LibraGen, a novel framework that views extending foundation models for S2V generation as a balance game between intrinsic VGFM strengths and S2V capability. Specifically, guided by the core philosophy of "Raising the Fulcrum, Tuning to Balance," we identify data quality as the fulcrum and advocate a quality-over-quantity approach. We construct a hybrid pipeline that combines automated and manual data filtering to improve overall data quality. To further harmonize the VGFM's native capabilities with its S2V extension, we introduce a Tune-to-Balance post-training paradigm. During supervised fine-tuning, both cross-pair and in-pair data are incorporated, and model merging is employed to achieve an effective trade-off. Subsequently, two tailored direct preference optimization (DPO) pipelines, namely Consis-DPO and Real-Fake DPO, are designed and merged to consolidate this balance. During inference, we introduce a time-dependent dynamic classifier-free guidance scheme to enable flexible and fine-grained control. Experimental results demonstrate that LibraGen outperforms both open-source and commercial S2V models using only thousand-scale training data.
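The abstract says model merging is used to trade off competing capabilities after fine-tuning, and again to combine the two DPO pipelines, but it does not name the merging method. A common baseline is linear interpolation of the weights of two checkpoints fine-tuned from the same base model. The sketch below assumes that scheme and represents each checkpoint as a plain dict of flat float lists so it stays self-contained. The function `merge_linear`, the coefficient `alpha`, and the example checkpoints are hypothetical illustrations, not the authors' code.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def merge_linear(params_a: Params, params_b: Params, alpha: float) -> Params:
    """Hypothetical linear merge: alpha * A + (1 - alpha) * B, per parameter.

    Assumes both checkpoints share parameter names and shapes, e.g. two
    fine-tunes of the same base model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if params_a.keys() != params_b.keys():
        raise ValueError("checkpoints have different parameter names")
    merged: Params = {}
    for name, wa in params_a.items():
        wb = params_b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise convex combination of the two weight vectors.
        merged[name] = [alpha * a + (1.0 - alpha) * b for a, b in zip(wa, wb)]
    return merged


if __name__ == "__main__":
    # Illustrative values only: e.g. one checkpoint favoring subject
    # consistency, one closer to the base model's priors (hypothetical pairing).
    ckpt_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
    ckpt_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
    print(merge_linear(ckpt_a, ckpt_b, alpha=0.5))
    # -> {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

Sweeping `alpha` between 0 and 1 traces a path between the two checkpoints; picking the value by validation metrics is one plausible way to choose a balance point, though the abstract does not describe how LibraGen selects it.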
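Abstract (reference translation): With the advancement of video generation foundation models (VGFMs), customized generation, particularly subject-to-video (S2V) generation, has attracted attention. However, a key challenge is balancing the VGFM's intrinsic priors, such as motion coherence, visual aesthetics, and prompt alignment, with its newly derived S2V capability. Existing methods often neglect this balance, improving one aspect at the expense of others. This work therefore proposes LibraGen, a new framework that views extending foundation models for S2V generation as a balance game between intrinsic VGFM strengths and S2V capability. Specifically, guided by the core philosophy of "Raising the Fulcrum, Tuning to Balance," it identifies data quality as the fulcrum and advocates a quality-over-quantity approach. A hybrid pipeline combining automated and manual data filtering is built to improve overall data quality. To further harmonize the VGFM's native capabilities with its S2V extension, a Tune-to-Balance post-training paradigm is introduced. During supervised fine-tuning, both cross-pair and in-pair data are incorporated, and model merging is used to achieve an effective trade-off. Subsequently, two tailored direct preference optimization (DPO) pipelines, Consis-DPO and Real-Fake DPO, are designed and merged to consolidate this balance. During inference, a time-dependent dynamic classifier-free guidance scheme enables flexible, fine-grained control. Experimental results show that LibraGen outperforms both open-source and commercial S2V models using only thousand-scale training data.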
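Standard classifier-free guidance (CFG) combines the model's unconditional and conditional predictions as eps_u + w * (eps_c - eps_u); a time-dependent variant lets the weight w change over the sampling trajectory. The abstract gives neither LibraGen's schedule nor how many conditions (for example, text prompt and subject reference) are guided separately. This sketch therefore assumes a single condition and a hypothetical linear schedule that moves from `w_start` at the noisiest step to `w_end` at the final step. The function names and numeric values are illustrative assumptions, not the paper's method.

```python
from typing import List


def guidance_scale(t: float, w_start: float, w_end: float) -> float:
    """Hypothetical linear CFG schedule.

    t runs from 1.0 (pure noise, first sampling step) down to 0.0 (clean
    sample); the weight equals w_start at t=1 and w_end at t=0.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return w_end + (w_start - w_end) * t


def cfg_combine(eps_uncond: List[float], eps_cond: List[float], w: float) -> List[float]:
    """Standard CFG combination: uncond + w * (cond - uncond), element-wise."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    # Toy "noise predictions"; a real model would output tensors per step.
    eps_u, eps_c = [0.0, 1.0], [1.0, 3.0]
    for t in (1.0, 0.5, 0.0):
        w = guidance_scale(t, w_start=7.0, w_end=3.0)
        print(t, w, cfg_combine(eps_u, eps_c, w))
    # -> 1.0 7.0 [7.0, 15.0]
    # -> 0.5 5.0 [5.0, 11.0]
    # -> 0.0 3.0 [3.0, 7.0]
```

Keeping the schedule as a separate function makes it easy to swap in other shapes (step, cosine, or per-condition weights) without touching the guidance combination itself.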

Related Papers