Fugu-MT Paper Translation (Summary): AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning

Paper summary: AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning

arxiv url: http://arxiv.org/abs/2402.00769v1
Date: Thu, 1 Feb 2024 16:58:11 GMT
Status: Translation complete
Last updated in system: 2024-02-02 14:26:31.212281
Title: AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
Title (reference translation): AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
Authors: Fu-Yun Wang, Zhaoyang Huang, Xiaoyu Shi, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li
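Abstract summary: We propose AnimateLCM, which generates high-fidelity video in minimal steps. Instead of running consistency learning directly on the raw video dataset, we propose a decoupled consistency learning strategy. We validate the proposed strategy on image-conditioned and layout-conditioned video generation, achieving top-performing results in both.
Reference score (attention metric computed independently by this site): 47.681633892135125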
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video diffusion models have been gaining increasing attention for their ability to produce videos that are both coherent and of high fidelity. However, the iterative denoising process makes them computationally intensive and time-consuming, thus limiting their applications. Inspired by the Consistency Model (CM), which distills pretrained image diffusion models to accelerate sampling with minimal steps, and its successful extension, the Latent Consistency Model (LCM), on conditional image generation, we propose AnimateLCM, allowing for high-fidelity video generation within minimal steps. Instead of directly conducting consistency learning on the raw video dataset, we propose a decoupled consistency learning strategy that decouples the distillation of image generation priors and motion generation priors, which improves the training efficiency and enhances the visual quality of the generated videos. Additionally, to enable the combination of plug-and-play adapters in the Stable Diffusion community to achieve various functions (e.g., ControlNet for controllable generation), we propose an efficient strategy to adapt existing adapters to our distilled text-conditioned video consistency model or train adapters from scratch without harming the sampling speed. We validate the proposed strategy in image-conditioned video generation and layout-conditioned video generation, all achieving top-performing results. Experimental results validate the effectiveness of our proposed method. Code and weights will be made public. More details are available at https://github.com/G-U-N/AnimateLCM.
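Abstract (reference translation): Video diffusion models are attracting attention for their ability to generate coherent, high-fidelity videos. However, the iterative denoising process is computationally intensive and time-consuming, which limits their applications. Inspired by the Consistency Model (CM), which distills pretrained image diffusion models to speed up sampling with minimal steps, and by the Latent Consistency Model (LCM) for conditional image generation, we propose AnimateLCM. Rather than performing consistency learning directly on the raw video dataset, we propose a decoupled consistency learning strategy that separates the distillation of image generation priors from that of motion generation priors, improving both training efficiency and visual quality. In addition, so that plug-and-play adapters from the Stable Diffusion community can be combined to provide various functions (e.g., ControlNet for controllable generation), we propose an efficient strategy to adapt existing adapters to the distilled text-conditioned video consistency model, or to train adapters from scratch, without harming sampling speed. We validate the proposed strategy on image-conditioned and layout-conditioned video generation, achieving top-performing results in both. Experiments validate the effectiveness of the proposed method. Code and weights will be made public. More details are available at https://github.com/G-U-N/AnimateLCM.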

Related papers

Subject-driven Video Generation via Disentangled Identity and Motion [52.54835936914813]
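We propose training a subject-driven customized video generation model without additional tuning by decoupling subject-specific learning from temporal dynamics in the zero-shot setting. The proposed method achieves strong subject consistency and scalability, outperforming existing video customization models in zero-shot settings.
Paper reference translation (metadata) (2025-04-23T06:48:31Z)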
Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning [71.94122309290537]
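We propose an efficient online approach for generating dense captions for videos. Our model uses a novel factorized autoregressive decoding architecture. The proposed method outperforms both offline and online methods while reducing computational cost by 20%.
Paper reference translation (metadata) (2024-11-22T02:46:44Z)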
Movie Gen: A Cast of Media Foundation Models [133.41504332082667]
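We introduce Movie Gen, a cast of foundation models that generates high-quality 1080p HD videos. We also show additional capabilities such as precise instruction-based video editing and generation of personalized videos based on a user's image.
Paper reference translation (metadata) (2024-10-17T16:22:46Z)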
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion [52.7394517692186]
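We present DreamVideo, a novel approach for generating personalized videos from static images of a subject. DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pretrained video diffusion model. For motion learning, we design a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.
Paper reference translation (metadata) (2023-12-07T16:57:26Z)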
InstructVid2Vid: Controllable Video Editing with Natural Language Instructions [97.17047888215284]
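InstructVid2Vid is an end-to-end diffusion-based methodology for video editing guided by human language instructions. Our approach empowers video manipulation guided by natural language directives, eliminating the need for per-example fine-tuning or inversion.
Paper reference translation (metadata) (2023-05-21T03:28:13Z)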
Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer [13.098901971644656]
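We propose a zero-shot video stylization method named Style-A-Video. It uses a generative pretrained transformer together with an image latent diffusion model to achieve concise text-controlled video stylization. Tests show that it attains superior content preservation and stylistic performance with lower resource consumption than previous solutions.
Paper reference translation (metadata) (2023-05-09T14:03:27Z)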
Video Generation Beyond a Single Clip [76.5306434379088]
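Video generation models can only generate video clips that are relatively short compared with the length of real videos. To generate long videos covering diverse content and multiple events, we propose using additional guidance to control the video generation process. The proposed method complements existing video generation efforts, which focus on generating realistic video within a fixed time window.
Paper reference translation (metadata) (2023-04-15T06:17:30Z)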
Diverse Generation from a Single Video Made Possible [24.39972895902724]
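We present a fast and practical method for generating and manipulating videos from a single natural video. Our method produces more realistic and higher-quality results than single-video GANs.
Paper reference translation (metadata) (2021-09-17T15:12:17Z)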

The related papers list is generated automatically from the titles and abstracts of papers on this site.
