Fugu-MT Paper Translation (Summary): F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis

Paper summary: F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis

arxiv url: http://arxiv.org/abs/2312.03459v1
Date: Wed, 6 Dec 2023 12:34:47 GMT
Status: Translation complete
Last updated in system: 2023-12-07 14:55:48.863085
Title: F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis
Title (reference translation): F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis
Authors: Sitong Su, Jianzhi Liu, Lianli Gao, Jingkuan Song
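Abstract summary: We explore the inference process of two mainstream T2V models, one based on transformers and one on diffusion models. We propose a training-free and generalized pruning strategy called F3-Pruning that prunes redundant temporal attention weights. Extensive experiments on three datasets using the classic transformer-based model CogVideo and the typical diffusion-based model Tune-A-Video verify the effectiveness of F3-Pruning.
Reference score (attention score computed independently by this site): 94.10861578387443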
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, Text-to-Video (T2V) synthesis has undergone a breakthrough by training transformers or diffusion models on large-scale datasets. Nevertheless, inferring such large models incurs huge costs. Previous inference acceleration works either require costly retraining or are model-specific. To address this issue, instead of retraining we explore the inference process of two mainstream T2V models using transformers and diffusion models. The exploration reveals the redundancy in temporal attention modules of both models, which are commonly utilized to establish temporal relations among frames. Consequently, we propose a training-free and generalized pruning strategy called F3-Pruning to prune redundant temporal attention weights. Specifically, when aggregate temporal attention values are ranked below a certain ratio, corresponding weights will be pruned. Extensive experiments on three datasets using a classic transformer-based model CogVideo and a typical diffusion-based model Tune-A-Video verify the effectiveness of F3-Pruning in inference acceleration, quality assurance and broad applicability.
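The ranking rule described in the abstract (prune temporal attention weights whose aggregate attention values rank below a certain ratio) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `cross_frame_mass` and `select_layers_to_prune` are hypothetical, each temporal attention layer is represented by one frame-by-frame attention matrix, and the aggregate value is taken to be the mean attention a frame places on frames other than itself. The paper's actual aggregation and pruning granularity may differ.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def cross_frame_mass(attn: Matrix) -> float:
    """Mean attention a frame places on frames other than itself.

    Assumption: attn is a square frames-by-frames attention matrix.
    """
    n = len(attn)
    off_diag = sum(attn[i][j] for i in range(n) for j in range(n) if i != j)
    return off_diag / n


def select_layers_to_prune(attn_maps: Sequence[Matrix], ratio: float) -> List[bool]:
    """Mark the lowest-scoring fraction of layers for pruning.

    Returns one boolean per layer; True means the layer's temporal
    attention would be skipped at inference time.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    scores = [cross_frame_mass(a) for a in attn_maps]
    n_prune = int(ratio * len(scores))
    # sorted() is stable, so tied layers keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [i in pruned for i in range(len(scores))]
```

With four layers whose cross-frame scores are 0.75, 0.0, 0.375 and 0.1875 and `ratio=0.5`, the sketch marks the second and fourth layers, giving `[False, True, False, True]`. No retraining is involved: the decision uses only attention values observed at inference time, which is the sense in which such a rule is training-free.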

Related papers

Time-adaptive Video Frame Interpolation based on Residual Diffusion [2.5261465733373965]
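We propose a new diffusion-based method for video frame interpolation (VFI). We perform extensive comparisons against state-of-the-art models and show that our model outperforms them on animation videos.
Paper reference translation (metadata) (2025-04-07T18:15:45Z)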
SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
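SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive LiDAR-camera pairs. SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
Paper reference translation (metadata) (2025-03-25T17:59:57Z)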
Frequency-Guided Diffusion Model with Perturbation Training for Skeleton-Based Video Anomaly Detection [43.49146665908238]
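Video anomaly detection (VAD) is an essential but complex open-set task in computer vision. We propose a novel frequency-guided diffusion model with perturbation training. A 2D discrete cosine transform (DCT) is used to separate high-frequency (local) and low-frequency (global) motion components.
Paper reference translation (metadata) (2024-12-04T05:43:53Z)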
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation [83.62931466231898]
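We present ARLON, a framework that boosts diffusion Transformers with autoregressive models for long video generation. A latent vector-quantized variational autoencoder (VQ-VAE) compresses the input latent space of the DiT model into compact visual tokens. An adaptive norm-based semantic injection module integrates the coarse discrete visual units from the AR model into the DiT model.
Paper reference translation (metadata) (2024-10-27T16:28:28Z)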
RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks [93.18404922542702]
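We propose a novel video generative model that addresses long-term spatial and temporal dependencies. Our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks. Our model synthesizes high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than 5 seconds at a frame rate of 30 fps.
Paper reference translation (metadata) (2024-01-11T16:48:44Z)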
Diffusion Recommender Model [85.9640416600725]
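We propose a novel Diffusion Recommender Model, named DiffRec, which learns the generative process in a denoising manner. To retain personalized information in user interactions, DiffRec reduces the added noise and avoids corrupting users' interactions into pure noise as is done in image synthesis.
Paper reference translation (metadata) (2023-04-11T04:31:00Z)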
Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
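Long-term time-series forecasting (LTTF) is in growing demand in many applications, such as wind power supply planning. Transformer models have been adopted to deliver high prediction capacity because of their computationally expensive self-attention mechanism. We propose an efficient Transformer-based model, named Conformer, which differs from existing LTTF methods in three aspects.
Paper reference translation (metadata) (2023-01-05T13:59:29Z)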
Imaging through the Atmosphere using Turbulence Mitigation Transformer [15.56320865332645]
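Restoring images distorted by atmospheric turbulence is a ubiquitous problem in long-range imaging applications. Existing deep-learning-based methods have shown promising results under specific test conditions. We propose the turbulence mitigation transformer (TMT), which addresses these issues.
Paper reference translation (metadata) (2022-07-13T18:33:26Z)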
Temporal Transformer Networks with Self-Supervision for Action Recognition [13.00827959393591]
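We introduce Temporal Transformer Networks with Self-supervision (TTSN). TTSN consists of a temporal transformer module and a temporal sequence self-supervision module. The proposed TTSN is promising, achieving state-of-the-art performance for action recognition.
Paper reference translation (metadata) (2021-12-14T12:53:53Z)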
Long-Short Temporal Contrastive Learning of Video Transformers [62.71874976426988]
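Self-supervised pretraining of video transformers on video-only datasets can lead to action recognition results on par with or better than those obtained with supervised pretraining on large-scale image datasets. Our approach, called Long-Short Temporal Contrastive Learning, enables video transformers to learn effective clip-level representations by predicting temporal context captured from a longer temporal extent.
Paper reference translation (metadata) (2021-06-17T02:30:26Z)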
Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
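We propose a higher-order LSTM model that can efficiently learn long-term correlations in video sequences. This is achieved by a novel tensor-train module that performs prediction by combining convolutional features over time. The results show state-of-the-art performance across a wide range of applications and datasets.
Paper reference translation (metadata) (2020-02-21T05:00:01Z)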
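The Frequency-Guided Diffusion Model entry in the list above mentions using a 2D discrete cosine transform (DCT) to separate low-frequency (global) and high-frequency (local) components. A minimal sketch of that kind of split follows, assuming an orthonormal DCT-II and a simple cutoff on the sum of the two frequency indices. The function names, the cutoff rule, and the toy input are hypothetical illustrations and are not taken from that paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis as an n x n matrix (rows are frequencies)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def dct2(x: Matrix) -> Matrix:
    """2D DCT: transform along rows and columns."""
    cr, cc = dct_matrix(len(x)), dct_matrix(len(x[0]))
    return matmul(matmul(cr, x), transpose(cc))


def idct2(coeffs: Matrix) -> Matrix:
    """Inverse of dct2 (the basis matrices are orthogonal)."""
    cr, cc = dct_matrix(len(coeffs)), dct_matrix(len(coeffs[0]))
    return matmul(matmul(transpose(cr), coeffs), cc)


def split_frequencies(x: Matrix, cutoff: int) -> Tuple[Matrix, Matrix]:
    """Return (low, high) components whose sum reconstructs x.

    Assumption: a coefficient at index (u, v) counts as low-frequency
    when u + v < cutoff.
    """
    coeffs = dct2(x)
    low_c = [[c if u + v < cutoff else 0.0 for v, c in enumerate(row)]
             for u, row in enumerate(coeffs)]
    high_c = [[c - l for c, l in zip(rc, rl)] for rc, rl in zip(coeffs, low_c)]
    return idct2(low_c), idct2(high_c)
```

With `cutoff=1` only the DC coefficient is kept, so the low-frequency part is the mean of the input at every position and the high-frequency part carries all remaining variation. Because the transform is linear, the two parts always sum back to the input up to floating-point rounding.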

The related papers list is generated automatically from the titles and abstracts of papers on this site.

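This page provides information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and assumes no responsibility for any consequences arising from the use of this site (including all information and translations).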