Fugu-MT Paper Translation (Overview): Inference-based GAN Video Generation

Paper overview: Inference-based GAN Video Generation

arxiv url: http://arxiv.org/abs/2512.21776v1
Date: Thu, 25 Dec 2025 20:14:38 GMT
Status: Translation complete
Last updated in system: 2025-12-29 20:48:41.971129
Title: Inference-based GAN Video Generation
Authors: Jingbo Yang, Adrian G. Bors
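Abstract summary: We propose a new type of video generator by equipping adversarial unconditional video generators with a variational encoder. Existing models struggle with the temporal scaling of generated videos. We adopt a novel, memory-efficient approach to generate long videos composed of hundreds to thousands of frames.
Reference score (independently calculated attention score): 47.53991869205973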
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Video generation has seen remarkable progress thanks to advances in generative deep learning. Generated videos should display not only coherent and continuous movement but also meaningful movement across successions of scenes. Generative models such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs) and, more recently, Diffusion Networks have been used to generate short video sequences, usually of up to 16 frames. In this paper, we first propose a new type of video generator by equipping adversarial unconditional video generators with a variational encoder, akin to a VAE-GAN hybrid structure, so that the generation process gains inference capabilities. Like other deep learning-based video processing frameworks, the proposed model incorporates two processing branches, one for content and another for movement. However, existing models struggle with the temporal scaling of the generated videos: in classical approaches, increasing the generated video length degrades the resulting video quality, particularly for significantly long sequences. To overcome this limitation, we extend the initially proposed VAE-GAN video generation model with a novel, memory-efficient approach that generates long videos composed of hundreds or thousands of frames while ensuring their temporal continuity, consistency and dynamics. Our approach leverages a Markov chain framework with a recall mechanism, in which each state represents a VAE-GAN short-length video generator. This setup allows generated video sub-sequences to be connected sequentially, capturing temporal dependencies and resulting in meaningful long video sequences.
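The abstract gives no implementation details, so the following is only a toy sketch of the general idea under stated assumptions: a content code sampled once via the VAE reparameterization trick, a motion state that evolves frame by frame, and a chain of short-clip generators in which each clip starts from the previous clip's end state blended with a bounded "recall" buffer of earlier end states. Frames are yielded one clip at a time, so memory stays bounded regardless of video length. All names (`reparameterize`, `short_clip_generator`, `generate_long_video`), the vector "frames", the AR(1)-style motion update and the averaging-based recall are hypothetical stand-ins, not the authors' model.

```python
import math
import random
from collections import deque
from typing import Iterator, List, Tuple

Frame = List[float]  # toy "frame": a small vector standing in for an image


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """VAE reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def short_clip_generator(content: List[float], motion: List[float], n_frames: int,
                         rng: random.Random) -> Tuple[List[Frame], List[float]]:
    """Hypothetical stand-in for one short-clip generator (one Markov state).

    The content code stays fixed (content branch) while the motion code drifts
    from frame to frame (motion branch). Returns the frames and the final motion state.
    """
    frames = []
    for _ in range(n_frames):
        # Assumed motion dynamics: decay toward zero plus small Gaussian noise.
        motion = [0.9 * m + 0.1 * rng.gauss(0.0, 1.0) for m in motion]
        frames.append([c + m for c, m in zip(content, motion)])
    return frames, motion


def generate_long_video(n_clips: int, clip_len: int, dim: int = 4,
                        recall_size: int = 3, seed: int = 0) -> Iterator[Frame]:
    """Chain short clips into a long video, yielding frames clip by clip."""
    rng = random.Random(seed)
    # Hypothetical encoder output (mu, log_var) for the content code.
    content = reparameterize([0.0] * dim, [-2.0] * dim, rng)
    motion = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    memory: deque = deque(maxlen=recall_size)  # bounded recall buffer of end states
    for _ in range(n_clips):
        if memory:
            # Assumed recall: blend the last end state with the mean of remembered ones.
            avg = [sum(vals) / len(memory) for vals in zip(*memory)]
            motion = [0.5 * m + 0.5 * a for m, a in zip(motion, avg)]
        frames, motion = short_clip_generator(content, motion, clip_len, rng)
        memory.append(motion)
        yield from frames  # only one clip's frames are held at a time
```

For example, `sum(1 for _ in generate_long_video(n_clips=100, clip_len=16))` streams 1600 frames while keeping at most one 16-frame clip and `recall_size` motion vectors in memory. The deque bound is what keeps the "recall" from growing with video length; a strict Markov chain would correspond to `recall_size=1`.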


Related papers