Recurrent Deconvolutional Generative Adversarial Networks with
Application to Text Guided Video Generation
- URL: http://arxiv.org/abs/2008.05856v1
- Date: Thu, 13 Aug 2020 12:22:27 GMT
- Title: Recurrent Deconvolutional Generative Adversarial Networks with
Application to Text Guided Video Generation
- Authors: Hongyuan Yu, Yan Huang, Lihong Pi, Liang Wang
- Abstract summary: We propose a recurrent deconvolutional generative adversarial network (RD-GAN), which includes a recurrent deconvolutional network (RDN) as the generator and a 3D convolutional neural network (3D-CNN) as the discriminator.
The proposed model can be jointly trained by pushing the RDN to generate realistic videos so that the 3D-CNN cannot distinguish them from real ones.
- Score: 11.15855312510806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel model for video generation and especially makes
the attempt to deal with the problem of video generation from text
descriptions, i.e., synthesizing realistic videos conditioned on given texts.
Existing video generation methods cannot be easily adapted to handle this task
well, due to the frame discontinuity issue and their text-free generation
schemes. To address these problems, we propose a recurrent deconvolutional
generative adversarial network (RD-GAN), which includes a recurrent
deconvolutional network (RDN) as the generator and a 3D convolutional neural
network (3D-CNN) as the discriminator. The RDN is a deconvolutional version of a
conventional recurrent neural network that can effectively model the long-range
temporal dependencies of generated video frames and make good use of conditional
information. The proposed model can be jointly trained by pushing the RDN to
generate realistic videos so that the 3D-CNN cannot distinguish them from real
ones. We apply the proposed RD-GAN to a series of tasks, including conventional
video generation, conditional video generation, video prediction, and video
classification, and demonstrate its effectiveness by achieving strong performance
on each.
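For illustration, the following is a minimal PyTorch sketch of the kind of setup the abstract describes: a recurrent deconvolutional generator that unrolls a text-conditioned hidden state and emits one frame per recurrent step, trained adversarially against a 3D-CNN discriminator that scores whole clips. All layer sizes, the recurrent update, and the conditioning scheme are assumptions made for this sketch, not details taken from the paper.

```python
# Hypothetical sketch of the RD-GAN idea: an RDN generator (recurrent +
# deconvolutional) and a 3D-CNN discriminator, trained adversarially.
import torch
import torch.nn as nn

class RDNGenerator(nn.Module):
    """Recurrent deconvolutional network: one frame per recurrent step."""
    def __init__(self, text_dim=128, noise_dim=64, hidden_ch=128, frames=16):
        super().__init__()
        self.frames = frames
        self.hidden_ch = hidden_ch
        # Map text embedding + noise to an initial 4x4 spatial hidden state.
        self.init_state = nn.Linear(text_dim + noise_dim, hidden_ch * 4 * 4)
        # Convolutional recurrent transition on the spatial hidden state.
        self.transition = nn.Conv2d(hidden_ch * 2, hidden_ch, 3, padding=1)
        # Re-inject the text condition at every step (broadcast to 4x4).
        self.cond_proj = nn.Linear(text_dim, hidden_ch * 4 * 4)
        # Deconvolutional decoder: 4x4 feature map -> 64x64 RGB frame.
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(hidden_ch, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, text_emb, noise):
        b = text_emb.size(0)
        h = self.init_state(torch.cat([text_emb, noise], dim=1))
        h = h.view(b, self.hidden_ch, 4, 4)
        cond = self.cond_proj(text_emb).view(b, self.hidden_ch, 4, 4)
        frames = []
        for _ in range(self.frames):
            h = torch.tanh(self.transition(torch.cat([h, cond], dim=1)))
            frames.append(self.decode(h))
        # (B, C, T, H, W) layout expected by the 3D-CNN discriminator.
        return torch.stack(frames, dim=2)

class CNN3DDiscriminator(nn.Module):
    """3D-CNN that classifies a whole clip as real or generated."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classify = nn.Linear(128, 1)

    def forward(self, video):               # video: (B, 3, T, H, W)
        f = self.features(video).flatten(1)
        return self.classify(f)             # real/fake logit

# One adversarial training step: push the RDN to fool the 3D-CNN.
def train_step(gen, disc, g_opt, d_opt, real_video, text_emb, noise_dim=64):
    bce = nn.BCEWithLogitsLoss()
    b = real_video.size(0)
    noise = torch.randn(b, noise_dim)
    fake_video = gen(text_emb, noise)
    # Discriminator update: real clips -> 1, generated clips -> 0.
    d_opt.zero_grad()
    d_loss = bce(disc(real_video), torch.ones(b, 1)) + \
             bce(disc(fake_video.detach()), torch.zeros(b, 1))
    d_loss.backward()
    d_opt.step()
    # Generator update: make generated clips look real to the 3D-CNN.
    g_opt.zero_grad()
    g_loss = bce(disc(fake_video), torch.ones(b, 1))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

In this toy setup the only training signal for the generator is the 3D-CNN's real/fake logit, which mirrors the joint training scheme the abstract describes.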
Related papers
- Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation [35.52770785430601]
We propose a novel hybrid video diffusion model, called HVDM, which can capture intricate dependencies more effectively.
The HVDM is trained with a hybrid video autoencoder that extracts a disentangled representation of the video.
Our hybrid autoencoder provides a more comprehensive video latent, enriching the generated videos with fine structures and details.
arXiv Detail & Related papers (2024-02-21T11:46:16Z)
- UniVG: Towards UNIfied-modal Video Generation [27.07637246141562]
We propose a Unified-modal Video Generation system capable of handling multiple video generation tasks across text and image modalities.
Our method achieves the lowest Fréchet Video Distance (FVD) on the public academic benchmark MSR-VTT, surpasses the current open-source methods in human evaluations, and is on par with the current closed-source method Gen2.
arXiv Detail & Related papers (2024-01-17T09:46:13Z)
- Conditional Generative Modeling for Images, 3D Animations, and Video [4.422441608136163]
This dissertation attempts to drive innovation in the field of generative modeling for computer vision.
The research focuses on architectures that offer transformations of noise and visual data, and on the application of encoder-decoder architectures for generative tasks and 3D content manipulation.
arXiv Detail & Related papers (2023-10-19T21:10:39Z)
- Progressive Fourier Neural Representation for Sequential Video Compilation [75.43041679717376]
Motivated by continual learning, this work investigates how to accumulate and transfer neural implicit representations for multiple complex video data over sequential encoding sessions.
We propose a novel method, Progressive Fourier Neural Representation (PFNR), that aims to find an adaptive and compact sub-module in Fourier space to encode videos in each training session.
We validate our PFNR method on the UVG8/17 and DAVIS50 video sequence benchmarks and achieve impressive performance gains over strong continual learning baselines.
arXiv Detail & Related papers (2023-06-20T06:02:19Z)
- NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions [97.27105725738016]
The integration of Neural Radiance Fields (NeRFs) and generative models, such as Generative Adversarial Networks (GANs), has transformed 3D-aware generation from single-view images.
We propose a simple and effective method, based on re-using the well-disentangled latent space of a pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations.
arXiv Detail & Related papers (2023-03-22T18:59:48Z)
- Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks [68.93429034530077]
We propose dynamics-aware implicit generative adversarial network (DIGAN) for video generation.
We show that DIGAN can be trained on 128 frame videos of 128x128 resolution, 80 frames longer than the 48 frames of the previous state-of-the-art method.
arXiv Detail & Related papers (2022-02-21T23:24:01Z)
- Hierarchical Multimodal Transformer to Summarize Videos [103.47766795086206]
Motivated by the great success of transformers and the natural structure of video (frame-shot-video), a hierarchical transformer is developed for video summarization.
To integrate the two kinds of information, they are encoded in a two-stream scheme, and a multimodal fusion mechanism is developed based on the hierarchical transformer.
Practically, extensive experiments show that HMT surpasses most of the traditional, RNN-based and attention-based video summarization methods.
arXiv Detail & Related papers (2021-09-22T07:38:59Z)
- Video Generation from Text Employing Latent Path Construction for Temporal Modeling [70.06508219998778]
Video generation is one of the most challenging tasks in Machine Learning and Computer Vision fields of study.
In this paper, we tackle the text to video generation problem, which is a conditional form of video generation.
We believe that video generation from natural language sentences will have an important impact on Artificial Intelligence.
arXiv Detail & Related papers (2021-07-29T06:28:20Z)
- Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network, and a generator through non-adversarial learning (see the sketch after this entry).
Our approach generates superior quality videos compared to the existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-21T02:57:33Z)
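As a point of contrast with the adversarial training above, here is a hypothetical PyTorch sketch of the non-adversarial recipe this entry summarizes: per-video latent codes, a recurrent network, and a frame generator are optimized jointly under a plain reconstruction loss, with no discriminator. Module names, sizes, and the use of an MSE loss are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: jointly optimize per-video latent codes, an RNN,
# and a frame generator with a reconstruction loss (no discriminator).
import torch
import torch.nn as nn

class LatentVideoModel(nn.Module):
    def __init__(self, num_videos, latent_dim=64, frames=16):
        super().__init__()
        self.frames = frames
        # One learnable latent code per training video (optimized directly).
        self.latents = nn.Parameter(torch.randn(num_videos, latent_dim) * 0.01)
        # RNN unrolls the code into a trajectory of per-frame states.
        self.rnn = nn.GRU(latent_dim, 256, batch_first=True)
        # Frame generator decodes each state into a 32x32 RGB frame.
        self.decode = nn.Sequential(
            nn.Linear(256, 32 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, video_ids):
        z = self.latents[video_ids]                       # (B, latent_dim)
        steps = z.unsqueeze(1).repeat(1, self.frames, 1)  # same code each step
        states, _ = self.rnn(steps)                       # (B, T, 256)
        frames = self.decode(states.reshape(-1, 256))     # (B*T, 3, 32, 32)
        return frames.view(-1, self.frames, 3, 32, 32)

# Non-adversarial training: latents, RNN, and decoder share one optimizer.
model = LatentVideoModel(num_videos=100)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
video_ids = torch.arange(4)
real = torch.rand(4, 16, 3, 32, 32) * 2 - 1  # stand-in for a real batch
loss = nn.functional.mse_loss(model(video_ids), real)
opt.zero_grad()
loss.backward()
opt.step()
```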
This list is automatically generated from the titles and abstracts of the papers on this site.