Recurrent Deconvolutional Generative Adversarial Networks with
Application to Text Guided Video Generation
- URL: http://arxiv.org/abs/2008.05856v1
- Date: Thu, 13 Aug 2020 12:22:27 GMT
- Title: Recurrent Deconvolutional Generative Adversarial Networks with
Application to Text Guided Video Generation
- Authors: Hongyuan Yu, Yan Huang, Lihong Pi, Liang Wang
- Abstract summary: We propose a recurrent deconvolutional generative adversarial network (RD-GAN), which includes a recurrent deconvolutional network (RDN) as the generator and a 3D convolutional neural network (3D-CNN) as the discriminator.
The proposed model can be jointly trained by pushing the RDN to generate realistic videos so that the 3D-CNN cannot distinguish them from real ones.
- Score: 11.15855312510806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel model for video generation and especially makes
the attempt to deal with the problem of video generation from text
descriptions, i.e., synthesizing realistic videos conditioned on given texts.
Existing video generation methods cannot be easily adapted to handle this task
well, due to the frame discontinuity issue and their text-free generation
schemes. To address these problems, we propose a recurrent deconvolutional
generative adversarial network (RD-GAN), which includes a recurrent
deconvolutional network (RDN) as the generator and a 3D convolutional neural
network (3D-CNN) as the discriminator. The RDN is a deconvolutional version of a
conventional recurrent neural network that can effectively model the long-range
temporal dependencies of generated video frames and make good use of conditional
information. The proposed model can be jointly trained by pushing the RDN to
generate realistic videos so that the 3D-CNN cannot distinguish them from real
ones. We apply the proposed RD-GAN to a series of tasks, including conventional
video generation, conditional video generation, video prediction, and video
classification, and demonstrate its effectiveness by achieving strong performance
on each.
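For illustration, the following is a minimal PyTorch sketch of the kind of setup the abstract describes: a recurrent deconvolutional generator that unrolls a text-conditioned hidden state and emits one frame per recurrent step, trained adversarially against a 3D-CNN discriminator that scores whole clips. All layer sizes, the recurrent update, and the conditioning scheme are assumptions made for this sketch, not details taken from the paper.

```python
# Hypothetical sketch of the RD-GAN idea: an RDN generator (recurrent +
# deconvolutional) and a 3D-CNN discriminator, trained adversarially.
import torch
import torch.nn as nn

class RDNGenerator(nn.Module):
    """Recurrent deconvolutional network: one frame per recurrent step."""
    def __init__(self, text_dim=128, noise_dim=64, hidden_ch=128, frames=16):
        super().__init__()
        self.frames = frames
        self.hidden_ch = hidden_ch
        # Map text embedding + noise to an initial 4x4 spatial hidden state.
        self.init_state = nn.Linear(text_dim + noise_dim, hidden_ch * 4 * 4)
        # Convolutional recurrent transition on the spatial hidden state.
        self.transition = nn.Conv2d(hidden_ch * 2, hidden_ch, 3, padding=1)
        # Re-inject the text condition at every step (broadcast to 4x4).
        self.cond_proj = nn.Linear(text_dim, hidden_ch * 4 * 4)
        # Deconvolutional decoder: 4x4 feature map -> 64x64 RGB frame.
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(hidden_ch, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, text_emb, noise):
        b = text_emb.size(0)
        h = self.init_state(torch.cat([text_emb, noise], dim=1))
        h = h.view(b, self.hidden_ch, 4, 4)
        cond = self.cond_proj(text_emb).view(b, self.hidden_ch, 4, 4)
        frames = []
        for _ in range(self.frames):
            h = torch.tanh(self.transition(torch.cat([h, cond], dim=1)))
            frames.append(self.decode(h))
        # (B, C, T, H, W) layout expected by the 3D-CNN discriminator.
        return torch.stack(frames, dim=2)

class CNN3DDiscriminator(nn.Module):
    """3D-CNN that classifies a whole clip as real or generated."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classify = nn.Linear(128, 1)

    def forward(self, video):               # video: (B, 3, T, H, W)
        f = self.features(video).flatten(1)
        return self.classify(f)             # real/fake logit

# One adversarial training step: push the RDN to fool the 3D-CNN.
def train_step(gen, disc, g_opt, d_opt, real_video, text_emb, noise_dim=64):
    bce = nn.BCEWithLogitsLoss()
    b = real_video.size(0)
    noise = torch.randn(b, noise_dim)
    fake_video = gen(text_emb, noise)
    # Discriminator update: real clips -> 1, generated clips -> 0.
    d_opt.zero_grad()
    d_loss = bce(disc(real_video), torch.ones(b, 1)) + \
             bce(disc(fake_video.detach()), torch.zeros(b, 1))
    d_loss.backward()
    d_opt.step()
    # Generator update: make generated clips look real to the 3D-CNN.
    g_opt.zero_grad()
    g_loss = bce(disc(fake_video), torch.ones(b, 1))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

In this toy setup the only training signal for the generator is the 3D-CNN's real/fake logit, which mirrors the joint training scheme the abstract describes.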
Related papers
- Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation [35.52770785430601]
We propose a novel hybrid video diffusion model, called HVDM, which can capture intricate dependencies more effectively.
The HVDM is trained with a hybrid video autoencoder that extracts a disentangled representation of the video.
Our hybrid autoencoder provides a more comprehensive video latent, enriching the generated videos with fine structures and details.
arXiv Detail & Related papers (2024-02-21T11:46:16Z)
- UniVG: Towards UNIfied-modal Video Generation [27.07637246141562]
We propose a Unified-modal Video Generation system capable of handling multiple video generation tasks across text and image modalities.
Our method achieves the lowest Fréchet Video Distance (FVD) on the public academic benchmark MSR-VTT, surpasses the current open-source methods in human evaluations, and is on par with the current closed-source method Gen2.
arXiv Detail & Related papers (2024-01-17T09:46:13Z)
- Conditional Generative Modeling for Images, 3D Animations, and Video [4.422441608136163]
This dissertation attempts to drive innovation in the field of generative modeling for computer vision.
The research focuses on architectures that offer transformations of noise and visual data, and on the application of encoder-decoder architectures for generative tasks and 3D content manipulation.
arXiv Detail & Related papers (2023-10-19T21:10:39Z)
- Progressive Fourier Neural Representation for Sequential Video Compilation [75.43041679717376]
Motivated by continual learning, this work investigates how to accumulate and transfer neural implicit representations for multiple complex video data over sequential encoding sessions.
We propose a novel method, Progressive Fourier Neural Representation (PFNR), that aims to find an adaptive and compact sub-module in Fourier space to encode videos in each training session.
We validate our PFNR method on the UVG8/17 and DAVIS50 video sequence benchmarks and achieve impressive performance gains over strong continual learning baselines.
arXiv Detail & Related papers (2023-06-20T06:02:19Z)
- NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions [97.27105725738016]
The integration of Neural Radiance Fields (NeRFs) and generative models, such as Generative Adversarial Networks (GANs), has transformed 3D-aware generation from single-view images.
We propose a simple and effective method, based on re-using the well-disentangled latent space of a pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations.
arXiv Detail & Related papers (2023-03-22T18:59:48Z)
- Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks [68.93429034530077]
We propose dynamics-aware implicit generative adversarial network (DIGAN) for video generation.
We show that DIGAN can be trained on 128 frame videos of 128x128 resolution, 80 frames longer than the 48 frames of the previous state-of-the-art method.
arXiv Detail & Related papers (2022-02-21T23:24:01Z)
- Hierarchical Multimodal Transformer to Summarize Videos [103.47766795086206]
Motivated by the great success of transformers and the natural structure of video (frame-shot-video), a hierarchical transformer is developed for video summarization.
To integrate the two kinds of information, they are encoded in a two-stream scheme, and a multimodal fusion mechanism is developed based on the hierarchical transformer.
Practically, extensive experiments show that HMT surpasses most of the traditional, RNN-based and attention-based video summarization methods.
arXiv Detail & Related papers (2021-09-22T07:38:59Z)
- Video Generation from Text Employing Latent Path Construction for Temporal Modeling [70.06508219998778]
Video generation is one of the most challenging tasks in Machine Learning and Computer Vision fields of study.
In this paper, we tackle the text to video generation problem, which is a conditional form of video generation.
We believe that video generation from natural language sentences will have an important impact on Artificial Intelligence.
arXiv Detail & Related papers (2021-07-29T06:28:20Z)
- Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network, and a generator through non-adversarial learning (see the sketch after this entry).
Our approach generates superior quality videos compared to the existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-21T02:57:33Z)
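As a point of contrast with the adversarial training above, here is a hypothetical PyTorch sketch of the non-adversarial recipe this entry summarizes: per-video latent codes, a recurrent network, and a frame generator are optimized jointly under a plain reconstruction loss, with no discriminator. Module names, sizes, and the use of an MSE loss are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: jointly optimize per-video latent codes, an RNN,
# and a frame generator with a reconstruction loss (no discriminator).
import torch
import torch.nn as nn

class LatentVideoModel(nn.Module):
    def __init__(self, num_videos, latent_dim=64, frames=16):
        super().__init__()
        self.frames = frames
        # One learnable latent code per training video (optimized directly).
        self.latents = nn.Parameter(torch.randn(num_videos, latent_dim) * 0.01)
        # RNN unrolls the code into a trajectory of per-frame states.
        self.rnn = nn.GRU(latent_dim, 256, batch_first=True)
        # Frame generator decodes each state into a 32x32 RGB frame.
        self.decode = nn.Sequential(
            nn.Linear(256, 32 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, video_ids):
        z = self.latents[video_ids]                       # (B, latent_dim)
        steps = z.unsqueeze(1).repeat(1, self.frames, 1)  # same code each step
        states, _ = self.rnn(steps)                       # (B, T, 256)
        frames = self.decode(states.reshape(-1, 256))     # (B*T, 3, 32, 32)
        return frames.view(-1, self.frames, 3, 32, 32)

# Non-adversarial training: latents, RNN, and decoder share one optimizer.
model = LatentVideoModel(num_videos=100)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
video_ids = torch.arange(4)
real = torch.rand(4, 16, 3, 32, 32) * 2 - 1  # stand-in for a real batch
loss = nn.functional.mse_loss(model(video_ids), real)
opt.zero_grad()
loss.backward()
opt.step()
```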
This list is automatically generated from the titles and abstracts of the papers on this site.