HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks
- URL: http://arxiv.org/abs/2503.17276v1
- Date: Fri, 21 Mar 2025 16:24:47 GMT
- Title: HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks
- Authors: Maria Pilligua, Danna Xue, Javier Vazquez-Corral
- Abstract summary: Existing video-layer decomposition models rely on implicit neural representations (INRs) trained independently for each video. We propose a meta-learning strategy to learn a generic video decomposition model to speed up the training on new videos. Our strategy mitigates the problem of single-video overfitting and, importantly, shortens the convergence of video decomposition on new, unseen videos.
- Score: 4.536530093400348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decomposing a video into a layer-based representation is crucial for easy video editing for the creative industries, as it enables independent editing of specific layers. Existing video-layer decomposition models rely on implicit neural representations (INRs) trained independently for each video, making the process time-consuming when applied to new videos. Noticing this limitation, we propose a meta-learning strategy to learn a generic video decomposition model to speed up the training on new videos. Our model is based on a hypernetwork architecture which, given a video-encoder embedding, generates the parameters for a compact INR-based neural video decomposition model. Our strategy mitigates the problem of single-video overfitting and, importantly, shortens the convergence of video decomposition on new, unseen videos. Our code is available at: https://hypernvd.github.io/
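To illustrate the core idea, here is a minimal PyTorch-style sketch of a hypernetwork that maps a video embedding to the weights of a tiny coordinate-based MLP; all module names and sizes are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HyperINR(nn.Module):
    """Hypothetical sketch: a hypernetwork maps a video embedding to the
    weights of a small coordinate-based MLP (INR) that maps (x, y, t)
    coordinates to per-pixel decomposition outputs (e.g., RGB + alpha)."""
    def __init__(self, embed_dim=512, hidden=64, in_dim=3, out_dim=4):
        super().__init__()
        self.shapes = [(hidden, in_dim), (hidden,),    # INR layer 1: W, b
                       (out_dim, hidden), (out_dim,)]  # INR layer 2: W, b
        n_params = sum(torch.Size(s).numel() for s in self.shapes)
        self.hyper = nn.Sequential(                    # the hypernetwork
            nn.Linear(embed_dim, 256), nn.ReLU(),
            nn.Linear(256, n_params))

    def forward(self, video_embedding, coords):        # coords: (N, 3)
        flat = self.hyper(video_embedding)             # all INR weights at once
        params, i = [], 0
        for s in self.shapes:                          # unpack the flat vector
            n = torch.Size(s).numel()
            params.append(flat[i:i + n].view(*s))
            i += n
        w1, b1, w2, b2 = params
        h = torch.relu(coords @ w1.t() + b1)           # run the generated INR
        return h @ w2.t() + b2                         # (N, out_dim)

# usage: out = HyperINR()(encoder(video), coords)  # encoder is assumed
```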
Related papers
- Video Decomposition Prior: A Methodology to Decompose Videos into Layers [74.36790196133505]
This paper introduces a novel video decomposition prior ('VDP') framework which derives inspiration from professional video editing practices. The VDP framework decomposes a video sequence into a set of multiple RGB layers and associated opacity levels. We address tasks such as video object segmentation, dehazing, and relighting.
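To make the layered representation concrete, here is a minimal back-to-front "over" compositing sketch (standard alpha compositing; the exact VDP blending and layer ordering may differ):

```python
import torch

def composite(layers_rgb, layers_alpha):
    """Back-to-front 'over' compositing of decomposed layers.
    layers_rgb:   (L, T, 3, H, W) per-layer colors
    layers_alpha: (L, T, 1, H, W) per-layer opacities in [0, 1]
    """
    out = torch.zeros_like(layers_rgb[0])
    for rgb, a in zip(layers_rgb, layers_alpha):  # index 0 = backmost layer
        out = a * rgb + (1.0 - a) * out           # standard alpha blending
    return out                                    # reconstructed (T, 3, H, W)
```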
arXiv Detail & Related papers (2024-12-06T10:35:45Z)
- Video to Video Generative Adversarial Network for Few-shot Learning Based on Policy Gradient [12.07088416665005]
We propose RL-V2V-GAN, a new deep neural network approach for conditional video-to-video synthesis.
While preserving the style of the source video domain, our approach aims to learn a mapping from a source video domain to a target video domain.
Our experiments show that RL-V2V-GAN can produce temporally coherent video results.
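The policy-gradient ingredient from the title can be summarized by a generic REINFORCE-style surrogate loss (a minimal sketch; the actual RL-V2V-GAN rewards and networks are more involved):

```python
import torch

def reinforce_loss(log_probs, rewards):
    """Generic REINFORCE surrogate: weight log-probabilities of sampled
    actions by baseline-subtracted rewards and minimize the negation.
    log_probs: (T,) log pi(a_t | s_t);  rewards: (T,) scalar rewards."""
    advantages = rewards - rewards.mean()            # simple baseline
    return -(log_probs * advantages.detach()).sum()  # gradient ascent on reward
```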
arXiv Detail & Related papers (2024-10-28T01:35:10Z)
- Fine-grained Zero-shot Video Sampling [21.42513407755273]
We propose a novel Zero-Shot video sampling algorithm, denoted as $\mathcal{ZS}^2$.
$\mathcal{ZS}^2$ is capable of directly sampling high-quality video clips without any training or optimization.
It achieves state-of-the-art performance in zero-shot video generation, occasionally outperforming recent supervised methods.
arXiv Detail & Related papers (2024-07-31T09:36:58Z)
- MNeRV: A Multilayer Neural Representation for Videos [1.1079931610880582]
We propose a multilayer neural representation for videos (MNeRV) and design a new decoder M-Decoder and its matching encoder M-Encoder.
MNeRV has more encoding and decoding layers, which effectively alleviates the problem of redundant model parameters.
In the field of video regression reconstruction, we achieve better reconstruction quality (+4.06 PSNR) with fewer parameters.
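For context, a NeRV-style representation (of which MNeRV is a multilayer refinement) maps a frame index to a full frame; a minimal sketch with hypothetical sizes, not MNeRV's actual M-Encoder/M-Decoder:

```python
import math
import torch
import torch.nn as nn

class TinyNeRV(nn.Module):
    """Minimal NeRV-style decoder (illustrative only): positionally encode
    the frame index, expand with an MLP, then upsample to a frame with
    PixelShuffle blocks."""
    def __init__(self, freqs=8, ch=64, h0=8, w0=8):
        super().__init__()
        self.freqs, self.ch, self.h0, self.w0 = freqs, ch, h0, w0
        self.mlp = nn.Sequential(
            nn.Linear(2 * freqs, 256), nn.GELU(),
            nn.Linear(256, ch * h0 * w0))
        self.up = nn.Sequential(                   # 8x8 -> 32x32
            nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2), nn.GELU(),
            nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2), nn.GELU(),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, t):                          # t: (B,) frame index in [0, 1]
        bands = 2.0 ** torch.arange(self.freqs, device=t.device,
                                    dtype=t.dtype) * math.pi
        enc = torch.cat([torch.sin(t[:, None] * bands),
                         torch.cos(t[:, None] * bands)], dim=1)
        x = self.mlp(enc).view(-1, self.ch, self.h0, self.w0)
        return torch.sigmoid(self.up(x))           # (B, 3, 32, 32) frames
```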
arXiv Detail & Related papers (2024-07-10T03:57:29Z)
- A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames [57.758863967770594]
We build on the common paradigm of transferring large-scale, image-text models to video via shallow temporal fusion.
We expose two limitations of this approach: (1) decreased spatial capabilities, likely due to poor video-language alignment in standard video datasets, and (2) higher memory consumption, bottlenecking the number of frames that can be processed.
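Shallow temporal fusion here roughly means running a pretrained image encoder per frame and mixing the frame features with a light temporal module; a sketch under assumed shapes (the paper's actual fusion and encoder differ):

```python
import torch
import torch.nn as nn

class ShallowTemporalFusion(nn.Module):
    """Fuse per-frame image-encoder features into one video embedding with
    a small temporal transformer (illustrative sketch)."""
    def __init__(self, dim=768, nhead=8, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, nhead, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, frame_feats):         # (B, T, dim), one row per frame
        fused = self.temporal(frame_feats)  # light cross-frame mixing
        return fused.mean(dim=1)            # (B, dim) video-level embedding
```

Holding all T per-frame feature maps in memory at once is what bottlenecks the frame count the summary refers to.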
arXiv Detail & Related papers (2023-12-12T16:10:19Z)
- BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models [40.73982918337828]
We propose a training-free general-purpose video synthesis framework, coined as BIVDiff, via bridging specific image diffusion models and general text-to-video foundation diffusion models.
Specifically, we first use a specific image diffusion model (e.g., ControlNet and InstructPix2Pix) for frame-wise video generation, then perform Mixed Inversion on the generated video, and finally input the inverted latents into the video diffusion models.
arXiv Detail & Related papers (2023-12-05T14:56:55Z)
- Multi-object Video Generation from Single Frame Layouts [84.55806837855846]
We propose a video generative framework capable of synthesizing global scenes with local objects.
Our framework is a non-trivial adaptation of image generation methods and is new to this field.
Our model has been evaluated on two widely-used video recognition benchmarks.
arXiv Detail & Related papers (2023-05-06T09:07:01Z)
- Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- MagicVideo: Efficient Video Generation With Latent Diffusion Models [76.95903791630624]
We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo.
Due to a novel and efficient 3D U-Net design and modeling video distributions in a low-dimensional space, MagicVideo can synthesize video clips with 256x256 spatial resolution on a single GPU card.
We conduct extensive experiments and demonstrate that MagicVideo can generate high-quality video clips with either realistic or imaginary content.
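A common way to keep a video U-Net cheap is to factorize 3D convolution into a 2D spatial pass and a 1D temporal pass; the sketch below shows this generic pattern, not MagicVideo's actual 3D U-Net design:

```python
import torch.nn as nn

class Pseudo3DConv(nn.Module):
    """Factorized '3D' convolution: a 2D spatial conv per frame followed by
    a 1D temporal conv per pixel (generic pattern, illustrative only)."""
    def __init__(self, ch):
        super().__init__()
        self.spatial = nn.Conv2d(ch, ch, 3, padding=1)
        self.temporal = nn.Conv1d(ch, ch, 3, padding=1)

    def forward(self, x):                      # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        y = self.spatial(x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w))
        y = y.view(b, t, c, h, w).permute(0, 3, 4, 2, 1).reshape(b * h * w, c, t)
        y = self.temporal(y)                   # mix along the time axis
        return y.view(b, h, w, c, t).permute(0, 3, 4, 1, 2)  # (B, C, T, H, W)
```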
arXiv Detail & Related papers (2022-11-20T16:40:31Z)
- Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation [38.889823516049056]
The method divides a video into chunks and streams low-resolution (LR) video chunks and corresponding content-aware models to the client.
With our method, each video chunk requires less than 1% of the original parameters to be streamed, while achieving even better SR performance.
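The per-chunk "content-aware model" can be as small as channel-wise modulation parameters on a shared backbone; a minimal sketch of that idea (not the paper's exact module):

```python
import torch
import torch.nn as nn

class ModulatedConv(nn.Module):
    """Shared conv weights plus tiny per-chunk channel-wise scale/shift.
    Only the per-chunk vectors (a fraction of a percent of the model)
    would need to be streamed per video chunk (illustrative sketch)."""
    def __init__(self, ch, num_chunks):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)              # shared, sent once
        self.scale = nn.Parameter(torch.ones(num_chunks, ch))    # per chunk
        self.shift = nn.Parameter(torch.zeros(num_chunks, ch))   # per chunk

    def forward(self, x, chunk_id):
        y = self.conv(x)
        s = self.scale[chunk_id].view(1, -1, 1, 1)
        b = self.shift[chunk_id].view(1, -1, 1, 1)
        return torch.relu(y * s + b)
```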
arXiv Detail & Related papers (2021-08-18T15:34:11Z)
- Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network, and a generator through non-adversarial learning.
Our approach generates superior quality videos compared to the existing state-of-the-art methods.
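Non-adversarial here means fitting a plain reconstruction loss jointly over learnable latent codes, a recurrent network, and a generator, with no discriminator; a minimal sketch with illustrative shapes:

```python
import torch
import torch.nn as nn

# Learnable per-video latent codes, a recurrent state model, and a frame
# generator, all fitted with a reconstruction loss (no discriminator).
# Shapes and architectures are illustrative, not the paper's.
num_videos, z_dim, T = 100, 64, 16
latents = nn.Parameter(torch.randn(num_videos, z_dim))
rnn = nn.GRU(z_dim, z_dim, batch_first=True)
decoder = nn.Linear(z_dim, 3 * 32 * 32)            # stand-in frame generator

opt = torch.optim.Adam([latents, *rnn.parameters(), *decoder.parameters()],
                       lr=1e-3)

def step(video_ids, target_frames):                # targets: (B, T, 3*32*32)
    z0 = latents[video_ids]                        # (B, z_dim) learnable inputs
    z_seq, _ = rnn(z0[:, None, :].repeat(1, T, 1)) # unroll over time
    frames = decoder(z_seq)                        # (B, T, 3*32*32)
    loss = (frames - target_frames).abs().mean()   # non-adversarial L1 loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```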
arXiv Detail & Related papers (2020-03-21T02:57:33Z)