GSVR: 2D Gaussian-based Video Representation for 800+ FPS with Hybrid Deformation Field
- URL: http://arxiv.org/abs/2507.05594v1
- Date: Tue, 08 Jul 2025 02:13:12 GMT
- Title: GSVR: 2D Gaussian-based Video Representation for 800+ FPS with Hybrid Deformation Field
- Authors: Zhizhuo Pang, Zhihui Ke, Xiaobo Zhou, Tie Qiu
- Abstract summary: Implicit neural representations for video have been recognized as a novel and promising video representation. We propose GSVR, a novel 2D Gaussian-based video representation, which achieves 800+ FPS and 35+ PSNR on Bunny. Our method converges much faster than existing methods and decodes 10x faster than other methods.
- Score: 7.977026024810772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implicit neural representations for video have been recognized as a novel and promising form of video representation. Existing works focus on improving video reconstruction quality but pay little attention to decoding speed. However, the high computational cost of the convolutional networks used in existing methods leads to slow decoding. Moreover, these convolution-based video representation methods also suffer from long training times, about 14 seconds per frame to achieve 35+ PSNR on Bunny. To solve these problems, we propose GSVR, a novel 2D Gaussian-based video representation that achieves 800+ FPS and 35+ PSNR on Bunny while requiring only 2 seconds of training per frame. Specifically, we propose a hybrid deformation field to model the dynamics of the video, combining two motion patterns, namely tri-plane motion and polynomial motion, to handle the coupling of camera motion and object motion in the video. Furthermore, we propose a Dynamic-aware Time Slicing strategy that adaptively divides the video into multiple groups of pictures (GOPs) based on the dynamic level of the video, in order to handle large camera motion and non-rigid movements. Finally, we propose quantization-aware fine-tuning to avoid performance degradation after quantization, and we utilize image codecs to compress the Gaussians into a compact representation. Experiments on the Bunny and UVG datasets confirm that our method converges much faster than existing methods and decodes 10x faster than other methods. Our method achieves performance comparable to SOTA on video interpolation and attains better video compression performance than NeRV.
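The abstract's hybrid deformation field (tri-plane motion plus polynomial motion applied to 2D Gaussian parameters) can be illustrated with a minimal numpy sketch. The grid resolution, polynomial degree, plane factorization, and the additive way the two offsets are combined are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

RES = 16       # assumed tri-plane resolution
POLY_DEG = 2   # assumed polynomial degree

rng = np.random.default_rng(0)
# Three feature planes (xy, xt, yt), each storing a 2-channel offset field.
planes = {k: rng.normal(scale=1e-2, size=(RES, RES, 2)) for k in ("xy", "xt", "yt")}
# Per-degree 2D offset coefficients for the polynomial motion term.
poly_coeffs = rng.normal(scale=1e-2, size=(POLY_DEG + 1, 2))

def bilerp(plane, u, v):
    """Bilinearly sample a (RES, RES, 2) plane at normalized coords u, v in [0, 1]."""
    x, y = u * (RES - 1), v * (RES - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, RES - 1), min(y0 + 1, RES - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[y0, x0] + wx * (1 - wy) * plane[y0, x1]
            + (1 - wx) * wy * plane[y1, x0] + wx * wy * plane[y1, x1])

def deform(pos, t):
    """Deform a canonical 2D Gaussian center `pos` (in [0,1]^2) at time t in [0,1]."""
    tri = (bilerp(planes["xy"], pos[0], pos[1])
           + bilerp(planes["xt"], pos[0], t)
           + bilerp(planes["yt"], pos[1], t))                       # tri-plane motion
    poly = sum(poly_coeffs[k] * t ** k for k in range(POLY_DEG + 1))  # polynomial motion
    return pos + tri + poly  # combined hybrid offset

center = np.array([0.5, 0.5])
print(deform(center, 0.0), deform(center, 1.0))
```

In this reading, the tri-plane term captures spatially varying (object) motion while the low-degree polynomial term captures smooth global (camera) motion; both would be optimized jointly in training.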
Related papers
- Efficient Neural Video Representation with Temporally Coherent Modulation [6.339750087526286]
Implicit neural representations (INRs) have found successful applications across diverse domains. We propose Neural Video representation with Temporally coherent Modulation (NVTM), a novel framework that can capture the dynamic characteristics of video. Our framework enables processing temporally corresponding pixels at once, resulting in the fastest encoding speed for a reasonable video quality.
arXiv Detail & Related papers (2025-05-01T06:20:42Z) - D2GV: Deformable 2D Gaussian Splatting for Video Representation in 400FPS [22.373386953378002]
Implicit Neural Representations (INRs) have emerged as a powerful approach for video representation, offering versatility across tasks such as compression and inpainting. We propose a novel video representation based on deformable 2D Gaussian splatting, dubbed D2GV. We demonstrate D2GV's versatility in tasks including video inpainting and denoising, underscoring its potential as a promising solution for video representation.
arXiv Detail & Related papers (2025-03-07T17:26:27Z) - GaussianVideo: Efficient Video Representation and Compression by Gaussian Splatting [10.568851068989973]
Implicit Neural Representation for Videos (NeRV) has introduced a novel paradigm for video representation and compression. We propose a new video representation and compression method based on 2D Gaussian Splatting to efficiently handle video data. Our method reduces memory usage by up to 78.4% and significantly expedites video processing, achieving 5.5x faster training and 12.5x faster decoding.
arXiv Detail & Related papers (2025-03-06T11:31:08Z) - GSVC: Efficient Video Representation and Compression Through 2D Gaussian Splatting [3.479384894190067]
We propose GSVC, an approach to learning a set of 2D Gaussian splats that can effectively represent and compress video frames. Experiment results show that GSVC achieves good rate-distortion trade-offs, comparable to state-of-the-art video codecs.
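The core idea shared by GSVC and the Gaussian-based methods above, approximating a frame as a blended sum of 2D Gaussians, can be sketched as follows. Isotropic Gaussians and plain additive blending are simplifying assumptions for illustration; the papers' actual splatting and blending schemes differ.

```python
import numpy as np

H = W = 32
ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)  # pixel coordinate grids

def render(gaussians):
    """Render an RGB image from a list of (cx, cy, sigma, rgb) 2D Gaussians."""
    img = np.zeros((H, W, 3))
    for cx, cy, sigma, rgb in gaussians:
        # Isotropic Gaussian footprint centered at (cx, cy).
        w = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        img += w[..., None] * np.asarray(rgb)  # additive splat
    return np.clip(img, 0.0, 1.0)

frame = render([(10, 12, 3.0, (1.0, 0.0, 0.0)),   # red blob
                (22, 20, 5.0, (0.0, 0.0, 1.0))])  # blue blob
print(frame.shape)
```

In the actual methods, the per-Gaussian parameters (position, scale, color, opacity) are optimized against the target frame by gradient descent, and the resulting parameter set is what gets quantized and compressed.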
arXiv Detail & Related papers (2025-01-21T11:30:51Z) - Large Motion Video Autoencoding with Cross-modal Video VAE [52.13379965800485]
Video Variational Autoencoders (VAEs) are essential for reducing video redundancy and facilitating efficient video generation. Existing video VAEs have begun to address temporal compression; however, they often suffer from inadequate reconstruction performance. We present a novel and powerful video autoencoder capable of high-fidelity video encoding.
arXiv Detail & Related papers (2024-12-23T18:58:24Z) - VidTwin: Video VAE with Decoupled Structure and Dynamics [24.51768013474122]
VidTwin is a compact video autoencoder that decouples video into two distinct latent spaces. Structure latent vectors capture overall content and global movement, while Dynamics latent vectors represent fine-grained details and rapid movements. Experiments show that VidTwin achieves a high compression rate of 0.20% with high reconstruction quality.
arXiv Detail & Related papers (2024-12-23T17:16:58Z) - Fast Encoding and Decoding for Implicit Video Representation [88.43612845776265]
We introduce NeRV-Enc, a transformer-based hyper-network for fast encoding; and NeRV-Dec, a parallel decoder for efficient video loading.
NeRV-Enc achieves an impressive speed-up of $\mathbf{104}\times$ by eliminating gradient-based optimization.
NeRV-Dec simplifies video decoding, outperforming conventional codecs with a loading speed $\mathbf{11}\times$ faster.
arXiv Detail & Related papers (2024-09-28T18:21:52Z) - HNeRV: A Hybrid Neural Representation for Videos [56.492309149698606]
Implicit neural representations store videos as neural networks.
We propose a Hybrid Neural Representation for Videos (HNeRV).
With content-adaptive embeddings and a redesigned architecture, HNeRV outperforms implicit methods in video regression tasks.
arXiv Detail & Related papers (2023-04-05T17:55:04Z) - Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z) - MagicVideo: Efficient Video Generation With Latent Diffusion Models [76.95903791630624]
We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo.
Due to a novel and efficient 3D U-Net design and modeling video distributions in a low-dimensional space, MagicVideo can synthesize video clips with 256x256 spatial resolution on a single GPU card.
We conduct extensive experiments and demonstrate that MagicVideo can generate high-quality video clips with either realistic or imaginary content.
arXiv Detail & Related papers (2022-11-20T16:40:31Z) - Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2x faster (less than 5 minutes) but also exceeds their encoding quality, improving PSNR from $34.07$ to $34.57$.
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.