GaussianVideo: Efficient Video Representation and Compression by Gaussian Splatting
- URL: http://arxiv.org/abs/2503.04333v1
- Date: Thu, 06 Mar 2025 11:31:08 GMT
- Title: GaussianVideo: Efficient Video Representation and Compression by Gaussian Splatting
- Authors: Inseo Lee, Youngyoon Choi, Joonseok Lee
- Abstract summary: Implicit Neural Representation for Videos (NeRV) has introduced a novel paradigm for video representation and compression. We propose a new video representation and compression method based on 2D Gaussian Splatting to efficiently handle video data. Our method reduces GPU memory usage by up to 78.4% and significantly expedites video processing, achieving 5.5x faster training and 12.5x faster decoding.
- Score: 10.568851068989973
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Implicit Neural Representation for Videos (NeRV) has introduced a novel paradigm for video representation and compression, outperforming traditional codecs. As model size grows, however, slow encoding and decoding speed and high memory consumption hinder its application in practice. To address these limitations, we propose a new video representation and compression method based on 2D Gaussian Splatting to efficiently handle video data. Our proposed deformable 2D Gaussian Splatting dynamically adapts the transformation of 2D Gaussians at each frame, significantly reducing memory cost. Equipped with a multi-plane-based spatiotemporal encoder and a lightweight decoder, it predicts changes in color, coordinates, and shape of initialized Gaussians, given the time step. By leveraging temporal gradients, our model effectively captures temporal redundancy at negligible cost, significantly enhancing video representation efficiency. Our method reduces GPU memory usage by up to 78.4%, and significantly expedites video processing, achieving 5.5x faster training and 12.5x faster decoding compared to the state-of-the-art NeRV methods.
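To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of the core idea: a canonical set of 2D Gaussians whose positions, colors, and shapes are offset per frame by a lightweight decoder conditioned on time-dependent plane features. All class names, dimensions, and the simplified two-plane encoder are illustrative assumptions, not the authors' implementation; rasterizing the deformed Gaussians into a frame is omitted here.

```python
import torch
import torch.nn as nn

class PlaneEncoder(nn.Module):
    """Toy stand-in for a multi-plane spatiotemporal encoder: learnable (x,t) and
    (y,t) feature planes, bilinearly sampled at each Gaussian's canonical center."""
    def __init__(self, feat_dim=32, res=64):
        super().__init__()
        self.xt = nn.Parameter(torch.randn(1, feat_dim, res, res) * 0.01)
        self.yt = nn.Parameter(torch.randn(1, feat_dim, res, res) * 0.01)

    def forward(self, xy, t):
        # xy: (N, 2) centers in [-1, 1]; t: scalar time step in [-1, 1]
        n = xy.shape[0]
        tt = torch.full((n, 1), float(t), device=xy.device)
        grid_xt = torch.stack([xy[:, 0:1], tt], dim=-1).view(1, n, 1, 2)
        grid_yt = torch.stack([xy[:, 1:2], tt], dim=-1).view(1, n, 1, 2)
        f_xt = nn.functional.grid_sample(self.xt, grid_xt, align_corners=True)
        f_yt = nn.functional.grid_sample(self.yt, grid_yt, align_corners=True)
        return (f_xt + f_yt).view(-1, n).t()  # (N, feat_dim)

class DeformDecoder(nn.Module):
    """Lightweight MLP head predicting per-Gaussian offsets for position (2),
    color (3), and shape, i.e. log-scales plus rotation (3), at the queried time."""
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 + 3 + 3))

    def forward(self, feats):
        d = self.net(feats)
        return d[:, :2], d[:, 2:5], d[:, 5:]  # d_xy, d_color, d_shape

# Canonical (time-independent) Gaussian parameters, optimized jointly with the nets.
N = 4096
base_xy    = nn.Parameter(torch.rand(N, 2) * 2 - 1)   # centers in [-1, 1]
base_color = nn.Parameter(torch.rand(N, 3))
base_shape = nn.Parameter(torch.zeros(N, 3))          # log-scales (2) + rotation (1)
encoder, decoder = PlaneEncoder(), DeformDecoder()

def gaussians_at(t):
    """Return the deformed Gaussian parameters at time step t; rendering them
    into a frame is left to a separate rasterizer (not shown)."""
    feats = encoder(base_xy, t)
    d_xy, d_color, d_shape = decoder(feats)
    return base_xy + d_xy, base_color + d_color, base_shape + d_shape

xy_t, color_t, shape_t = gaussians_at(0.25)
print(xy_t.shape, color_t.shape, shape_t.shape)  # (4096, 2), (4096, 3), (4096, 3)
```

In this sketch the per-frame cost is a single feature lookup plus a small MLP per Gaussian, which is what keeps memory and decoding overhead low relative to storing independent Gaussians per frame.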
Related papers
- Efficient Neural Video Representation with Temporally Coherent Modulation [6.339750087526286]
Implicit neural representations (INRs) have found successful applications across diverse domains.
We propose Neural Video representation with Temporally coherent Modulation (NVTM), a novel framework that can capture dynamic characteristics of video.
Our framework enables processing temporally corresponding pixels at once, resulting in the fastest encoding speed at a reasonable video quality.
arXiv Detail & Related papers (2025-05-01T06:20:42Z) - 4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video [56.04182926886754]
3D Gaussian Splatting (3DGS) has substantial potential for enabling photorealistic Free-Viewpoint Video (FVV) experiences.
Existing methods typically handle dynamic 3DGS representation and compression separately, neglecting motion information and the rate-distortion trade-off during training.
We propose 4DGC, a rate-aware 4D Gaussian compression framework that significantly reduces storage size while maintaining superior RD performance for FVV.
arXiv Detail & Related papers (2025-03-24T08:05:27Z) - Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation [0.0]
Large Variational Autoencoder decoders can slow down generation and consume considerable GPU memory.
We propose custom-trained decoders using lightweight Vision Transformer and Taming Transformer architectures.
Experiments show up to 15% overall speed-ups for image generation on COCO 2017 and up to 20 times faster decoding in the sub-module, with additional gains on UCF-101 for video tasks.
arXiv Detail & Related papers (2025-03-06T16:21:49Z) - GSVC: Efficient Video Representation and Compression Through 2D Gaussian Splatting [3.479384894190067]
We propose GSVC, an approach to learning a set of 2D Gaussian splats that can effectively represent and compress video frames; a toy rendering sketch in this spirit appears after this list. Experimental results show that GSVC achieves good rate-distortion trade-offs, comparable to state-of-the-art video codecs.
arXiv Detail & Related papers (2025-01-21T11:30:51Z) - Representing Long Volumetric Video with Temporal Gaussian Hierarchy [80.51373034419379]
This paper aims to address the challenge of reconstructing long volumetric videos from multi-view RGB videos. We propose a novel 4D representation, named Temporal Gaussian Hierarchy, to compactly model long volumetric videos. This work is the first approach capable of efficiently handling minutes of volumetric video data while maintaining state-of-the-art rendering quality.
arXiv Detail & Related papers (2024-12-12T18:59:34Z) - SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity [15.872209884833977]
We propose a memory-efficient scheduling method to eliminate memory overhead and an online adjustment mechanism to minimize accuracy degradation.
SparseTem achieves speedup of 1.79x for EfficientDet and 4.72x for CRNN, with minimal accuracy drop and no additional memory overhead.
arXiv Detail & Related papers (2024-10-28T07:13:25Z) - MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes [49.36091070642661]
This paper introduces a memory-efficient framework for 4DGS.
It achieves storage reductions of approximately 190× and 125× on the Technicolor and Neural 3D Video datasets, respectively.
It maintains comparable rendering speeds and scene representation quality, setting a new standard in the field.
arXiv Detail & Related papers (2024-10-17T14:47:08Z) - Fast Encoding and Decoding for Implicit Video Representation [88.43612845776265]
We introduce NeRV-Enc, a transformer-based hyper-network for fast encoding; and NeRV-Dec, a parallel decoder for efficient video loading.
NeRV-Enc achieves an impressive speed-up of 104× by eliminating gradient-based optimization.
NeRV-Dec simplifies video decoding, outperforming conventional codecs with a loading speed 11× faster.
arXiv Detail & Related papers (2024-09-28T18:21:52Z) - GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting [27.33121386538575]
Implicit neural representations (INRs) recently achieved great success in image representation and compression.
However, their resource demands often hinder use on low-end devices with limited memory.
We propose a groundbreaking paradigm of image representation and compression by 2D Gaussian Splatting, named GaussianImage.
arXiv Detail & Related papers (2024-03-13T14:02:54Z) - A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames [57.758863967770594]
We build on the common paradigm of transferring large-scale, image-text models to video via shallow temporal fusion.
We expose two limitations of the approach: (1) decreased spatial capabilities, likely due to poor video-language alignment in standard video datasets, and (2) higher memory consumption, bottlenecking the number of frames that can be processed.
arXiv Detail & Related papers (2023-12-12T16:10:19Z) - EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS [40.94643885302646]
3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis.
It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs).
We present a technique utilizing quantized embeddings to significantly reduce per-point memory storage requirements.
arXiv Detail & Related papers (2023-12-07T18:59:55Z) - MagicVideo: Efficient Video Generation With Latent Diffusion Models [76.95903791630624]
We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo.
Due to a novel and efficient 3D U-Net design and modeling video distributions in a low-dimensional space, MagicVideo can synthesize video clips with 256x256 spatial resolution on a single GPU card.
We conduct extensive experiments and demonstrate that MagicVideo can generate high-quality video clips with either realistic or imaginary content.
arXiv Detail & Related papers (2022-11-20T16:40:31Z) - Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z)
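As referenced in the GSVC entry above, the following is a small, self-contained sketch of how a single frame can be rendered from a set of 2D Gaussians by order-independent accumulated blending, in the spirit of GSVC and GaussianImage. The parameterization (log-scales, a single rotation angle, per-Gaussian opacity) and the blending rule are simplifying assumptions, not the published implementations.

```python
import torch

def render_frame(mu, log_scale, theta, color, opacity, H=64, W=64):
    """mu: (N,2) centers in [-1,1]; log_scale: (N,2); theta: (N,) rotation angles;
    color: (N,3); opacity: (N,). Returns an (H, W, 3) image tensor."""
    ys = torch.linspace(-1, 1, H)
    xs = torch.linspace(-1, 1, W)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    pix = torch.stack([xx, yy], dim=-1).view(-1, 2)            # (P, 2) pixel coords

    # Per-Gaussian 2x2 covariance: R * diag(s^2) * R^T
    s2 = torch.exp(log_scale) ** 2                             # (N, 2)
    c, s = torch.cos(theta), torch.sin(theta)
    R = torch.stack([torch.stack([c, -s], -1),
                     torch.stack([s,  c], -1)], -2)            # (N, 2, 2)
    cov = R @ torch.diag_embed(s2) @ R.transpose(-1, -2)       # (N, 2, 2)
    inv = torch.linalg.inv(cov)                                # (N, 2, 2)

    # Mahalanobis distance of every pixel to every Gaussian, then weighted sum.
    d = pix[:, None, :] - mu[None, :, :]                       # (P, N, 2)
    mahal = torch.einsum("pni,nij,pnj->pn", d, inv, d)         # (P, N)
    w = opacity[None, :] * torch.exp(-0.5 * mahal)             # (P, N)
    img = w @ color                                            # (P, 3)
    return img.view(H, W, 3).clamp(0, 1)

# Random Gaussians just to exercise the function.
N = 256
img = render_frame(mu=torch.rand(N, 2) * 2 - 1,
                   log_scale=torch.full((N, 2), -3.0),
                   theta=torch.rand(N) * 3.14159,
                   color=torch.rand(N, 3),
                   opacity=torch.rand(N))
print(img.shape)  # torch.Size([64, 64, 3])
```

Because the blending is a plain weighted sum, rendering needs no depth sorting and every Gaussian's gradient is independent of the others, which is what makes per-frame fitting and decoding fast in this family of methods.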