StreamSTGS: Streaming Spatial and Temporal Gaussian Grids for Real-Time Free-Viewpoint Video
- URL: http://arxiv.org/abs/2511.06046v1
- Date: Sat, 08 Nov 2025 15:35:43 GMT
- Title: StreamSTGS: Streaming Spatial and Temporal Gaussian Grids for Real-Time Free-Viewpoint Video
- Authors: Zhihui Ke, Yuyang Liu, Xiaobo Zhou, Tie Qiu,
- Abstract summary: Streaming free-viewpoint video (FVV) in real-time faces significant challenges. Recent 3DGS-based FVV methods have achieved notable breakthroughs in both training and rendering. We propose a novel FVV representation, dubbed StreamSTGS, designed for real-time streaming.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Streaming free-viewpoint video~(FVV) in real-time still faces significant challenges, particularly in training, rendering, and transmission efficiency. Harnessing the superior performance of 3D Gaussian Splatting~(3DGS), recent 3DGS-based FVV methods have achieved notable breakthroughs in both training and rendering. However, the storage requirements of these methods can reach up to $10$MB per frame, making real-time FVV streaming impossible. To address this problem, we propose a novel FVV representation, dubbed StreamSTGS, designed for real-time streaming. StreamSTGS represents a dynamic scene using canonical 3D Gaussians, temporal features, and a deformation field. For high compression efficiency, we encode canonical Gaussian attributes as 2D images and temporal features as a video. This design not only enables real-time streaming, but also inherently supports adaptive bitrate control based on network conditions without any extra training. Moreover, we propose a sliding window scheme to aggregate adjacent temporal features to learn local motions, and then introduce a transformer-guided auxiliary training module to learn global motions. On diverse FVV benchmarks, StreamSTGS demonstrates competitive performance on all metrics compared to state-of-the-art methods. Notably, StreamSTGS increases the PSNR by an average of $1$dB while reducing the average frame size to just $170$KB. The code is publicly available at https://github.com/kkkzh/StreamSTGS.
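The abstract's two core ideas, packing per-Gaussian attributes into 2D images so they compress with standard image/video codecs, and aggregating adjacent temporal features with a sliding window, can be sketched as follows. This is a minimal illustration based only on the abstract's description; the layout, the window size, and the use of a plain mean (the paper learns the aggregation) are assumptions, not the authors' implementation.

```python
import numpy as np

def pack_attributes_to_image(attrs: np.ndarray) -> np.ndarray:
    """Pack per-Gaussian attributes of shape (N, C) into a square 2D grid
    of shape (S, S, C), zero-padded, so each channel plane can be handed
    to a standard image codec. Hypothetical row-major layout."""
    n, c = attrs.shape
    side = int(np.ceil(np.sqrt(n)))
    grid = np.zeros((side * side, c), dtype=attrs.dtype)
    grid[:n] = attrs
    return grid.reshape(side, side, c)

def sliding_window_features(temporal_feats: np.ndarray, t: int,
                            window: int = 3) -> np.ndarray:
    """Aggregate temporal features (T, D) over a window centered on frame t.
    A simple mean stands in for the learned aggregation in the paper."""
    lo = max(0, t - window // 2)
    hi = min(temporal_feats.shape[0], t + window // 2 + 1)
    return temporal_feats[lo:hi].mean(axis=0)
```

With such a packing, each frame's attribute image and temporal-feature video can be re-encoded at different quality levels, which is what makes codec-level adaptive bitrate control possible without retraining.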
Related papers
- LoD-Structured 3D Gaussian Splatting for Streaming Video Reconstruction [19.37120630668256]
Free-Viewpoint Video (FVV) reconstruction enables photorealistic and interactive 3D scene visualization. Recent 3D Gaussian Splatting (3DGS) has advanced FVV due to its superior rendering speed. We propose StreamLoD-GS, an LoD-based Gaussian Splatting framework designed specifically for SFVV.
arXiv Detail & Related papers (2026-01-26T13:27:46Z) - Speedy Deformable 3D Gaussian Splatting: Fast Rendering and Compression of Dynamic Scenes [57.69608119350651]
Recent extensions of 3D Gaussian Splatting (3DGS) to dynamic scenes achieve high-quality novel view synthesis by using neural networks to predict the time-varying deformation of each Gaussian. However, performing per-Gaussian neural inference at every frame poses a significant bottleneck, limiting rendering speed and increasing memory and compute requirements. We present Speedy Deformable 3D Gaussian Splatting (SpeeDe3DGS), a general pipeline for accelerating the rendering speed of dynamic 3DGS and 4DGS representations by reducing neural inference through two complementary techniques.
arXiv Detail & Related papers (2025-06-09T16:30:48Z) - Efficient Neural Video Representation with Temporally Coherent Modulation [6.339750087526286]
Implicit neural representations (INRs) have found successful applications across diverse domains. We propose Neural Video representation with Temporally coherent Modulation (NVTM), a novel framework that can capture the dynamic characteristics of video. Our framework enables processing temporally corresponding pixels at once, resulting in the fastest encoding speed for a reasonable video quality.
arXiv Detail & Related papers (2025-05-01T06:20:42Z) - QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos [42.554100586090826]
Online free-viewpoint video (FVV) streaming is a challenging problem, which is relatively under-explored. We propose a novel framework for QUantized and Efficient ENcoding for streaming FVV using 3D Gaussian Splatting. We further propose a quantization-sparsity framework, which contains a learned latent decoder for effectively quantizing attribute residuals other than Gaussian positions.
arXiv Detail & Related papers (2024-12-05T18:59:55Z) - Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives [60.217580865237835]
3D Gaussian Splatting (3D-GS) is a recent 3D scene reconstruction technique that enables real-time rendering of novel views by modeling scenes as parametric point clouds of differentiable 3D Gaussians. We identify and address two key inefficiencies in 3D-GS to substantially improve rendering speed. Our Speedy-Splat approach combines these techniques to accelerate average rendering speed by a drastic $6.71\times$ across scenes from the Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets.
arXiv Detail & Related papers (2024-11-30T20:25:56Z) - HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting [7.507657419706855]
This paper proposes an efficient framework, dubbed HiCoM, with three key components. First, we construct a compact and robust initial 3DGS representation using a perturbation smoothing strategy. Next, we introduce a Hierarchical Coherent Motion mechanism that leverages the inherent non-uniform distribution and local consistency of 3D Gaussians. Experiments conducted on two widely used datasets show that our framework improves the learning efficiency of state-of-the-art methods by about $20\%$.
arXiv Detail & Related papers (2024-11-12T04:40:27Z) - V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians [53.614560799043545]
V3 (Viewing Volumetric Videos) is a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians.
Our key innovation is to view dynamic 3DGS as 2D videos, facilitating the use of hardware video codecs.
As the first to stream dynamic Gaussians on mobile devices, our companion player offers users an unprecedented volumetric video experience.
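The key idea shared by V^3 and StreamSTGS, treating per-frame 2D attribute images as frames of an ordinary video so hardware codecs can compress them, implies quantizing float attributes to 8-bit frames. A minimal sketch of that step, assuming a simple global min/max normalization (real systems would store the range as side information and may quantize per channel):

```python
import numpy as np

def attributes_to_video_frames(frames_attrs, lo=None, hi=None):
    """Quantize a sequence of float attribute images (T, H, W, C) to uint8
    frames that a hardware video codec can ingest. Global min/max
    normalization is a simplifying assumption, not either paper's scheme."""
    x = np.asarray(frames_attrs, dtype=np.float32)
    lo = float(x.min()) if lo is None else lo
    hi = float(x.max()) if hi is None else hi
    q = np.clip((x - lo) / max(hi - lo, 1e-8), 0.0, 1.0)  # normalize to [0, 1]
    return (q * 255.0).round().astype(np.uint8)
```

The resulting uint8 array can be piped frame by frame into any standard encoder; decoding inverts the mapping with the stored `(lo, hi)` range.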
arXiv Detail & Related papers (2024-09-20T16:54:27Z) - 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos [10.323643152957114]
3DGStream is a method designed for efficient FVV streaming of real-world dynamic scenes.
Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS.
arXiv Detail & Related papers (2024-03-03T08:42:40Z) - Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, improving PSNR from $34.07$ to $34.57$.
arXiv Detail & Related papers (2022-10-13T08:15:08Z) - A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method achieves clear improvements over state-of-the-art real-time methods on the UCF101 action recognition benchmark: 5.4% higher accuracy and 2 times faster inference, with a model requiring less than 5MB of storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.