Tokenizing Motion: A Generative Approach for Scene Dynamics Compression
- URL: http://arxiv.org/abs/2410.09768v2
- Date: Sun, 12 Oct 2025 07:29:43 GMT
- Title: Tokenizing Motion: A Generative Approach for Scene Dynamics Compression
- Authors: Shanzhi Yin, Zihan Zhang, Bolin Chen, Shiqi Wang, Yan Ye,
- Abstract summary: This paper proposes a novel generative video compression framework that leverages motion pattern priors. These compact motion priors enable a new approach to ultra-low bitrate communication. The proposed method can achieve superior rate-distortion performance and outperform the state-of-the-art conventional video codec Enhanced Compression Model (ECM) on scene dynamics sequences.
- Score: 27.897703419056253
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper proposes a novel generative video compression framework that leverages motion pattern priors, derived from subtle dynamics in common scenes (e.g., swaying flowers or a boat drifting on water), rather than relying on video content priors (e.g., talking faces or human bodies). These compact motion priors enable a new approach to ultra-low bitrate communication while achieving high-quality reconstruction across diverse scene contents. At the encoder side, motion priors can be streamlined into compact representations via a dense-to-sparse transformation. At the decoder side, these priors facilitate the reconstruction of scene dynamics using an advanced flow-driven diffusion model. Experimental results illustrate that the proposed method can achieve superior rate-distortion performance and outperform the state-of-the-art conventional video codec Enhanced Compression Model (ECM) on scene dynamics sequences. The project page can be found at https://github.com/xyzysz/GNVDC.
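The abstract describes the encoder-side dense-to-sparse transformation only at a high level. As a rough illustration, here is a minimal sketch that assumes a dense optical-flow field is sparsified by keeping the k largest-magnitude motion vectors and uniformly quantizing them. The names `dense_to_sparse`, `sparse_to_dense`, `k`, and `step` are hypothetical, this is not the GNVDC implementation, and the flow-driven diffusion decoder is not modeled at all.

```python
from typing import List, Tuple

# Hypothetical representation: flow[y][x] = (dx, dy), one motion vector per pixel.
Flow = List[List[Tuple[float, float]]]
Token = Tuple[int, int, int, int]  # (y, x, qdx, qdy)


def dense_to_sparse(flow: Flow, k: int, step: float = 0.25) -> List[Token]:
    """Keep the k largest-magnitude vectors, quantized to integer multiples of `step`."""
    candidates = []
    for y, row in enumerate(flow):
        for x, (dx, dy) in enumerate(row):
            candidates.append((dx * dx + dy * dy, y, x, dx, dy))
    # Largest squared magnitude first; ties broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(y, x, round(dx / step), round(dy / step))
            for _, y, x, dx, dy in candidates[:k]]


def sparse_to_dense(tokens: List[Token], height: int, width: int,
                    step: float = 0.25) -> Flow:
    """Rebuild a dense field: transmitted vectors are dequantized, all others are zero."""
    flow = [[(0.0, 0.0) for _ in range(width)] for _ in range(height)]
    for y, x, qdx, qdy in tokens:
        flow[y][x] = (qdx * step, qdy * step)
    return flow


if __name__ == "__main__":
    flow = [[(0, 0), (0, 0), (0, 0)],
            [(0, 0), (1.1, -0.4), (0, 0)],
            [(0.2, 0), (0, 0), (0, 0)]]
    tokens = dense_to_sparse(flow, k=2)
    print(tokens)
    print(sparse_to_dense(tokens, 3, 3))
```

Only `k` integer tuples need to be entropy-coded instead of one vector per pixel, which is the intuition behind a compact motion representation; a real decoder would densify the sparse motion far more cleverly than zero-filling.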
Related papers
- Low-Bitrate Video Compression through Semantic-Conditioned Diffusion [19.21409064179896]
We propose a framework that transmits only the most meaningful information while relying on generative priors for detail. A conditional video diffusion model reconstructs high-quality, temporally coherent videos from semantic, appearance, and motion cues.
arXiv Detail & Related papers (2025-11-29T09:38:16Z) - MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models [50.162882483045045]
We propose a motion-centric alignment framework that learns a disentangled motion subspace from a pretrained video encoder. This subspace is optimized to predict ground-truth optical flow, ensuring it captures true motion dynamics. We then align the latent features of a text-to-video diffusion model to this new subspace, enabling the generative model to internalize motion knowledge and generate more plausible videos.
arXiv Detail & Related papers (2025-10-21T19:05:23Z) - D-FCGS: Feedforward Compression of Dynamic Gaussian Splatting for Free-Viewpoint Videos [12.24209693552492]
Free-viewpoint video (FVV) enables immersive 3D experiences, but efficient compression of dynamic 3D representations remains a major challenge. This paper presents Feedforward Compression of Dynamic Gaussian Splatting (D-FCGS), a novel feedforward framework for compressing temporally correlated Gaussian point cloud sequences. Experiments show that it matches the rate-distortion performance of optimization-based methods, achieving over 40 times compression in under 2 seconds.
arXiv Detail & Related papers (2025-07-08T10:39:32Z) - Rethinking Generative Human Video Coding with Implicit Motion Transformation [9.85295369102017]
Generative video coding can achieve promising compression performance by transforming high-dimensional signals into compact feature representations. Human body videos pose greater challenges due to their more complex and diverse motion patterns. We propose to characterize complex human body signals as compact visual features and transform these features into implicit motion guidance for signal reconstruction.
arXiv Detail & Related papers (2025-06-12T07:58:18Z) - Hi-VAE: Efficient Video Autoencoding with Global and Detailed Motion [23.80254637449824]
Hi-VAE formulates an efficient video autoencoding framework that encodes coarse-to-fine motion representations of video dynamics. We show that Hi-VAE exhibits a high compression factor of 1428$\times$, almost 30$\times$ higher than baseline methods.
arXiv Detail & Related papers (2025-06-08T13:30:11Z) - Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization [13.341123726068652]
We propose a novel Multi-granularity Temporal Trajectory Factorization framework for generative human video compression.
Experimental results show that the proposed method outperforms the latest generative models and the state-of-the-art video coding standard Versatile Video Coding.
arXiv Detail & Related papers (2024-10-14T05:34:32Z) - VDG: Vision-Only Dynamic Gaussian for Driving Simulation [112.6139608504842]
We introduce self-supervised visual odometry (VO) into our pose-free dynamic Gaussian method (VDG).
VDG can work with only RGB image input and constructs dynamic scenes faster and at a larger scale than pose-free dynamic view-synthesis methods.
Our results show favorable performance over the state-of-the-art dynamic view synthesis methods.
arXiv Detail & Related papers (2024-06-26T09:29:21Z) - MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors [65.31707882676292]
MoDGS is a new pipeline to render novel views of dynamic scenes from a casually captured monocular video. Experiments demonstrate MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video.
arXiv Detail & Related papers (2024-06-01T13:20:46Z) - Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z) - VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
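The VMC summary says motion is distilled using residual vectors between consecutive frames. A minimal sketch of that idea, assuming frames (or latents) are flattened to plain vectors and that predicted and reference residuals are compared with a cosine-distance loss; the cosine choice and the names `residuals` and `motion_distillation_loss` are illustrative assumptions, not VMC's published code.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # a frame or latent flattened into a vector


def residuals(frames: List[Frame]) -> List[List[float]]:
    """Residual vectors between consecutive frames: r_t = f_{t+1} - f_t."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(frames, frames[1:])]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0  # a static frame pair carries no motion direction
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def motion_distillation_loss(pred: List[Frame], ref: List[Frame]) -> float:
    """Mean (1 - cosine similarity) between predicted and reference residuals."""
    if len(ref) < 2 or len(pred) != len(ref):
        raise ValueError("need two or more frames, equal counts in pred and ref")
    rp, rr = residuals(pred), residuals(ref)
    return sum(1.0 - cosine(p, r) for p, r in zip(rp, rr)) / len(rr)


if __name__ == "__main__":
    ref = [[0, 0], [1, 0], [2, 0]]       # moves right
    shifted = [[5, 5], [6, 5], [7, 5]]   # same motion, different appearance
    reversed_ = [[2, 0], [1, 0], [0, 0]]  # opposite motion
    print(motion_distillation_loss(shifted, ref))
    print(motion_distillation_loss(reversed_, ref))
```

Because a constant appearance offset cancels in the frame differences, this kind of residual-based target rewards matching motion rather than matching content, which is the motivation for using residuals as the motion reference.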
arXiv Detail & Related papers (2023-12-01T06:50:11Z) - MoVideo: Motion-Aware Video Generation with Diffusion Models [97.03352319694795]
We propose a novel motion-aware generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow.
MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.
arXiv Detail & Related papers (2023-11-19T13:36:03Z) - StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation [73.54398908446906]
We introduce a novel motion generator design that uses a learning-based inversion network for GAN.
Our method supports style transfer with simple fine-tuning when the encoder is paired with a pretrained StyleGAN generator.
arXiv Detail & Related papers (2023-08-31T17:59:33Z) - VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z) - LaMD: Latent Motion Diffusion for Video Generation [69.4111397077229]
The latent motion diffusion (LaMD) framework consists of a motion-decomposed video autoencoder and a diffusion-based motion generator.
Results show that LaMD generates high-quality videos with a wide range of motions, from dynamics to highly controllable movements.
arXiv Detail & Related papers (2023-04-23T10:32:32Z) - Scene Matters: Model-based Deep Video Compression [13.329074811293292]
We propose a model-based video compression (MVC) framework that regards scenes as the fundamental units for video sequences.
Our proposed MVC directly models the intensity variation of the entire video sequence in one scene, seeking non-redundant representations instead of reducing redundancy.
Our method achieves up to a 20% bitrate reduction compared to the latest video coding standard H.266 and is more efficient in decoding than existing video coding strategies.
arXiv Detail & Related papers (2023-03-08T13:15:19Z) - MotionVideoGAN: A Novel Video Generator Based on the Motion Space Learned from Image Pairs [16.964371778504297]
We present MotionVideoGAN, a novel video generator synthesizing videos based on the motion space learned by pre-trained image pair generators.
Motion codes help us edit images within the motion space since the edited image shares the same contents with the other unchanged one in image pairs.
Our approach achieves state-of-the-art performance on the most complex video dataset ever used for unconditional video generation evaluation, UCF101.
arXiv Detail & Related papers (2023-03-06T05:52:13Z) - Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-Neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC outperforms recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z) - Dilated convolutional neural network-based deep reference picture generation for video compression [16.42377608366894]
We propose a deep reference picture generator which can create a picture that is more relevant to the current encoding frame.
Inspired by recent progress in Convolutional Neural Networks (CNNs), this paper proposes to use a dilated CNN to build the generator.
arXiv Detail & Related papers (2022-02-11T09:06:18Z)