Morphe: High-Fidelity Generative Video Streaming with Vision Foundation Model
- URL: http://arxiv.org/abs/2602.03529v1
- Date: Tue, 03 Feb 2026 13:47:18 GMT
- Title: Morphe: High-Fidelity Generative Video Streaming with Vision Foundation Model
- Authors: Tianyi Gong, Zijian Cao, Zixing Zhang, Jiangkai Wu, Xinggong Zhang, Shuguang Cui, Fangxin Wang
- Abstract summary: A vision foundation model (VFM) is used to harness powerful video understanding and processing capacities. We present the first paradigm that enables VFM-based end-to-end generative streaming. Morphe achieves comparable visual quality while saving 62.5% bandwidth compared to H.265, along with real-time, loss-resilient video delivery in challenging network environments.
- Score: 47.71265147565265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video streaming is a fundamental Internet service, yet its quality still cannot be guaranteed, especially under poor network conditions such as in bandwidth-constrained or remote areas. Existing work falls mainly into two directions: traditional pixel-codec streaming is approaching its compression limit and is hard to push further, while emerging neural-enhanced or generative streaming usually falls short in latency and visual fidelity, hindering practical deployment. Inspired by the recent success of vision foundation models (VFMs), we harness the powerful video understanding and processing capacities of a VFM to achieve generalization, high fidelity, and loss resilience for real-time video streaming at an even higher compression rate. Toward this goal, we present the first paradigm that enables VFM-based end-to-end generative video streaming. Specifically, Morphe employs joint training of visual tokenizers and variable-resolution spatiotemporal optimization under simulated network constraints. Additionally, a robust streaming system is constructed that leverages intelligent packet dropping to resist real-world network perturbations. Extensive evaluation demonstrates that Morphe achieves comparable visual quality while saving 62.5% bandwidth compared to H.265, and accomplishes real-time, loss-resilient video delivery in challenging network environments, representing a milestone in VFM-enabled multimedia streaming solutions.
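To make the paradigm concrete, here is a minimal, illustrative sketch of a token-based streaming loop with priority-aware packet dropping. This is not Morphe's published code: the tokenizer, token and packet sizes, priority rule, and decoder below are all stand-in assumptions.

```python
# Illustrative sketch only: Morphe's actual tokenizer, priority scheme, and
# generative decoder are not public here; every name below is a stand-in.
import numpy as np

TOKENS_PER_FRAME = 256   # assumed visual-tokenizer output length
PACKET_SIZE = 32         # assumed tokens carried per network packet

def tokenize(frame: np.ndarray) -> np.ndarray:
    """Stand-in for a VFM visual tokenizer: map a frame to discrete token ids."""
    rng = np.random.default_rng(int(frame.sum()) % 2**32)
    return rng.integers(0, 8192, size=TOKENS_PER_FRAME)

def packetize(tokens: np.ndarray) -> list[np.ndarray]:
    """Split a frame's tokens into fixed-size packets."""
    return [tokens[i:i + PACKET_SIZE] for i in range(0, len(tokens), PACKET_SIZE)]

def drop_low_priority(packets: list[np.ndarray], budget_packets: int):
    """Idealized 'intelligent' dropping: keep the first `budget_packets`
    packets, assuming earlier packets carry coarser, more important tokens."""
    return packets[:budget_packets], packets[budget_packets:]

def decode(packets: list[np.ndarray]) -> np.ndarray:
    """Stand-in generative decoder: a real system would hallucinate plausible
    content for the missing tokens instead of leaving gaps."""
    return np.concatenate(packets)  # placeholder for reconstructed pixels

frame = np.zeros((64, 64, 3))
kept, dropped = drop_low_priority(packetize(tokenize(frame)), budget_packets=6)
print(f"sent {len(kept)} packets, dropped {len(dropped)}, decoded {len(decode(kept))} tokens")
```

The point of the sketch is the division of labor: the sender transmits compact tokens rather than pixels, the network may drop whole packets under a bandwidth budget, and the receiver's generative decoder absorbs the loss.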
Related papers
- Single-step Diffusion-based Video Coding with Semantic-Temporal Guidance [24.88807532823577]
We propose S2VC, a Single-Step diffusion-based Video Codec that integrates a conditional coding framework with an efficient single-step diffusion generator. We show that S2VC delivers state-of-the-art perceptual quality with an average 52.73% saving over prior perceptual methods.
arXiv Detail & Related papers (2025-12-08T12:05:30Z)
- Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation [69.57572900337176]
We introduce Reward Forcing, a novel framework for efficient streaming video generation. EMA-Sink tokens capture both long-term context and recent dynamics, preventing initial frame copying. Re-DMD biases the model's output distribution toward high-reward regions by prioritizing samples with greater dynamics rated by a vision-language model.
arXiv Detail & Related papers (2025-12-04T11:12:13Z)
- InstantViR: Real-Time Video Inverse Problem Solver with Distilled Diffusion Prior [13.775331675468024]
We introduce InstantViR, an amortized inference framework for ultra-fast video reconstruction powered by a pre-trained video diffusion prior. We show that InstantViR is compatible with real-time, interactive, editable streaming scenarios, turning high-quality video restoration into a practical component of modern vision systems.
arXiv Detail & Related papers (2025-11-18T07:40:38Z)
- StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation [65.90400162290057]
Generative models are reshaping the live-streaming industry by redefining how content is created, styled, and delivered. Recent advances in video diffusion have markedly improved temporal consistency and sampling efficiency for offline generation. By contrast, live online streaming operates under strict service-level objectives (SLOs): time-to-first-frame must be minimal, and every frame must meet a per-frame deadline with low jitter.
arXiv Detail & Related papers (2025-11-10T18:51:28Z)
- VividFace: High-Quality and Efficient One-Step Diffusion For Video Face Enhancement [51.83206132052461]
Video Face Enhancement (VFE) seeks to reconstruct high-quality facial regions from degraded video sequences. Current methods that rely on video super-resolution and generative frameworks face three fundamental challenges. We propose VividFace, a novel and efficient one-step diffusion framework for video face enhancement.
arXiv Detail & Related papers (2025-09-28T02:39:48Z)
- Plug-and-Play Versatile Compressed Video Enhancement [57.62582951699999]
Video compression effectively reduces file sizes, making real-time cloud computing possible. However, it comes at the cost of visual quality and challenges the robustness of downstream vision models. We present a versatile enhancement framework that adaptively enhances videos under different compression settings.
arXiv Detail & Related papers (2025-04-21T18:39:31Z)
- Semantic-Aware Adaptive Video Streaming Using Latent Diffusion Models for Wireless Networks [12.180483357502293]
This paper proposes a novel framework for real-time adaptive-bitrate video streaming by integrating Latent Diffusion Models (LDMs) within the FF techniques. The proposed approach leverages LDMs to compress I-frames into a latent space, offering significant storage and semantic transmission savings. This work opens new possibilities for scalable real-time video streaming in 5G and future post-5G networks. (A minimal latent-compression sketch follows this entry.)
arXiv Detail & Related papers (2025-02-08T21:14:28Z)
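The latent-space I-frame idea above can be illustrated with a pretrained VAE from the `diffusers` library. This is a generic encode-transmit-decode sketch, not the paper's implementation; the checkpoint name, frame size, and resulting shapes are assumptions.

```python
# Minimal sketch of latent-space I-frame compression (not the paper's code):
# encode an I-frame with a pretrained VAE, ship the much smaller latent,
# decode at the receiver. Assumes `diffusers` and the sd-vae-ft-mse weights.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

i_frame = torch.rand(1, 3, 512, 512) * 2 - 1         # stand-in frame in [-1, 1]
with torch.no_grad():
    latent = vae.encode(i_frame).latent_dist.mode()  # (1, 4, 64, 64) latent
    recon = vae.decode(latent).sample                # receiver-side reconstruction

print(i_frame.numel() / latent.numel())  # 48x fewer values before entropy coding
```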
- BF-STVSR: B-Splines and Fourier-Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution [14.082598088990352]
We propose BF-STVSR, a C-STVSR framework with two key modules tailored to better represent the spatial and temporal characteristics of video. Our approach achieves state-of-the-art results in various metrics, including PSNR and SSIM, showing enhanced spatial details and natural temporal consistency.
arXiv Detail & Related papers (2025-01-19T13:29:41Z)
- DeformStream: Deformation-based Adaptive Volumetric Video Streaming [4.366356163044466]
Volumetric video streaming offers immersive 3D experiences but faces significant challenges due to high bandwidth requirements and latency issues.
We introduce Deformation-based Adaptive Volumetric Video Streaming, a novel framework that enhances volumetric video streaming performance by leveraging the inherent deformability of mesh-based representations.
arXiv Detail & Related papers (2024-09-25T04:43:59Z)
- Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z)
- CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming [15.115975994657514]
We present Codec-aware Diffusion Modeling (CaDM), a novel Neural-enhanced Video Streaming (NVS) paradigm. First, CaDM improves the encoder's compression efficiency by simultaneously reducing the resolution and color bit depth of video frames. (An illustrative degradation sketch follows this entry.)
arXiv Detail & Related papers (2022-11-15T05:14:48Z)
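As a rough illustration of the encoder-side degradation the CaDM summary describes (resolution plus color bit-depth reduction), here is a toy NumPy sketch. The scale factor and bit depth are illustrative choices, and the diffusion-based restoration at the receiver is omitted.

```python
# Toy sketch of CaDM-style pre-encoding degradation (illustrative only):
# shrink resolution and color bit depth before codec encoding, leaving
# restoration to the receiver's diffusion model.
import numpy as np

def degrade_for_encoding(frame: np.ndarray, scale: int = 4, bits: int = 4) -> np.ndarray:
    """Downscale by `scale` (simple striding) and quantize to `bits` per channel."""
    small = frame[::scale, ::scale]               # resolution reduction
    step = 256 // (2 ** bits)
    return (small // step) * step + step // 2     # color bit-depth reduction

frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
degraded = degrade_for_encoding(frame)
print(degraded.shape, np.unique(degraded).size)   # (270, 480, 3), at most 16 levels
```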
- Multi-level Wavelet-based Generative Adversarial Network for Perceptual Quality Enhancement of Compressed Video [51.631731922593225]
Existing methods mainly focus on enhancing the objective quality of compressed video while ignoring its perceptual quality.
We propose a novel generative adversarial network (GAN) based on multi-level wavelet packet transform (WPT) to enhance the perceptual quality of compressed video. (A small wavelet-decomposition sketch follows this entry.)
arXiv Detail & Related papers (2020-08-02T15:01:38Z)
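For intuition about the multi-level wavelet packet transform the GAN above builds on, here is a small PyWavelets example that decomposes a frame into level-2 subband packets. The GAN itself is omitted, and the Haar wavelet and two-level depth are assumptions rather than the paper's configuration.

```python
# Two-level 2D wavelet packet decomposition with PyWavelets: the kind of
# subband split a WPT-based GAN operates on (the GAN itself is omitted).
import numpy as np
import pywt

frame = np.random.rand(256, 256)                 # stand-in grayscale frame
wp = pywt.WaveletPacket2D(data=frame, wavelet='haar', maxlevel=2)
subbands = [node.path for node in wp.get_level(2)]
print(len(subbands), subbands[:4])               # 16 level-2 packets, e.g. 'aa', 'ah'
```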
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.