HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming
- URL: http://arxiv.org/abs/2512.21338v2
- Date: Thu, 25 Dec 2025 14:18:18 GMT
- Title: HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming
- Authors: Haonan Qiu, Shikun Liu, Zijian Zhou, Zhaochong An, Weiming Ren, Zhiheng Liu, Jonas Schult, Sen He, Shoufa Chen, Yuren Cong, Tao Xiang, Ziwei Liu, Juan-Manuel Perez-Rua
- Abstract summary: HiStream is an efficient autoregressive framework that systematically reduces redundancy across three axes. On 1080p benchmarks, our primary HiStream model (i+ii) achieves state-of-the-art visual quality while demonstrating up to 76.2x faster denoising compared to the Wan2.1 baseline. Our faster variant, HiStream+, applies all three optimizations, achieving a 107.5x acceleration over the baseline.
- Score: 58.55148690302855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-resolution video generation, while crucial for digital media and film, is computationally bottlenecked by the quadratic complexity of diffusion models, making practical inference infeasible. To address this, we introduce HiStream, an efficient autoregressive framework that systematically reduces redundancy across three axes: i) Spatial Compression: denoising at low resolution before refining at high resolution with cached features; ii) Temporal Compression: a chunk-by-chunk strategy with a fixed-size anchor cache, ensuring stable inference speed; and iii) Timestep Compression: applying fewer denoising steps to subsequent, cache-conditioned chunks. On 1080p benchmarks, our primary HiStream model (i+ii) achieves state-of-the-art visual quality while demonstrating up to 76.2x faster denoising compared to the Wan2.1 baseline and negligible quality loss. Our faster variant, HiStream+, applies all three optimizations (i+ii+iii), achieving a 107.5x acceleration over the baseline, offering a compelling trade-off between speed and quality, thereby making high-resolution video generation both practical and scalable.
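The abstract's streaming recipe (autoregressive chunks, a fixed-size anchor cache, and fewer denoising steps for cache-conditioned chunks) can be sketched as a toy loop. This is not the paper's implementation: `denoise_chunk` is a hypothetical NumPy stand-in for the diffusion model, and the cache size, step counts, and update rule are illustrative assumptions only.

```python
import numpy as np

def denoise_chunk(chunk, anchor_cache, num_steps):
    """Toy stand-in for cache-conditioned denoising (assumption: the real
    model is a diffusion transformer; here each step just nudges the chunk
    toward the mean of the cached anchors and attenuates 'noise')."""
    for _ in range(num_steps):
        if anchor_cache:
            anchor = np.mean(anchor_cache, axis=0)
            chunk = 0.9 * chunk + 0.1 * anchor  # condition on anchors
        chunk = chunk * 0.99  # pretend noise removal
    return chunk

def histream_generate(num_chunks, chunk_shape, cache_size=2,
                      first_steps=50, later_steps=10, seed=0):
    """Sketch of the streaming loop described in the abstract:
    - chunks are generated autoregressively, one at a time (axis ii);
    - a fixed-size FIFO anchor cache bounds per-chunk cost, keeping
      inference speed stable as the video grows;
    - later, cache-conditioned chunks receive fewer denoising steps
      (axis iii, the HiStream+ variant)."""
    rng = np.random.default_rng(seed)
    anchor_cache = []  # bounded cache of previously generated chunks
    video = []
    for i in range(num_chunks):
        noise = rng.standard_normal(chunk_shape)
        steps = first_steps if i == 0 else later_steps
        chunk = denoise_chunk(noise, anchor_cache, steps)
        video.append(chunk)
        anchor_cache.append(chunk)
        if len(anchor_cache) > cache_size:
            anchor_cache.pop(0)  # evict oldest anchor
    return video
```

Because the cache never exceeds `cache_size` entries, the cost of each new chunk is constant regardless of video length, which is the point of the fixed-size anchor cache.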
Related papers
- TeCoNeRV: Leveraging Temporal Coherence for Compressible Neural Representations for Videos [51.99176811574457]
Implicit Neural Representations (INRs) have recently demonstrated impressive performance for video compression. However, scaling to high-resolution videos while maintaining encoding efficiency remains a significant challenge. We address these fundamental limitations through three key contributions. We are the first hypernetwork approach to demonstrate results at 480p, 720p and 1080p on UVG, HEVC and MCL-JCV.
arXiv Detail & Related papers (2026-02-18T18:59:55Z) - Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching [57.7533917467934]
EasyCache is a training-free acceleration framework for video diffusion models. We conduct comprehensive studies on various large-scale video generation models, including OpenSora, Wan2.1, and HunyuanVideo. Our method achieves leading acceleration performance, reducing inference time by up to 2.1-3.3$\times$ compared to the original baselines.
arXiv Detail & Related papers (2025-07-03T17:59:54Z) - StableCodec: Taming One-Step Diffusion for Extreme Image Compression [19.69733852050049]
Diffusion-based image compression has shown remarkable potential for achieving ultra-low bitrates (less than 0.05 bits per pixel) with high realism. Current approaches require a large number of denoising steps at the decoder to generate realistic results under extreme constraints. We introduce StableCodec, which enables one-step diffusion for high-fidelity and high-realism extreme image compression.
arXiv Detail & Related papers (2025-06-27T07:39:21Z) - Single-step Diffusion for Image Compression at Ultra-Low Bitrates [19.76457078979179]
We propose a single-step diffusion model for image compression that delivers high perceptual quality and fast decoding at ultra-low bitrates. Our approach incorporates two key innovations: (i) Vector-Quantized Residual (VQ-Residual) training, which factorizes the latent representation into a structural base code and a learned residual. Ours achieves comparable compression performance to state-of-the-art methods while improving decoding speed by about 50x.
arXiv Detail & Related papers (2025-06-19T19:53:27Z) - H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models [97.45170082949552]
Autoencoder (AE) is the key to the success of latent diffusion models for image and video generation. H3AE achieves ultra-high compression ratios and real-time decoding speed on GPU and mobile.
arXiv Detail & Related papers (2025-04-14T17:59:06Z) - FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality [58.80996741843102]
FasterCache is a training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation. We show that FasterCache can significantly accelerate video generation while keeping video quality comparable to the baseline.
arXiv Detail & Related papers (2024-10-25T07:24:38Z) - Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval [16.497758750494537]
We propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism.
We leverage a Differentiable Context-aware Compression Module to encode the saliency and non-saliency frame features.
We introduce a new Resolution-Align Transformer Layer to capture global temporal correlations among frame features with different resolutions.
arXiv Detail & Related papers (2023-09-15T05:31:53Z) - Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis [40.249030338644225]
Video-to-Video synthesis (Vid2Vid) has achieved remarkable results in generating a photo-realistic video from a sequence of semantic maps.
Fast-Vid2Vid achieves near-real-time performance at around 20 FPS and saves roughly 8x computational cost on a single V100 GPU.
arXiv Detail & Related papers (2022-07-11T17:57:57Z) - COMISR: Compression-Informed Video Super-Resolution [76.94152284740858]
Most videos on the web or mobile devices are compressed, and the compression can be severe when the bandwidth is limited.
We propose a new compression-informed video super-resolution model to restore high-resolution content without introducing artifacts caused by compression.
arXiv Detail & Related papers (2021-05-04T01:24:44Z)