Related papers: SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling

URL: http://arxiv.org/abs/2508.17756v1
Date: Mon, 25 Aug 2025 07:49:17 GMT
Title: SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling
Authors: Fanjiang Ye, Zepeng Zhao, Yi Mu, Jucheng Shen, Renjie Li, Kaijian Wang, Desen Sun, Saurabh Agarwal, Myungjin Lee, Triston Cao, Aditya Akella, Arvind Krishnamurthy, T. S. Eugene Ng, Zhengzhong Tu, Yuke Wang
Abstract summary: SuperGen is an efficient tile-based framework for ultra-high-resolution video generation. It supports a wide range of resolutions without additional training efforts. SuperGen incorporates a tile-tailored, adaptive, region-aware caching strategy.
Score: 27.96742776792205
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models have recently achieved remarkable success in generative tasks (e.g., image and video generation), and the demand for high-quality content (e.g., 2K/4K videos) is rapidly increasing across various domains. However, generating ultra-high-resolution videos on existing standard-resolution (e.g., 720p) platforms remains challenging due to the excessive re-training requirements and prohibitively high computational and memory costs. To this end, we introduce SuperGen, an efficient tile-based framework for ultra-high-resolution video generation. SuperGen features a novel training-free algorithmic innovation with tiling to successfully support a wide range of resolutions without additional training efforts while significantly reducing both memory footprint and computational complexity. Moreover, SuperGen incorporates a tile-tailored, adaptive, region-aware caching strategy that accelerates video generation by exploiting redundancy across denoising steps and spatial regions. SuperGen also integrates cache-guided, communication-minimized tile parallelism for enhanced throughput and minimized latency. Evaluations demonstrate that SuperGen harvests the maximum performance gains while achieving high output quality across various benchmarks.

Related papers

Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization [83.406036390582]
Quant VideoGen (QVG) is a training-free KV-cache quantization framework for autoregressive video diffusion models. It reduces KV memory by up to 7.0 times with less than 4% end-to-end latency overhead. It consistently outperforms existing baselines in generation quality.
arXiv Detail & Related papers (2026-02-03T00:54:32Z)
UltraGen: High-Resolution Video Generation with Hierarchical Attention [62.99161115650818]
UltraGen is a novel video generation framework that enables i) efficient and ii) end-to-end native high-resolution video synthesis. We show that UltraGen can effectively scale pre-trained low-resolution video models to 1080P and even 4K resolution for the first time.
arXiv Detail & Related papers (2025-10-21T16:23:21Z)
CineScale: Free Lunch in High-Resolution Cinematic Visual Generation [42.81729840016782]
We propose CineScale, a novel inference paradigm to enable higher-resolution visual generation. Our approach enables 8K image generation without any fine-tuning, and achieves 4K video generation with only minimal LoRA fine-tuning.
arXiv Detail & Related papers (2025-08-21T17:59:57Z)
Taming Diffusion Transformer for Real-Time Mobile Video Generation [72.20660234882594]
Diffusion Transformers (DiT) have shown strong performance in video generation tasks, but their high computational cost makes them impractical for resource-constrained devices like smartphones. We propose a series of novel optimizations to significantly accelerate video generation and enable real-time performance on mobile platforms.
arXiv Detail & Related papers (2025-07-17T17:59:10Z)
Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis [50.77548592888096]
Demand for 2K video synthesis is rising with increasing consumer expectations for ultra-clear visuals. Turbo2K is an efficient framework for generating detail-rich 2K videos.
arXiv Detail & Related papers (2025-04-20T03:30:59Z)
QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation [84.91431271257437]
Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation. DiTs come with significant drawbacks, including increased computational and memory costs. We propose QuantCache, a novel training-free inference acceleration framework.
arXiv Detail & Related papers (2025-03-09T10:31:51Z)
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference [41.505829393818274]
Current video diffusion models exhibit demanding computational requirements and high peak memory usage. We present Streamlined Inference, which leverages the temporal and spatial properties of video diffusion models. Our approach significantly reduces peak memory and computational overhead, making it feasible to generate high-quality videos on a single consumer GPU.
arXiv Detail & Related papers (2024-11-02T07:52:18Z)
AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content [56.552444900457395]
Video super-resolution (VSR) is a critical task for enhancing low-bitrate and low-resolution videos, particularly in streaming applications. In this work, we compile different methods to address these challenges; the solutions are end-to-end real-time video super-resolution frameworks. The proposed solutions tackle video up-scaling for two applications: 540p to 4K (x4) as a general case, and 360p to 1080p (x3), more tailored towards mobile devices.
arXiv Detail & Related papers (2024-09-25T18:12:19Z)
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction [67.11722682878722]
This work presents EfficientViT, a new family of high-resolution vision models with novel multi-scale linear attention. Our multi-scale linear attention achieves the global receptive field and multi-scale learning. EfficientViT delivers remarkable performance gains over previous state-of-the-art models.
arXiv Detail & Related papers (2022-05-29T20:07:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
