Fugu-MT Paper Translation (Abstract): Ultra Flash: Scaling Real-Time Streaming Video Generation to High Resolutions

Paper overview: Ultra Flash: Scaling Real-Time Streaming Video Generation to High Resolutions

arxiv url: http://arxiv.org/abs/2606.09150v1
Date: Mon, 08 Jun 2026 07:45:03 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:06.810389
Title: Ultra Flash: Scaling Real-Time Streaming Video Generation to High Resolutions
Title (reference translation): Ultra Flash: Scaling Up Real-Time Streaming Video Generation to High Resolution
Authors: Luxury, Jie Huang, Zihao Fan, Xiaoxiao Ma, Yuming Li, Jun-hao Zhuang, Zeyue Xue, Siming Fu, Haoran Li, Mingchen Zhong, Guohui Zhang, Shichen Ma, Yijun Liu, Jiaqi Shi, Yanwen Ma, Yaofeng Su, Haoyu Wang, Yaowei Li, Songchun Zhang, Weiyang Jin, Yuxuan Bian, Shiyi Zhang, Haojun Xu, Shuai Lu, Xin Han, Wei Tang, Haoyang Huang, Nan Duan
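Abstract summary: Ultra Flash is a cascaded streaming framework capable of real-time high-resolution video generation. The results suggest that Ultra Flash can reliably produce high-resolution streaming video while maintaining state-of-the-art visual quality and superior efficiency.
Reference score (independently computed attention score): 69.0190486024094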
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While recent autoregressive video diffusion models achieve remarkable streaming quality, they remain confined to low resolutions (e.g., 480P), leaving efficient, scalable, real-time high-resolution video generation a fundamental open challenge. To bridge this gap, we present Ultra Flash, a cascaded streaming framework capable of real-time high-resolution video generation. Ultra Flash achieves ~30 FPS at 1K resolution and ~18 FPS at 2K resolution on a single GPU through three key contributions: (1) an architecture-preserving T2V-to-TV2V super-resolution training paradigm coupled with an AIGC-oriented data degradation pipeline that effectively preserves the generative capability of the base model, enabling enhanced high-resolution detail when cascaded after mainstream low-resolution generative models; (2) a causal streaming latent upsampler paired with a high-resolution decoder, which enhances spatiotemporal coherence while enabling efficient latent spatial scaling and precise high-resolution decoding with negligible computational overhead; and (3) a cascade high-resolution streaming video generation optimization scheme that first performs hybrid-reward-enhanced sparse causalization and single-step distillation of the super-resolution model, then introduces cascaded streaming self-forcing preference optimization with dynamic cache management, jointly enhancing overall coherence, improving quality, and enabling real-time high-resolution streaming video generation. Extensive experiments demonstrate that Ultra Flash reliably produces ultra-high-resolution streaming video while maintaining state-of-the-art visual quality and superior efficiency.
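Abstract (reference translation): Recent autoregressive video diffusion models markedly improve streaming quality, but they remain limited to low resolutions (e.g., 480P), and efficient, scalable, real-time high-resolution video generation remains a fundamental open challenge. To bridge this gap, we introduce Ultra Flash, a cascaded streaming framework capable of real-time high-resolution video generation. An architecture-preserving T2V-to-TV2V super-resolution training paradigm, combined with an AIGC-oriented data degradation pipeline, effectively preserves the generative capability of the base model and enhances high-resolution detail when cascaded after mainstream low-resolution generative models. Extensive experiments show that Ultra Flash reliably produces high-resolution streaming video while maintaining state-of-the-art visual quality and superior efficiency.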

Related papers