Fugu-MT paper translation (abstract): SwiftVR: Real-Time One-Step Generative Video Restoration

Paper overview: SwiftVR: Real-Time One-Step Generative Video Restoration

arxiv url: http://arxiv.org/abs/2606.09516v1
Date: Mon, 08 Jun 2026 14:07:29 GMT
Status: translation complete
Last updated in system: 2026-06-09 14:42:07.17473
Title: SwiftVR: Real-Time One-Step Generative Video Restoration
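Title (reference translation): SwiftVR: Real-Time One-Step Generative Video Restoration
Authors: Jiaqi Yan, Xiangyu Chen, Xinlin Zhong, Haibin Huang, Chi Zhang, Jie Liu, Jiantao Zhou, Xuelong Li
Abstract summary: Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. The authors present SwiftVR, a streaming one-step generative VR framework that reduces two deployment bottlenecks (quadratic spatial attention and the overhead of large video autoencoders) under a causal chunk-wise protocol. To the authors' knowledge, SwiftVR is the first generative VR model to achieve real-time 1080p streaming on a consumer-grade GPU.
Reference score (independently computed attention score): 58.20544992792176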
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. Existing one-step diffusion-based VR models remain difficult to deploy on consumer-grade GPUs due to two main bottlenecks: quadratic spatial attention at high resolutions and the latency-memory overhead of large video autoencoders. We present SwiftVR, a streaming one-step generative VR framework that reduces both bottlenecks under a causal chunk-wise protocol. For attention, mask-free shifted-window self-attention gathers each spatial window into a dense tensor via deterministic indexing, keeping all attention calls on the dense scaled dot-product attention path without masks, cyclic shifts, padding, or hardware-specific sparse kernels. Because SwiftVR uses only standard dense SDPA calls, the trained model transfers to consumer GPUs without retraining or custom kernels. For autoencoding, a lightweight Restoration-aware Autoencoder enables fast chunk-wise decoding while preserving reconstruction quality. On a single H100, SwiftVR sustains 31 FPS at 2560x1440 and 14 FPS at 3840x2160, whereas all compared diffusion-based VR baselines exceed the memory limit at 4K. On a consumer RTX 5090, SwiftVR reaches 26 FPS at 1920x1080. To our knowledge, SwiftVR is the first generative VR model to achieve real-time 1080p streaming on a consumer-grade GPU, while attaining strong no-reference perceptual quality with lower inference cost. Project is available at https://h-oliday.github.io/SwiftVR.
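Abstract (reference translation): Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. Existing one-step diffusion-based VR models remain difficult to deploy on consumer-grade GPUs because of two bottlenecks: quadratic spatial attention at high resolutions and the latency-memory overhead of large video autoencoders. We present SwiftVR, a streaming one-step generative VR framework that reduces both bottlenecks under a causal chunk-wise protocol. For attention, mask-free shifted-window self-attention gathers each spatial window into a dense tensor via deterministic indexing, keeping every attention call on the dense scaled dot-product attention path without masks, cyclic shifts, padding, or hardware-specific sparse kernels. Because SwiftVR uses only standard dense SDPA calls, the trained model transfers to consumer GPUs without retraining or custom kernels. For autoencoding, a lightweight Restoration-aware Autoencoder enables fast chunk-wise decoding while preserving reconstruction quality. On a single H100, SwiftVR sustains 31 FPS at 2560x1440 and 14 FPS at 3840x2160. On a consumer RTX 5090, SwiftVR reaches 26 FPS at 1920x1080. To our knowledge, SwiftVR is the first generative VR model to achieve real-time 1080p streaming on a consumer-grade GPU. The project is available at https://h-oliday.github.io/SwiftVR.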
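The abstract's description of window attention, where a fixed index table gathers each spatial window into a dense tensor so that ordinary unmasked attention can be applied, can be illustrated with a small sketch. This is not the authors' implementation: it uses NumPy instead of a GPU framework, sets Q = K = V to the input tokens (no learned projections or heads), and leaves out the shifted-window handling, which the abstract does not specify. The function names `window_index`, `dense_attention`, and `windowed_self_attention` are hypothetical.

```python
import numpy as np

def window_index(height, width, win):
    """Precompute flat token indices for each non-overlapping win x win window.

    Returns an int array of shape (num_windows, win * win).
    Assumes height and width are multiples of win (an assumption of this sketch).
    """
    assert height % win == 0 and width % win == 0
    ids = np.arange(height * width).reshape(height, width)
    ids = ids.reshape(height // win, win, width // win, win)
    return ids.transpose(0, 2, 1, 3).reshape(-1, win * win)

def dense_attention(q, k, v):
    """Plain scaled dot-product attention over the last two axes, with no mask."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def windowed_self_attention(tokens, height, width, win):
    """Self-attention restricted to windows; tokens has shape (height * width, dim)."""
    idx = window_index(height, width, win)
    x = tokens[idx]                   # gather: (num_windows, win * win, dim)
    out = dense_attention(x, x, x)    # one dense batched call, no masking
    result = np.empty_like(tokens)
    result[idx] = out                 # scatter back to the original token order
    return result
```

With a fixed window size, each window costs (win * win)^2 score entries, so the total grows linearly with the number of tokens rather than quadratically, which is the bottleneck the abstract attributes to full spatial attention at high resolution.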

Related papers