Fugu-MT Paper Translation (Summary): StreamingEffect: Real-Time Human-Centric Video Effect Generation

Paper summary: StreamingEffect: Real-Time Human-Centric Video Effect Generation

arxiv url: http://arxiv.org/abs/2605.17019v1
Date: Sat, 16 May 2026 14:45:32 GMT
Status: Translation complete
Last updated (in system): 2026-05-19 17:57:47.494332
Title: StreamingEffect: Real-Time Human-Centric Video Effect Generation
Authors: Yiren Song, Cheng Liu, Yuxin Jiang, Mike Zheng Shou,
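Abstract summary: StreamingEffect is a real-time, human-centric streaming video effect framework. The proposed method enables real-time, high-quality 720p video editing on a single H200 GPU.
Reference score (independently computed attention score): 63.354447770285894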
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Streaming video effect generation is highly desirable for live human-centric applications such as e-commerce streaming, entertainment, and vlogging, yet remains difficult due to the lack of suitable data and deployable editing models. Unlike generic video generation, this task requires real-time video-to-video editing that adds expressive effects while preserving human identity, background content, and temporal consistency. Existing acceleration efforts mainly focus on text-to-video generation, while efficient distillation for video editing remains largely underexplored. In this paper, we present StreamingEffect, a real-time human-centric streaming video effect framework. We adopt an in-context video editing architecture and train a high-quality bidirectional teacher, then distill it into a causal autoregressive student and further reduce sampling from 50 steps to 4 steps. We also introduce keyframe control, allowing reference effect frames to be injected online and propagated through the stream for interactive editing. To address the data bottleneck, we construct VideoEffect-130K, to our knowledge the largest human-centric video effect dataset, containing 70K effect videos and 60K editing videos across 600 effect categories curated from short-video and editing platforms. Experiments show that our method enables real-time, high-quality 720p video editing on a single H200 GPU.
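The abstract describes a causal autoregressive student that samples in 4 steps and accepts reference keyframes online. The Python sketch below illustrates only the control flow that description implies, under loudly stated assumptions: frames are toy lists of floats, the "effect" is a simple additive offset, and `toy_denoiser`, `StreamingEditor`, `inject_keyframe`, and `context_len` are hypothetical placeholders, not the authors' model, code, or API.

```python
from collections import deque
from typing import Callable, Deque, List, Optional

Frame = List[float]  # toy stand-in for one latent video frame
# Hypothetical denoiser signature: (x, source, context, keyframe, step, num_steps) -> new x
Denoiser = Callable[[Frame, Frame, List[Frame], Optional[Frame], int, int], Frame]


def toy_denoiser(x: Frame, source: Frame, context: List[Frame],
                 keyframe: Optional[Frame], step: int, num_steps: int) -> Frame:
    """Placeholder for a distilled few-step student (hypothetical, not the paper's model).

    The "edited" target is the source frame plus an additive effect taken from the
    keyframe, pulled slightly toward the previous output for temporal consistency.
    Each step covers 1/remaining of the gap, so the last step lands on the target.
    """
    effect = keyframe if keyframe is not None else [0.0] * len(source)
    target = [s + e for s, e in zip(source, effect)]
    if context:  # only already-emitted outputs are visible: the loop is causal
        prev = context[-1]
        target = [0.8 * t + 0.2 * p for t, p in zip(target, prev)]
    remaining = num_steps - step
    return [xi + (ti - xi) / remaining for xi, ti in zip(x, target)]


class StreamingEditor:
    """Frame-by-frame causal editing loop with online keyframe injection (sketch)."""

    def __init__(self, denoiser: Denoiser, num_steps: int = 4, context_len: int = 2):
        self.denoiser = denoiser
        self.num_steps = num_steps  # few-step student; the abstract's teacher uses 50
        self.context: Deque[Frame] = deque(maxlen=context_len)  # bounded cache of past outputs
        self.keyframe: Optional[Frame] = None
        self.denoiser_calls = 0

    def inject_keyframe(self, reference: Frame) -> None:
        # Applies to every frame processed after this call; earlier outputs are not revisited.
        self.keyframe = reference

    def process(self, source: Frame) -> Frame:
        x = [0.0] * len(source)  # fixed starting latent (toy stand-in for sampled noise)
        for step in range(self.num_steps):
            x = self.denoiser(x, source, list(self.context), self.keyframe,
                              step, self.num_steps)
            self.denoiser_calls += 1
        self.context.append(x)  # visible to later frames only
        return x


editor = StreamingEditor(toy_denoiser)
out0 = editor.process([1.0, 1.0])    # before any keyframe: output follows the source
editor.inject_keyframe([0.5, -0.5])  # user supplies a reference effect mid-stream
out1 = editor.process([1.0, 1.0])    # effect now applied, blended with out0
print(out0, out1, editor.denoiser_calls)
```

With `num_steps=4` the loop makes 4 denoiser calls per frame instead of the 50 a teacher-length sampler would need, a 12.5x reduction in per-frame calls; whether that alone reaches real time depends on the model itself, which the abstract does not detail.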

論文の概要: StreamingEffect: Real-Time Human-Centric Video Effect Generation

List of related papers