Fugu-MT Paper Translation (Abstract): CodecFlow: Codec-Guided End-to-End Optimization for Streaming Video Analytics

Paper overview: CodecFlow: Codec-Guided End-to-End Optimization for Streaming Video Analytics

arxiv url: http://arxiv.org/abs/2604.06036v2
Date: Wed, 08 Apr 2026 07:19:13 GMT
Status: Translation complete
Last updated in system: 2026-04-09 14:06:05.171206
Title: CodecFlow: Codec-Guided End-to-End Optimization for Streaming Video Analytics
Authors: Yulin Zou, Yan Chen, Wenyan Chen, JooYoung Park, Shivaraman Nitin, Luo Tao, Francisco Romero, Dmitrii Ustiugov
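Abstract summary: CodecFlow is a streaming video analytics system built on the key observation that video codecs already extract the temporal and spatial structure of each stream as a byproduct of compression. Experiments show that CodecFlow achieves up to 3x throughput improvement and up to 87% GPU compute reduction over state-of-the-art baselines.
Reference score (independently computed attention metric): 4.835489391255295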
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Video streaming analytics is a crucial workload for vision-language model serving, but the high cost of multimodal inference limits scalability. Prior systems reduce inference cost by exploiting temporal and spatial redundancy in video streams, but they target either the vision transformer (ViT) or the LLM with a limited view, leaving end-to-end opportunities untapped. Moreover, existing methods incur significant overhead to identify redundancy, either through offline profiling and training or costly online computation, making them ill-suited for dynamic real-time streams. We present CodecFlow, a codec-guided streaming video analytics system built on a key observation that video codecs already extract the temporal and spatial structure of each stream as a byproduct of compression. CodecFlow treats this codec metadata as a low-cost runtime signal to unify optimization across video decoding, visual processing, and LLM prefilling, with transmission reduction as an inherent benefit of operating directly on compressed bitstreams. This drives codec-guided patch pruning before ViT encoding and selective key-value cache refresh during LLM prefilling, both of which are fully online and do not require offline training. Experiments show that CodecFlow achieves up to 3x throughput improvement and up to 87% GPU compute reduction over state-of-the-art baselines, while maintaining competitive accuracy with only 0-8% F1 drop.
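To make the idea of codec-guided patch pruning concrete, here is a minimal sketch of one way it could work. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify which codec metadata is used or how patches are scored. The function `select_patches`, the per-block `motion` map, the `threshold` parameter, and the keyframe rule are all hypothetical.

```python
from typing import List, Tuple


def select_patches(
    motion: List[List[float]],
    patch: int,
    threshold: float,
    is_keyframe: bool = False,
) -> List[Tuple[int, int]]:
    """Return (row, col) indices of patches worth sending to the ViT.

    Assumptions (not from the paper): `motion` is a per-block map of motion
    magnitude derived from codec metadata, each patch covers `patch` x `patch`
    blocks, and a patch is kept when its mean magnitude exceeds `threshold`.
    Keyframes keep every patch because they carry no motion reference.
    """
    rows = len(motion) // patch
    cols = len(motion[0]) // patch if motion else 0
    if is_keyframe:
        return [(r, c) for r in range(rows) for c in range(cols)]

    keep: List[Tuple[int, int]] = []
    for r in range(rows):
        for c in range(cols):
            # Sum the motion magnitude over all blocks inside this patch.
            total = 0.0
            for y in range(r * patch, (r + 1) * patch):
                for x in range(c * patch, (c + 1) * patch):
                    total += motion[y][x]
            if total / (patch * patch) > threshold:
                keep.append((r, c))
    return keep


# Toy 4x4 block map: only the top-right 2x2 region moves.
motion = [
    [0.0, 0.0, 2.0, 2.0],
    [0.0, 0.0, 2.0, 2.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
kept = select_patches(motion, patch=2, threshold=0.5)
print(kept)  # [(0, 1)]
```

In this toy case only one of four patches is kept, so the ViT would encode a quarter of the frame. A real system would also need a policy for reusing the features of pruned patches, which the abstract does not detail.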
