Fugu-MT Paper Translation (Abstract): MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering

Paper overview: MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering

arxiv url: http://arxiv.org/abs/2605.22269v1
Date: Thu, 21 May 2026 10:13:03 GMT
Status: Translation complete
Last updated in system: 2026-05-22 16:35:42.206062
Title: MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering
Title (reference translation): MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question Answering
Authors: Junbin Xiao, Jiajun Chen, Tianxiang Sun, Xun Yang, Angela Yao
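Abstract summary: KV-caching stores the key-value pairs of historical tokens via LLM prefill. MuKV is a method that features a KV cache compression module and a semi-hierarchical retrieval approach. Experiments on long-streaming VideoQA benchmarks show that MuKV significantly improves answer accuracy without sacrificing memory or online QA efficiency.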
Reference score (independently computed attention score): 75.0394545769057
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Long streaming video QA remains challenging due to growing visual tokens and limited reasoning length of large language models (LLMs). KV-caching stores the Key-Value (KV) of the historical tokens via LLM prefill and enables more efficient streaming QA. However, existing methods cache every one or two frames, causing redundant memory usage and losing fine-grained spatial details within frame or temporal contexts across frames. This paper proposes MuKV, a method that features a multi-grained KV cache compression module and a semi-hierarchical retrieval approach to improve both efficiency and accuracy for long streaming VideoQA. For the offline KV cache, MuKV extracts visual representations at patch-, frame-, and segment-levels. The multiple levels of granularity preserve both local cues and global temporal context, while maintaining efficiency with a dual signal token compression mechanism guided by self-attention and frequency. For online QA, MuKV designs a semi-hierarchical retrieval method to retrieve relevant KV caches for answer generation. Experiments on long-streaming VideoQA benchmarks show that MuKV significantly improves answer accuracy, without sacrificing memory and online QA efficiency. Moreover, our compression mechanism alone brings consistent benefits across answer accuracy, memory, and QA efficiency over baselines, showcasing highly effective contribution.
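Abstract (reference translation): Long streaming video QA remains challenging due to the growing number of visual tokens and the limited reasoning length of large language models (LLMs). KV-caching stores the key-value (KV) pairs of historical tokens via LLM prefill, enabling more efficient streaming QA. However, existing methods cache every one or two frames, which causes redundant memory usage and loses fine-grained spatial details within frames or temporal context across frames. This paper proposes MuKV, which combines a multi-grained KV cache compression module with a semi-hierarchical retrieval approach. For the offline KV cache, MuKV extracts visual representations at the patch, frame, and segment levels. The multiple levels of granularity preserve both local cues and global temporal context, while a dual-signal token compression mechanism guided by self-attention and frequency maintains efficiency. For online QA, MuKV designs a semi-hierarchical retrieval method that retrieves the relevant KV caches for answer generation. Experiments on long-streaming VideoQA benchmarks show that MuKV significantly improves answer accuracy without sacrificing memory or online QA efficiency. Moreover, the compression mechanism alone brings consistent benefits over baselines in answer accuracy, memory, and QA efficiency, demonstrating a highly effective contribution.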


Related papers