Fugu-MT Paper Translation (Abstract): Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction

Paper summary: Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction

arxiv url: http://arxiv.org/abs/2604.20311v2
Date: Thu, 23 Apr 2026 11:17:19 GMT
Status: Translation complete
Last updated in system: 2026-04-24 14:40:06.047834
Title: Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction
Authors: Dali Wang, Yunyao Zhang, Junqing Yu, Yi-Ping Phoebe Chen, Chen Xu, Zikai Song
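Abstract summary: Micro-video popularity prediction (MVPP) aims to forecast the popularity of videos on online media. MVPP approaches need to understand both the temporal dynamics of a given video (temporal) and its historical relevance to other videos (spatial).
Reference score (attention level, computed independently by the site): 24.227528430107114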
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Micro-video popularity prediction (MVPP) aims to forecast the future popularity of videos on online media, which is essential for applications such as content recommendation and traffic allocation. In real-world scenarios, it is critical for MVPP approaches to understand both the temporal dynamics of a given video (temporal) and its historical relevance to other videos (spatial). However, existing approaches suffer from limitations in both dimensions: temporally, they rely on sparse short-range sampling that restricts content perception; spatially, they depend on flat retrieval memory with limited capacity and low efficiency, hindering scalable knowledge utilization. To overcome these limitations, we propose a unified framework that achieves joint spatio-temporal enlargement, enabling precise perception of extremely long video sequences while supporting a scalable memory bank that can infinitely expand to incorporate all relevant historical videos. Technically, we employ a Temporal Enlargement driven by a frame scoring module that extracts highlight cues from video frames through two complementary pathways: sparse sampling and dense perception. Their outputs are adaptively fused to enable robust long-sequence content understanding. For Spatial Enlargement, we construct a Topology-Aware Memory Bank that hierarchically clusters historically relevant content based on topological relationships. Instead of directly expanding memory capacity, we update the encoder features of the corresponding clusters when incorporating new videos, enabling unbounded historical association without unbounded storage growth. Extensive experiments on three widely used MVPP benchmarks demonstrate that our method consistently outperforms 11 strong baselines across mainstream metrics, achieving robust improvements in both prediction accuracy and ranking consistency.
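The abstract's idea of updating cluster features instead of growing the memory can be illustrated with a minimal sketch. This is not the paper's Topology-Aware Memory Bank: the abstract does not specify the hierarchy, the topological relationships, or the update rule. The sketch swaps in a flat nearest-centroid memory with a running-mean update, and the class name `CentroidMemory`, its methods, and the toy 2-D features are all hypothetical.

```python
import math


class CentroidMemory:
    """Hypothetical fixed-capacity memory: one running-mean centroid per cluster."""

    def __init__(self, centroids):
        # Copy the initial centroids so the caller's lists are not mutated.
        self.centroids = [list(c) for c in centroids]
        # Number of features merged into each centroid so far.
        self.counts = [1] * len(self.centroids)

    def nearest(self, feature):
        """Return the index of the centroid closest to `feature` (Euclidean)."""
        return min(
            range(len(self.centroids)),
            key=lambda k: math.dist(self.centroids[k], feature),
        )

    def incorporate(self, feature):
        """Fold a new video feature into its nearest centroid; return its index."""
        k = self.nearest(feature)
        self.counts[k] += 1
        n = self.counts[k]
        # Incremental mean: c <- c + (x - c) / n, so no per-video vector is stored.
        self.centroids[k] = [
            c + (x - c) / n for c, x in zip(self.centroids[k], feature)
        ]
        return k


# Toy usage with made-up 2-D features (assumption: real features would be
# high-dimensional encoder outputs).
memory = CentroidMemory([[0.0, 0.0], [10.0, 10.0]])
for video_feature in ([1.0, 1.0], [9.0, 11.0], [2.0, 0.0]):
    memory.incorporate(video_feature)

print(len(memory.centroids))  # the number of stored vectors is unchanged
```

Under these assumptions, storage stays at one vector per cluster no matter how many videos are incorporated, which is the property the abstract describes as association without unbounded storage growth.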


Related papers