Fugu-MT Paper Translation (Summary): Bridging Short Videos and Live Streams: Reasoning-Guided Multimodal LLMs for Cross-Domain Representation Learning

Paper summary: Bridging Short Videos and Live Streams: Reasoning-Guided Multimodal LLMs for Cross-Domain Representation Learning

arxiv url: http://arxiv.org/abs/2606.04448v1
Date: Wed, 03 Jun 2026 04:49:01 GMT
Status: Translation complete
Last updated in system: 2026-06-04 20:44:18.552469
Title: Bridging Short Videos and Live Streams: Reasoning-Guided Multimodal LLMs for Cross-Domain Representation Learning
Authors: Le Zhang, Xiaolan Zhu, Yuchen Wang, Shilong Kang, Jiaqi Xue, Xiaoyu Zhang, Xiang Chen, Yalong Guan, Xiangyu Wu, Shijun Wang, Lantao Hu, Kun Gai
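Abstract summary: Reasoning-Guided Cross-Domain Representation Learning (RGCD-Rep) is a reasoning-guided framework for cross-domain recommendation from short videos to live streams. It is fully deployed and serves over 400 million users daily.
Reference score (independently computed attention score): 38.08336224801579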
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As live streaming services grow, many platforms offer short videos and live streams to meet diverse needs. Short videos carry substantial traffic and rich behavior signals, whereas live streaming is a core conversion scenario with sparse behavior data, making cold start severe. Transferring user interests from short videos to live streaming recommendation can alleviate these issues. Meanwhile, short videos and live streams are complex multimodal items, and integrating multimodal signals improves recommendation performance. Although Multimodal Large Language Models (MLLMs) show strong multimodal understanding and reasoning, their application to cross-domain recommendation remains underexplored. To this end, we propose Reasoning-Guided Cross-Domain Representation Learning (RGCD-Rep), a reasoning-guided framework for cross-domain recommendation from short videos to live streams. RGCD-Rep introduces MLLM reasoning resource-efficiently and learns transferable item representations guided by behavioral collaboration via two-stage training. First, reasoning-aware distillation lets a frozen teacher MLLM generate structured cross-domain reasoning knowledge and distills it into a lightweight student MLLM. Second, transferability-guided cross-domain representation learning decomposes item representations into transferable and domain residual representations. The resulting representations are computed offline and integrated into downstream retrieval tasks, enabling low-cost industrial deployment. Extensive offline experiments demonstrate RGCD-Rep's superiority. After deployment in Kuaishou's live streaming recommendation system, A/B tests show significant gains across multiple core business metrics, confirming its effectiveness and practicality in real industrial scenarios. RGCD-Rep is fully deployed and serves over 400 million users daily.
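The abstract says that item representations are decomposed into transferable and domain residual parts, computed offline, and then used in downstream retrieval, but it does not say how any of these steps is implemented. The sketch below is only an illustration under stated assumptions: it uses a fixed linear projection onto a hypothetical shared subspace (`shared_basis`) as a stand-in for the learned decomposition, and plain cosine similarity as a stand-in for the retrieval stage. The function names `decompose` and `top_k`, the dimensions, and the random data are all invented for the example and are not taken from the paper.

```python
import numpy as np


def decompose(items, shared_basis):
    """Split each row of `items` into a transferable part and a domain residual.

    Assumption for illustration: the transferable part is the orthogonal
    projection onto the span of `shared_basis`; the residual is the remainder.
    """
    # Orthonormal basis of the shared subspace, shape (d, k).
    q, _ = np.linalg.qr(shared_basis)
    transferable = items @ q @ q.T
    residual = items - transferable
    return transferable, residual


def top_k(query, candidates, k):
    """Return indices of the k candidates most cosine-similar to `query`."""
    unit_candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    scores = unit_candidates @ unit_query
    return np.argsort(-scores)[:k]


rng = np.random.default_rng(0)
dim, shared_dim = 8, 3

# Hypothetical inputs: in a real system these would come from a trained model.
shared_basis = rng.normal(size=(dim, shared_dim))
short_videos = rng.normal(size=(5, dim))
live_streams = rng.normal(size=(4, dim))

# "Offline" step: decompose items from both domains once and store the results.
sv_transferable, sv_residual = decompose(short_videos, shared_basis)
ls_transferable, ls_residual = decompose(live_streams, shared_basis)

# Toy user interest: mean transferable vector of three watched short videos.
user_interest = sv_transferable[:3].mean(axis=0)

# Retrieval step: rank live streams by similarity in the transferable space.
recommended = top_k(user_interest, ls_transferable, k=2)
```

In this toy setup the two parts of each item sum back to the original vector and are orthogonal to each other, which makes the split easy to check. A learned decomposition such as the one the abstract describes need not have either property.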

Related papers