Fugu-MT Paper Translation (Summary): Exploring Test-time Scaling via Prediction Merging on Large-Scale Recommendation

Paper summary: Exploring Test-time Scaling via Prediction Merging on Large-Scale Recommendation

arxiv url: http://arxiv.org/abs/2512.07650v1
Date: Mon, 08 Dec 2025 15:41:10 GMT
Status: Translation complete
Last updated in system: 2025-12-09 22:03:54.947471
Title: Exploring Test-time Scaling via Prediction Merging on Large-Scale Recommendation
Title (reference translation): Exploring Test-Time Scaling via Prediction Merging in Large-Scale Recommendation
Authors: Fuyuan Lyu, Zhentai Chen, Jingyan Jiang, Lingjie Li, Xing Tang, Xiuqiang He, Xue Liu
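Abstract summary: How to efficiently utilize and scale up computational resources at test time remains underexplored. The key to applying test-time scaling to DLRS lies in effectively generating diverse yet meaningful outputs. With more parallel servers at online deployment, test-time scaling can be accelerated seamlessly.
Reference score (independently computed attention score): 13.057539100440634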
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Inspired by the success of language models (LM), scaling up deep learning recommendation systems (DLRS) has become a recent trend in the community. All previous methods tend to scale up the model parameters during training time. However, how to efficiently utilize and scale up computational resources during test time remains underexplored, which can prove to be a scaling-efficient approach and bring orthogonal improvements in LM domains. The key point in applying test-time scaling to DLRS lies in effectively generating diverse yet meaningful outputs for the same instance. We propose two ways: One is to explore the heterogeneity of different model architectures. The other is to utilize the randomness of model initialization under a homogeneous architecture. The evaluation is conducted across eight models, including both classic and SOTA models, on three benchmarks. Sufficient evidence proves the effectiveness of both solutions. We further prove that under the same inference budget, test-time scaling can outperform parameter scaling. Our test-time scaling can also be seamlessly accelerated with the increase in parallel servers when deployed online, without affecting the inference time on the user side. Code is available.
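Abstract (reference translation): Inspired by the success of language models (LM), scaling up deep learning recommendation systems (DLRS) has become a recent trend in the community. Previous methods all tend to scale up model parameters at training time. However, how to efficiently utilize and scale up computational resources at test time remains underexplored, even though it can prove to be a scaling-efficient approach and has brought orthogonal improvements in LM domains. The key to applying test-time scaling to DLRS lies in effectively generating diverse yet meaningful outputs for the same instance. Two approaches are proposed. One is to explore the heterogeneity of different model architectures. The other is to exploit the randomness of model initialization under a homogeneous architecture. The evaluation covers eight models, including both classic and SOTA models, on three benchmarks. Sufficient evidence demonstrates the effectiveness of both solutions. The authors further show that, under the same inference budget, test-time scaling can outperform parameter scaling. When deployed online, test-time scaling can also be accelerated seamlessly by adding parallel servers, without affecting inference time on the user side. Code is available.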
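The second approach in the abstract (same architecture, different random initializations, predictions merged at test time) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state the merge function, so a plain average of predicted probabilities is used here, and `make_toy_model` and `merge_predictions` are hypothetical names. The "models" are untrained logistic scorers that differ only by seed, not the recommenders evaluated in the paper.

```python
import math
import random


def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def make_toy_model(seed, n_features):
    """Hypothetical stand-in for one recommender.

    Same architecture for every seed (a logistic scorer); only the
    randomly initialized weights differ. No training is performed.
    """
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n_features)]

    def predict(features):
        return sigmoid(sum(w * x for w, x in zip(weights, features)))

    return predict


def merge_predictions(models, features):
    """Assumed merge rule: average the per-model probabilities for one instance."""
    scores = [model(features) for model in models]
    return sum(scores) / len(scores)


# Five models that share an architecture but use different initialization seeds.
models = [make_toy_model(seed, n_features=4) for seed in range(5)]
instance = [0.2, -1.0, 0.5, 1.5]

individual = [model(instance) for model in models]
merged = merge_predictions(models, instance)
print("individual:", [round(p, 3) for p in individual])
print("merged:", round(merged, 3))
```

Because each model scores the instance independently, the per-model calls could in principle run on separate servers in parallel, with only the cheap averaging step done afterwards. This is consistent with the abstract's remark about parallel servers, but the actual serving setup is not described there. The first approach (heterogeneous architectures) would fit the same merge step, with the list holding models of different types.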


Related papers