Fugu-MT paper translation (abstract): LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation

Paper abstract: LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation

arxiv url: http://arxiv.org/abs/2605.10207v1
Date: Mon, 11 May 2026 08:52:53 GMT
Status: Translation complete
System update date: 2026-05-12 23:28:50.668537
Title: LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation
Authors: Yiwen Chen, Fuwei Zhang, Zehao Chen, Deqing Wang, Hehan Li, Peizhi Xu, Hanmeng Liu, Shuanglong Li, Xin Pei, Fuzhen Zhuang, Zhao Zhang
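Abstract summary: Latent reasoning has emerged as an effective paradigm in large language models (LLMs). The authors propose LASAR (Latent Adaptive Semantic Aligned Reasoning), an SFT-then-RL framework. Experiments on three real-world datasets show that LASAR outperforms all baselines.
Reference score (independently computed attention score): 33.48046116606003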
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated powerful reasoning capabilities through Chain-of-Thought (CoT) in various tasks, yet the inefficiency of token-by-token generation hinders real-world deployment in latency-sensitive recommender systems. Latent reasoning has emerged as an effective paradigm in LLMs, performing multi-step inference in a continuous hidden-state space to achieve stronger reasoning at lower cost. However, this paradigm remains underexplored in mainstream generative recommendation. Adapting it reveals three unique challenges: (1) the gap between prior-less Semantic ID (SID) symbols and continuous latent reasoning - SIDs lack pre-trained semantics, hindering joint optimization; (2) representation drift due to a lack of reasoning chain supervision; and (3) the suboptimality of applying a globally fixed reasoning depth. To address these, we propose LASAR (Latent Adaptive Semantic Aligned Reasoning), an SFT-then-RL framework. First, we bridge this gap via two-stage training: Stage 1 grounds SID semantics before Stage 2 introduces latent reasoning, ensuring efficient convergence. Second, we mitigate representation drift through explicit CoT semantic alignment. Step-wise bidirectional KL divergence constrains the latent reasoning trajectory using hidden-state anchors extracted from CoT text, while a Policy Head predicts per-sample reasoning depth. Third, during the GRPO-based RL phase, terminal-only KL alignment accommodates variable-length reasoning, and REINFORCE optimizes the Policy Head to dynamically allocate steps. This nearly halves the average latent step count while simultaneously improving recommendation quality. Experiments on three real-world datasets demonstrate that LASAR outperforms all baselines. It adds marginal inference latency and is roughly 20 times faster than generating explicit CoT text.
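The step-wise bidirectional KL alignment mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how hidden states are turned into distributions or how the per-step terms are combined, so the softmax normalization, the plain average over steps, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert a vector of scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bidirectional_kl(p, q):
    """Symmetric divergence: KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


def stepwise_alignment_loss(latent_steps, anchor_steps):
    """Average bidirectional KL over aligned reasoning steps.

    ASSUMPTION: one anchor vector per latent step, each normalized with
    softmax; the abstract does not specify either detail.
    """
    if len(latent_steps) != len(anchor_steps):
        raise ValueError("step-wise alignment needs one anchor per latent step")
    per_step = [
        bidirectional_kl(softmax(latent), softmax(anchor))
        for latent, anchor in zip(latent_steps, anchor_steps)
    ]
    return sum(per_step) / len(per_step)


def terminal_alignment_loss(latent_steps, anchor_steps):
    """Align only the final step, so the two chains may differ in length."""
    return bidirectional_kl(softmax(latent_steps[-1]), softmax(anchor_steps[-1]))
```

Under these assumptions, the terminal-only variant compares just the last latent state with the last anchor, which is why it tolerates a variable number of latent steps, whereas the step-wise variant requires equal-length trajectories.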
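The abstract states that REINFORCE optimizes a Policy Head that chooses a per-sample reasoning depth. The sketch below shows a generic REINFORCE update for a categorical policy over depths. The reward (a toy quality term minus a per-step cost), the moving-average baseline, the learning rate, and every function name are hypothetical choices for illustration; the abstract does not describe the actual reward or architecture.

```python
import math
import random


def softmax(logits):
    """Convert depth logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_depth(logits, rng):
    """Sample a depth index (0-based) from the categorical policy."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(softmax(logits)):
        cumulative += p
        if r < cumulative:
            return index
    return len(logits) - 1  # guard against floating-point round-off


def reinforce_update(logits, action, advantage, lr):
    """One REINFORCE step: logits += lr * advantage * grad log pi(action).

    For a softmax policy, d log pi(a) / d logit_j = 1[j == a] - pi_j.
    """
    probs = softmax(logits)
    return [
        logit + lr * advantage * ((1.0 if j == action else 0.0) - p)
        for j, (logit, p) in enumerate(zip(logits, probs))
    ]


def toy_reward(depth, step_cost=0.2):
    """HYPOTHETICAL reward: quality saturates at depth 2, each step costs."""
    quality = 1.0 if depth >= 2 else 0.0
    return quality - step_cost * depth


def train_depth_policy(max_depth=4, steps=2000, lr=0.1, seed=0):
    """Train the toy policy and return its final depth probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * max_depth
    baseline = 0.0  # moving-average baseline to reduce gradient variance
    for _ in range(steps):
        action = sample_depth(logits, rng)
        reward = toy_reward(action + 1)  # depths are 1-based
        logits = reinforce_update(logits, action, reward - baseline, lr)
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)
```

In this toy setting, a sampled depth whose reward exceeds the baseline becomes more likely and one whose reward falls below it becomes less likely. That is the mechanism by which a per-step cost could push such a policy toward shallower reasoning when extra steps do not improve quality.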
