Fugu-MT Paper Translation (Abstract): Align$^3$GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation

Paper overview: Align$^3$GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation

arxiv url: http://arxiv.org/abs/2511.11255v1
Date: Fri, 14 Nov 2025 12:52:43 GMT
Status: Translation complete
Last updated in system: 2025-11-17 22:42:18.605893
Title: Align$^3$GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation
Title (reference translation): Align$^3$GR: Unified Multi-Level Alignment for LLM-Based Generative Recommendation
Authors: Wencai Ye, Mingjie Sun, Shuhang Chen, Wenjin Wu, Peng Jiang,
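Abstract summary: Align$^3$GR is a new framework that unifies token-level, behavior modeling-level, and preference-level alignment. The proposed method combines self-play (SP-DPO) and real-world feedback (RF-DPO) for dynamic preference adaptation.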
Reference score (independently computed attention score): 17.5435958671623
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) demonstrate significant advantages in leveraging structured world knowledge and multi-step reasoning capabilities. However, fundamental challenges arise when transforming LLMs into real-world recommender systems due to semantic and behavioral misalignment. To bridge this gap, we propose Align$^3$GR, a novel framework that unifies token-level, behavior modeling-level, and preference-level alignment. Our approach introduces: (1) dual tokenization fusing user-item semantic and collaborative signals; (2) enhanced behavior modeling with bidirectional semantic alignment; and (3) a progressive DPO strategy combining self-play (SP-DPO) and real-world feedback (RF-DPO) for dynamic preference adaptation. Experiments show Align$^3$GR outperforms the SOTA baseline by +17.8% in Recall@10 and +20.2% in NDCG@10 on the public dataset, with significant gains in online A/B tests and full-scale deployment on an industrial large-scale recommendation platform.
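Abstract (reference translation): Large language models (LLMs) show significant advantages in leveraging structured world knowledge and multi-step reasoning capabilities. However, turning LLMs into real-world recommender systems raises fundamental challenges because of semantic and behavioral misalignment. To bridge this gap, we propose Align$^3$GR, a new framework that unifies token-level, behavior modeling-level, and preference-level alignment. The approach introduces dual tokenization that fuses user-item semantic and collaborative signals; enhanced behavior modeling with bidirectional semantic alignment; and a progressive DPO strategy that combines self-play (SP-DPO) and real-world feedback (RF-DPO) for dynamic preference adaptation. In experiments, Align$^3$GR outperforms the SOTA baseline by +17.8% in Recall@10 and +20.2% in NDCG@10 on the public dataset, with significant gains in online A/B tests and in full-scale deployment on a large-scale industrial recommendation platform.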
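The abstract reports its gains in Recall@10 and NDCG@10. As a rough illustration of what those two metrics measure, here is a minimal sketch for a single user with binary relevance. The function names, item IDs, and the binary-relevance setup are assumptions made for this example; the paper's actual evaluation protocol is not described on this page.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG: DCG of the top-k divided by the best achievable DCG."""
    ideal_hits = min(len(relevant), k)
    if ideal_hits == 0:
        return 0.0
    # A hit at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal ranking places all relevant items at the top.
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg


# Hypothetical example: one held-out item, ranked second by the model.
ranked_items = ["item_b", "item_a", "item_c"]
held_out = {"item_a"}
recall = recall_at_k(ranked_items, held_out, k=10)
ndcg = ndcg_at_k(ranked_items, held_out, k=10)
```

In this example the held-out item is inside the top 10, so Recall@10 is 1.0, while NDCG@10 is lower (about 0.63) because the item sits in second place instead of first. Dataset-level figures would typically be averages of these per-user values.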
