Fugu-MT Paper Translation (Abstract): Empowering Large Language Model for Sequential Recommendation via Multimodal Embeddings and Semantic IDs

Paper summary: Empowering Large Language Model for Sequential Recommendation via Multimodal Embeddings and Semantic IDs

arxiv url: http://arxiv.org/abs/2509.02017v1
Date: Tue, 02 Sep 2025 07:02:29 GMT
Status: Translation complete
Last updated in system: 2025-09-04 15:17:03.931812
Title: Empowering Large Language Model for Sequential Recommendation via Multimodal Embeddings and Semantic IDs
Title (reference translation): Building Large Language Models for Sequential Recommendation with Multimodal Embeddings and Semantic IDs
Authors: Yuhao Wang, Junwei Pan, Xinhang Li, Maolin Wang, Yuan Wang, Yue Liu, Dapeng Liu, Jie Jiang, Xiangyu Zhao
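Abstract summary: Sequential recommendation (SR) aims to capture users' dynamic interests and sequential patterns based on their historical interactions. MME-SID integrates multimodal embeddings and quantized embeddings to mitigate embedding collapse. Extensive experiments on three public datasets validate the superior performance of MME-SID.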
Reference score (independently computed attention score): 28.752042722391934
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sequential recommendation (SR) aims to capture users' dynamic interests and sequential patterns based on their historical interactions. Recently, the powerful capabilities of large language models (LLMs) have driven their adoption in SR. However, we identify two critical challenges in existing LLM-based SR methods: 1) embedding collapse when incorporating pre-trained collaborative embeddings and 2) catastrophic forgetting of quantized embeddings when utilizing semantic IDs. These issues dampen the model scalability and lead to suboptimal recommendation performance. Therefore, based on LLMs like Llama3-8B-instruct, we introduce a novel SR framework named MME-SID, which integrates multimodal embeddings and quantized embeddings to mitigate embedding collapse. Additionally, we propose a Multimodal Residual Quantized Variational Autoencoder (MM-RQ-VAE) with maximum mean discrepancy as the reconstruction loss and contrastive learning for alignment, which effectively preserve intra-modal distance information and capture inter-modal correlations, respectively. To further alleviate catastrophic forgetting, we initialize the model with the trained multimodal code embeddings. Finally, we fine-tune the LLM efficiently using LoRA in a multimodal frequency-aware fusion manner. Extensive experiments on three public datasets validate the superior performance of MME-SID thanks to its capability to mitigate embedding collapse and catastrophic forgetting. The implementation code and datasets are publicly available for reproduction: https://github.com/Applied-Machine-Learning-Lab/MME-SID.
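Abstract (reference translation): Sequential recommendation (SR) aims to capture users' dynamic interests and sequential patterns based on their past interactions. Recently, the strong capabilities of large language models (LLMs) have encouraged their adoption in SR. However, we identify two important challenges in existing LLM-based SR methods: 1) embedding collapse when incorporating pre-trained collaborative embeddings and 2) catastrophic forgetting of quantized embeddings when using semantic IDs. These problems reduce model scalability and lead to suboptimal recommendation performance. Therefore, building on LLMs such as Llama3-8B-instruct, we introduce a new SR framework called MME-SID, which integrates multimodal embeddings and quantized embeddings to mitigate embedding collapse. In addition, we propose MM-RQ-VAE (Multimodal Residual Quantized Variational Autoencoder), which uses maximum mean discrepancy as the reconstruction loss and contrastive learning for alignment. To further reduce catastrophic forgetting, the model is initialized with the trained multimodal code embeddings. Finally, the LLM is fine-tuned efficiently with LoRA in a multimodal frequency-aware fusion manner. Extensive experiments on three public datasets validate the superior performance of MME-SID, owing to its ability to mitigate embedding collapse and catastrophic forgetting. The implementation code and datasets are publicly available at https://github.com/Applied-Machine-Learning-Lab/MME-SID.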
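
The abstract names two building blocks without detailing them: residual quantization, which turns an embedding into a semantic ID (one code index per level), and maximum mean discrepancy (MMD), which the paper uses as a reconstruction loss. The following is a minimal pure-Python sketch of the generic forms of both ideas, not the paper's implementation. The function names, the toy codebooks, the RBF kernel, and the bandwidth value are all assumptions made for illustration; the actual MM-RQ-VAE is in the repository linked above.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(vector, codebooks):
    """Quantize `vector` level by level (hypothetical helper, generic RQ).

    codebooks: one list of code vectors per level.
    Returns (semantic_id, quantized): the chosen code index at each level,
    and the sum of the chosen codes, which approximates `vector`.
    """
    residual = list(vector)
    quantized = [0.0] * len(vector)
    semantic_id = []
    for codebook in codebooks:
        # Pick the code closest to what is still unexplained.
        index = min(range(len(codebook)),
                    key=lambda i: squared_distance(residual, codebook[i]))
        code = codebook[index]
        semantic_id.append(index)
        quantized = [q + c for q, c in zip(quantized, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return semantic_id, quantized


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption here."""
    return math.exp(-squared_distance(a, b) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples of vectors."""
    def mean_kernel(ps, qs):
        total = sum(rbf_kernel(p, q, bandwidth) for p in ps for q in qs)
        return total / (len(ps) * len(qs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy two-level codebooks (made-up values, two codes per level).
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.25, 0.0], [0.0, 0.25]],
]
semantic_id, quantized = residual_quantize([0.9, 0.3], codebooks)

# MMD is zero for identical samples and positive for well-separated ones.
same = mmd_squared([[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]])
apart = mmd_squared([[0.0, 0.0], [1.0, 0.0]], [[5.0, 5.0], [6.0, 5.0]])
```

In this toy setup the semantic ID is the list of per-level indices, and each extra level refines the approximation by quantizing the residual left by the previous levels. The MMD value compares two sets of vectors as distributions rather than element by element, which is one plausible reading of why it can help preserve distance information between embeddings.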


Related papers