Fugu-MT Paper Translation (Summary): The Mental World of Large Language Models in Recommendation: A Benchmark on Association, Personalization, and Knowledgeability

Paper Summary: The Mental World of Large Language Models in Recommendation: A Benchmark on Association, Personalization, and Knowledgeability

arxiv url: http://arxiv.org/abs/2512.17389v1
Date: Fri, 19 Dec 2025 09:44:19 GMT
Status: Translation complete
Last updated in system: 2025-12-22 19:25:54.33075
Title: The Mental World of Large Language Models in Recommendation: A Benchmark on Association, Personalization, and Knowledgeability
Authors: Guangneng Hu
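Abstract summary: Large language models (LLMs) have shown potential in recommendation systems (RecSys) when used as knowledge enhancers or zero-shot rankers. A key challenge is the large semantic gap between LLMs and RecSys: the former captures language world knowledge, while the latter captures a personalized world of behaviors. We propose a benchmark named LRWorld, containing over 38K high-quality samples and 23M tokens carefully compiled and generated from widely used recommendation datasets.
Reference score (independently computed attention score): 3.3707422585608953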
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have shown potential in recommendation systems (RecSys) when used either as knowledge enhancers or as zero-shot rankers. A key challenge lies in the large semantic gap between LLMs and RecSys: the former internalizes language world knowledge, while the latter captures a personalized world of behaviors. Unfortunately, the research community lacks a comprehensive benchmark that evaluates the limitations and boundaries of LLMs in RecSys, which is needed to draw confident conclusions. To investigate this, we propose a benchmark named LRWorld containing over 38K high-quality samples and 23M tokens carefully compiled and generated from widely used public recommendation datasets. LRWorld categorizes the mental world of LLMs in RecSys into three main scales (association, personalization, and knowledgeability) spanned by ten factors with 31 measures (tasks). Based on LRWorld, comprehensive experiments on dozens of LLMs show that they still do not capture deep neural personalized embeddings well but can achieve good results on shallow memory-based item-item similarity. They are also good at perceiving item entity relations, entity hierarchical taxonomies, and item-item association rules when inferring user interests. Furthermore, LLMs show a promising ability in multimodal knowledge reasoning (movie posters and product images) and robustness to noisy profiles. None of them performs consistently well across the ten factors. Ablations cover model size, position bias, and more.
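The abstract contrasts deep neural personalized embeddings with "shallow memory-based item-item similarity". As a rough illustration only, and assuming the standard item-based collaborative-filtering definition (cosine similarity over binary user-item interactions), such a similarity can be computed as in the sketch below. The function names `item_item_cosine` and `most_similar` and the toy data are hypothetical; this is not how LRWorld constructs its tasks.

```python
import math
from collections import defaultdict


def item_item_cosine(interactions):
    """Cosine similarity between items, computed from a user -> set-of-items log.

    Assumes binary implicit feedback (a user either interacted with an item or not).
    Only item pairs with at least one common user are stored.
    """
    # Invert the log: item -> set of users who interacted with it.
    users_by_item = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            users_by_item[item].add(user)

    sims = {}
    items = sorted(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            common = len(users_by_item[a] & users_by_item[b])
            if common:
                # |U_a ∩ U_b| / sqrt(|U_a| * |U_b|) for binary vectors.
                score = common / math.sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[(a, b)] = sims[(b, a)] = score
    return sims


def most_similar(sims, item, k=3):
    """Return up to k (other_item, score) pairs, highest score first, ties by name."""
    scored = [(other, s) for (src, other), s in sims.items() if src == item]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical toy interaction log.
interactions = {"u1": {"A", "B"}, "u2": {"A", "B", "C"}, "u3": {"C", "D"}}
sims = item_item_cosine(interactions)
print(most_similar(sims, "A"))  # [('B', 1.0), ('C', 0.5)]
```

"Memory-based" here means the similarities are read directly off stored co-interaction counts, with no learned parameters, which is what distinguishes this family from the neural embedding models the abstract mentions.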


List of related papers