Related papers: ALPBench: A Benchmark for Attribution-level Long-term Personal Behavior Understanding

URL: http://arxiv.org/abs/2602.03056v1
Date: Tue, 03 Feb 2026 03:32:16 GMT
Title: ALPBench: A Benchmark for Attribution-level Long-term Personal Behavior Understanding
Authors: Lu Ren, Junda She, Xinchen Luo, Tao Wang, Xin Ye, Xu Zhang, Muxuan Wang, Xiao Yang, Chenguang Wang, Fei Xie, Yiwei Zhou, Danjun Wu, Guodong Zhang, Yifei Hu, Guoying Zheng, Shujie Yang, Xingmei Wang, Shiyao Wang, Yukun Zhou, Fan Yang, Size Li, Kuo Cai, Qiang Luo, Ruiming Tang, Han Li, Kun Gai
Abstract summary: ALPBench is a Benchmark for Attribution-level Long-term Personal Behavior Understanding. It predicts user-interested attribute combinations, enabling ground-truth evaluation. It models preferences from long-term historical behaviors rather than users' explicitly expressed requests.
Score: 53.88804678012327
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in large language models have highlighted their potential for personalized recommendation, where accurately capturing user preferences remains a key challenge. Leveraging their strong reasoning and generalization capabilities, LLMs offer new opportunities for modeling long-term user behavior. To systematically evaluate this, we introduce ALPBench, a Benchmark for Attribution-level Long-term Personal Behavior Understanding. Unlike item-focused benchmarks, ALPBench predicts user-interested attribute combinations, enabling ground-truth evaluation even for newly introduced items. It models preferences from long-term historical behaviors rather than users' explicitly expressed requests, better reflecting enduring interests. User histories are represented as natural language sequences, allowing interpretable, reasoning-based personalization. ALPBench enables fine-grained evaluation of personalization by focusing on the prediction of attribute combinations, a task that remains highly challenging for current LLMs due to the need to capture complex interactions among multiple attributes and reason over long-term user behavior sequences.

Related papers

Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions [50.70965714314064]
Large Language Models (LLMs) are increasingly serving as personal assistants, where users share complex and diverse preferences over extended interactions. This work proposes RealPref, a benchmark for evaluating realistic preference-following in personalized user-LLM interactions.
arXiv Detail & Related papers (2026-03-04T15:42:43Z)
Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction [55.24448139349266]
We present PAL-Bench, a new benchmark designed to evaluate the personalization capabilities of service-oriented assistants in long-term user-agent interactions. To improve personalized service-oriented interactions, we propose H$^2$Memory, a hierarchical and heterogeneous memory framework.
arXiv Detail & Related papers (2025-11-17T14:22:32Z)
Using LLMs to Capture Users' Temporal Context for Recommendation [3.719862246745416]
This paper presents an assessment of Large Language Models (LLMs) for generating semantically rich, time-aware user profiles. We do not propose a novel end-to-end recommendation architecture, but the core contribution is a systematic investigation into the degree of LLM effectiveness. The evaluation across Movies&TV and Video Games domains suggests that while LLM-generated profiles offer semantic depth and temporal structure, their effectiveness for context-aware recommendations is notably contingent on the richness of user interaction histories.
arXiv Detail & Related papers (2025-08-11T22:48:31Z)
Temporal User Profiling with LLMs: Balancing Short-Term and Long-Term Preferences for Recommendations [3.719862246745416]
We propose a novel method for user profiling that explicitly models short-term and long-term preferences. LLM-TUP achieves substantial improvements over several baselines.
arXiv Detail & Related papers (2025-08-11T20:28:24Z)
What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context [56.590259941275434]
RecPO is a preference optimization framework for sequential recommendation. It exploits adaptive reward margins based on inferred preference hierarchies and temporal signals. It mirrors key characteristics of human decision-making: favoring timely satisfaction, maintaining coherent preferences, and exercising discernment under shifting contexts.
arXiv Detail & Related papers (2025-06-02T21:09:29Z)
A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations [112.81207927088117]
PersonaConvBench is a benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements.
arXiv Detail & Related papers (2025-05-20T09:13:22Z)
Towards Explainable Temporal User Profiling with LLMs [3.719862246745416]
We leverage large language models (LLMs) to generate natural language summaries of users' interaction histories. Our framework not only models temporal user preferences but also produces natural language profiles that can be used to explain recommendations in an interpretable manner.
arXiv Detail & Related papers (2025-05-01T22:02:46Z)
PersonalLLM: Tailoring LLMs to Individual Preferences [11.717169516971856]
We present a public benchmark, PersonalLLM, focusing on adapting LLMs to provide maximal benefits for a particular user. We curate open-ended prompts paired with many high-quality answers over which users would be expected to display heterogeneous latent preferences. Our dataset and generated personalities offer an innovative testbed for developing personalization algorithms.
arXiv Detail & Related papers (2024-09-30T13:55:42Z)
Dynamic Memory based Attention Network for Sequential Recommendation [79.5901228623551]
We propose a novel long sequential recommendation model called Dynamic Memory-based Attention Network (DMAN). It segments the overall long behavior sequence into a series of sub-sequences, then trains the model and maintains a set of memory blocks to preserve long-term interests of users. Based on the dynamic memory, the user's short-term and long-term interests can be explicitly extracted and combined for efficient joint recommendation.
arXiv Detail & Related papers (2021-02-18T11:08:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.