DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling
- URL: http://arxiv.org/abs/2502.11078v1
- Date: Sun, 16 Feb 2025 11:02:37 GMT
- Authors: Aili Chen, Chengyu Du, Jiangjie Chen, Jinghan Xu, Yikai Zhang, Siyu Yuan, Zulong Chen, Liangyue Li, Yanghua Xiao
- Abstract summary: We propose DEEPER, a novel approach for dynamic persona modeling that enables continual persona optimization.
Experiments on dynamic persona modeling involving 4,800 users across 10 domains highlight the superior persona optimization capabilities of DEEPER.
- Abstract: To advance personalized applications such as recommendation systems and user behavior prediction, recent research increasingly adopts large language models (LLMs) for human-readable persona modeling. In dynamic real-world scenarios, effective persona modeling requires leveraging streaming behavior data to continually optimize user personas. However, existing methods, whether they regenerate personas or incrementally extend them with new behaviors, often fail to achieve sustained improvements in persona quality or future behavior prediction accuracy. To address this, we propose DEEPER, a novel approach to dynamic persona modeling that enables continual persona optimization. Specifically, we enhance the model's direction-search capability through an iterative reinforcement learning framework, allowing it to automatically identify effective update directions and optimize personas using the discrepancies between user behaviors and model predictions. Extensive experiments on dynamic persona modeling with 4,800 users across 10 domains highlight the superior persona optimization capability of DEEPER, which delivers a 32.2% average reduction in user behavior prediction error over four update rounds and outperforms the best baseline by 22.92%.
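To make the update loop concrete, here is a minimal Python sketch (not the authors' implementation) of the discrepancy-driven refinement the abstract describes: predict behavior from the current persona, compare the prediction with the observed behavior, and rewrite the persona along an update direction. The `generate` helper is a hypothetical stand-in for any LLM client, and the reinforcement-learned direction search is reduced to a single prompt.
```python
from dataclasses import dataclass

def generate(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real client."""
    return "placeholder response"

@dataclass
class PersonaState:
    persona: str
    round: int = 0

def refine_persona(state: PersonaState, observed: str) -> PersonaState:
    # 1. Predict the user's next behavior from the current persona.
    predicted = generate(f"Persona: {state.persona}\nPredict the next behavior.")
    # 2. Derive an update direction from the prediction/observation gap.
    direction = generate(
        "Propose a persona update direction given this discrepancy.\n"
        f"Predicted: {predicted}\nObserved: {observed}"
    )
    # 3. Rewrite the persona along that direction.
    new_persona = generate(
        f"Rewrite the persona along this direction.\n"
        f"Persona: {state.persona}\nDirection: {direction}"
    )
    return PersonaState(persona=new_persona, round=state.round + 1)

state = PersonaState(persona="A reader who follows sci-fi reviews.")
for behavior in ["clicked a fantasy novel", "skipped a sci-fi sequel"]:
    state = refine_persona(state, behavior)  # one update round per behavior
```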
Related papers
- MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning
We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks.
Our novel approach uses reinforcement learning to fine-tune the motion generator based on human preferences and prior knowledge of the human perception model.
In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate optimality across text adherence, motion quality, and human preferences.
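As one way to picture the multi-objective step (my sketch, with made-up scores, not the paper's code), the snippet below keeps the Pareto-optimal set of candidate motions under three rewards instead of collapsing them into a single weighted sum: a candidate survives unless some other candidate is at least as good on every objective and strictly better on one.
```python
from typing import List, Tuple

def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """a dominates b: at least as good everywhere, better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

# (text adherence, motion quality, human preference) per candidate motion
candidates = [(0.9, 0.4, 0.6), (0.7, 0.8, 0.7), (0.6, 0.6, 0.5)]
print(pareto_front(candidates))  # [0, 1]; candidate 2 is dominated by 1
```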
arXiv Detail & Related papers (2024-10-09T03:27:14Z)
- USE: Dynamic User Modeling with Stateful Sequence Models
User Stateful Embedding (USE) generates user embeddings without the need for exhaustive reprocessing.
We introduce a novel training objective named future W-behavior prediction to transcend the limitations of next-token prediction.
We conduct experiments on 8 downstream tasks using Snapchat users' behavioral logs in both static (i.e., fixed user behavior sequences) and dynamic (i.e., periodically updated user behavior sequences) settings.
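A toy numpy sketch of the stateful idea, under my own simplifying assumptions rather than the paper's architecture: the user state is folded forward from new events only, so past logs are never reprocessed, and `future_w_scores` mimics the future W-behavior objective by scoring which behaviors occur within the next W steps rather than only the next one.
```python
import numpy as np

rng = np.random.default_rng(0)
DIM, VOCAB, W = 16, 50, 4
E = rng.normal(size=(VOCAB, DIM))      # behavior embeddings
U = rng.normal(size=(DIM, DIM)) * 0.1  # state transition matrix

def update_state(state: np.ndarray, new_events: list) -> np.ndarray:
    """Fold only the new events into the running user state."""
    for e in new_events:
        state = np.tanh(state @ U + E[e])
    return state

def future_w_scores(state: np.ndarray) -> np.ndarray:
    """Scores for which behaviors occur within the next W steps (order-free)."""
    return E @ state  # higher = more likely within the window

state = np.zeros(DIM)
state = update_state(state, [3, 17, 42])  # day 1 batch
state = update_state(state, [7, 7, 19])   # day 2 batch, no reprocessing
print(np.argsort(-future_w_scores(state))[:W])  # top-W predicted behaviors
```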
arXiv Detail & Related papers (2024-03-20T07:05:19Z)
- PUNR: Pre-training with User Behavior Modeling for News Recommendation
News recommendation aims to predict click behavior from a user's past behaviors.
Effectively modeling user representations is the key to recommending preferred news.
We propose an unsupervised pre-training paradigm with two tasks: user behavior masking and user behavior generation.
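The behavior masking task can be illustrated with a short sketch (mine, not the paper's code): hide a fraction of the behaviors in a user's history and ask the encoder to recover them, BERT-style. The encoder and loss are left abstract here.
```python
import random

MASK = "[MASK]"

def mask_behaviors(history: list, ratio: float = 0.3, seed: int = 0):
    """Return (masked history, positions to predict, original targets)."""
    rng = random.Random(seed)
    n_masked = max(1, int(len(history) * ratio))
    positions = rng.sample(range(len(history)), n_masked)
    masked = [MASK if i in positions else b for i, b in enumerate(history)]
    targets = [history[i] for i in positions]
    return masked, positions, targets

history = ["click:sports", "read:politics", "share:tech",
           "click:tech", "read:sports"]
masked, pos, targets = mask_behaviors(history)
print(masked)         # history with ~30% of behaviors replaced by [MASK]
print(pos, targets)   # what the encoder would be trained to recover
```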
arXiv Detail & Related papers (2023-04-25T08:03:52Z)
- Latent User Intent Modeling for Sequential Recommenders
Sequential recommender models learn to predict the next items a user is likely to interact with based on their interaction history on the platform.
Most sequential recommenders, however, lack a higher-level understanding of user intents, which often drive user behaviors online.
Intent modeling is thus critical for understanding users and optimizing long-term user experience.
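One common way to realize latent intent modeling, shown as an illustrative sketch rather than the paper's method: introduce a small set of discrete intents z, infer p(z | history) from a history encoder, and marginalize next-item scores over them. All distributions below are random stand-ins for learned components.
```python
import numpy as np

rng = np.random.default_rng(1)
N_INTENTS, N_ITEMS = 3, 10

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

intent_logits = rng.normal(size=N_INTENTS)           # from a history encoder
item_logits = rng.normal(size=(N_INTENTS, N_ITEMS))  # per-intent item heads

# p(item | history) = sum_z p(z | history) * p(item | z)
p_intent = softmax(intent_logits)
p_item = sum(p_intent[z] * softmax(item_logits[z]) for z in range(N_INTENTS))
print("recommended item:", int(np.argmax(p_item)))
```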
arXiv Detail & Related papers (2022-11-17T19:00:24Z)
- Optimal Behavior Prior: Data-Efficient Human Models for Improved Human-AI Collaboration
We show that using optimal behavior as a prior for human models makes these models vastly more data-efficient.
We also show that using these improved human models often leads to better human-AI collaboration performance.
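A minimal sketch of one reading of this idea (my assumptions, not the paper's model): start from a Boltzmann-rational policy over optimal Q-values as a prior, then blend in observed human action counts. With little data the prior dominates, which is where the data efficiency comes from.
```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

q_optimal = np.array([1.0, 0.2, 0.7])  # optimal action values in one state
prior = softmax(3.0 * q_optimal)       # Boltzmann-rational prior, beta = 3

human_counts = np.array([2, 0, 5])     # few observed human actions
alpha = 10.0                            # prior strength (pseudo-counts)
posterior = (alpha * prior + human_counts) / (alpha + human_counts.sum())
print(posterior)  # data-efficient estimate of the human's action distribution
```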
arXiv Detail & Related papers (2022-11-03T06:10:22Z)
- PinnerFormer: Sequence Modeling for User Representation at Pinterest
We introduce PinnerFormer, a user representation trained to predict a user's future long-term engagement.
Unlike prior approaches, we adapt our modeling to a batch infrastructure via our new dense all-action loss.
We show that by doing so, we significantly close the gap between batch user embeddings that are generated once a day and real-time user embeddings generated whenever a user takes an action.
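A toy numpy sketch of a dense all-action style loss, under my own assumptions about the setup: every position's embedding is scored against all of the user's future positive actions via a sampled softmax, rather than only the next action.
```python
import numpy as np

rng = np.random.default_rng(2)
T, DIM, N_FUTURE = 5, 8, 3

seq_emb = rng.normal(size=(T, DIM))            # per-position user embeddings
future_emb = rng.normal(size=(N_FUTURE, DIM))  # embeddings of future actions
neg_emb = rng.normal(size=(20, DIM))           # sampled negative actions

def dense_all_action_loss(seq, pos, neg):
    loss = 0.0
    for u in seq:  # every position is trained, not just the last one
        logits_pos = pos @ u
        logits_neg = neg @ u
        # log of the softmax denominator over positives and negatives
        denom = np.logaddexp.reduce(np.concatenate([logits_pos, logits_neg]))
        loss += np.sum(denom - logits_pos)  # cross-entropy per future action
    return loss / (len(seq) * len(pos))

print(dense_all_action_loss(seq_emb, future_emb, neg_emb))
```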
arXiv Detail & Related papers (2022-05-09T18:26:51Z)
- Preference Enhanced Social Influence Modeling for Network-Aware Cascade Prediction
We propose a novel framework that improves cascade size prediction by enhancing user preference modeling.
Our end-to-end method makes the modeling of how users are activated during information diffusion more adaptive and accurate.
arXiv Detail & Related papers (2022-04-18T09:25:06Z)
- Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference
We propose a generative inverse reinforcement learning approach for modeling user behavioral preferences.
Our model automatically learns rewards from users' actions using a discriminative actor-critic network and a Wasserstein GAN.
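To illustrate the reward-learning step with a generic WGAN-style critic (not the paper's actual network): train a critic to separate user (state, action) pairs from policy rollouts, then use its score as the reward for the actor. The linear critic and synthetic features below are my simplifications.
```python
import numpy as np

rng = np.random.default_rng(3)
DIM = 6
w = np.zeros(DIM)  # linear critic parameters

user_sa = rng.normal(loc=0.5, size=(64, DIM))     # user (state, action) feats
policy_sa = rng.normal(loc=-0.5, size=(64, DIM))  # current policy rollouts

for _ in range(100):
    # Wasserstein objective: raise critic score on user data, lower on policy
    grad = user_sa.mean(axis=0) - policy_sa.mean(axis=0)
    w += 0.05 * grad
    w = np.clip(w, -1.0, 1.0)  # crude Lipschitz constraint (weight clipping)

reward = policy_sa @ w  # learned reward signal fed back to the actor
print(reward.mean())
```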
arXiv Detail & Related papers (2021-05-03T13:14:25Z)
- Reinforcement Learning Beyond Expectation
Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model a tendency of humans to view gains and losses differently.
In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment.
In order to endow the agent with the ability to closely mimic the behavior of human users, we optimize a CPT-based cost.
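For reference, here is the standard Tversky-Kahneman value function usually meant by a CPT-based cost, sketched in Python; the paper's exact objective may differ. Losses are scaled by lambda > 1, so they loom larger than equal-sized gains, which is the asymmetry the abstract describes.
```python
def cpt_value(x: float, alpha: float = 0.88, beta: float = 0.88,
              lam: float = 2.25) -> float:
    """Asymmetric CPT value: losses weigh more than equal-sized gains."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** beta)

print(cpt_value(10.0))   # ~7.59: value of a gain of 10
print(cpt_value(-10.0))  # ~-17.07: the same-sized loss hurts ~2.25x more
```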
arXiv Detail & Related papers (2021-03-29T20:35:25Z)