Dynamic Evaluation Framework for Personalized and Trustworthy Agents: A Multi-Session Approach to Preference Adaptability
- URL: http://arxiv.org/abs/2504.06277v1
- Date: Sat, 08 Mar 2025 22:50:26 GMT
- Title: Dynamic Evaluation Framework for Personalized and Trustworthy Agents: A Multi-Session Approach to Preference Adaptability
- Authors: Chirag Shah, Hideo Joho, Kirandeep Kaur, Preetam Prabhu Srikar Dammu,
- Abstract summary: We argue for a paradigm shift in evaluating personalized and adaptive agents.<n>We propose a comprehensive novel framework that models user personas with unique attributes and preferences.<n>Our flexible framework is designed to support a variety of agents and applications, ensuring a comprehensive and versatile evaluation of recommendation strategies.
- Score: 10.443994990138973
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advancements in generative AI have significantly increased interest in personalized agents. With increased personalization, there is also a greater need for being able to trust decision-making and action taking capabilities of these agents. However, the evaluation methods for these agents remain outdated and inadequate, often failing to capture the dynamic and evolving nature of user interactions. In this conceptual article, we argue for a paradigm shift in evaluating personalized and adaptive agents. We propose a comprehensive novel framework that models user personas with unique attributes and preferences. In this framework, agents interact with these simulated users through structured interviews to gather their preferences and offer customized recommendations. These recommendations are then assessed dynamically using simulations driven by Large Language Models (LLMs), enabling an adaptive and iterative evaluation process. Our flexible framework is designed to support a variety of agents and applications, ensuring a comprehensive and versatile evaluation of recommendation strategies that focus on proactive, personalized, and trustworthy aspects.
Related papers
- Exploring the Impact of Personality Traits on Conversational Recommender Systems: A Simulation with Large Language Models [70.180385882195]
This paper introduces a personality-aware user simulation for Conversational Recommender Systems (CRSs)
The user agent induces customizable personality traits and preferences, while the system agent possesses the persuasion capability to simulate realistic interaction in CRSs.
Experimental results demonstrate that state-of-the-art LLMs can effectively generate diverse user responses aligned with specified personality traits.
arXiv Detail & Related papers (2025-04-09T13:21:17Z) - Reasoning LLMs for User-Aware Multimodal Conversational Agents [3.533721662684487]
Personalization in social robotics is critical for fostering effective human-robot interactions.<n>This paper proposes a novel framework called USER-LLM R1 for a user-aware conversational agent.<n>Our approach integrates chain-of-thought (CoT) reasoning models to iteratively infer user preferences and vision-language models.
arXiv Detail & Related papers (2025-04-02T13:00:17Z) - FLOW: A Feedback LOop FrameWork for Simultaneously Enhancing Recommendation and User Agents [28.25107058257086]
We propose a novel framework named FLOW, which achieves collaboration between the recommendation agent and the user agent by introducing a feedback loop.
Specifically, the recommendation agent refines its understanding of the user's preferences by analyzing the user agent's feedback on previously suggested items.
This iterative refinement process enhances the reasoning capabilities of both the recommendation agent and the user agent, enabling more precise recommendations.
arXiv Detail & Related papers (2024-10-26T00:51:39Z) - Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP)
We develop a generic and personalization generative framework, that can handle a wide range of personalized needs.
Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z) - Tell Me More! Towards Implicit User Intention Understanding of Language
Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.
We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries.
We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z) - On Generative Agents in Recommendation [58.42840923200071]
Agent4Rec is a user simulator in recommendation based on Large Language Models.
Each agent interacts with personalized recommender models in a page-by-page manner.
arXiv Detail & Related papers (2023-10-16T06:41:16Z) - AgentCF: Collaborative Learning with Autonomous Language Agents for
Recommender Systems [112.76941157194544]
We propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering.
We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimize both kinds of agents together.
Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions.
arXiv Detail & Related papers (2023-10-13T16:37:14Z) - Modeling Recommender Ecosystems: Research Challenges at the Intersection
of Mechanism Design, Reinforcement Learning and Generative Models [17.546954143602818]
We argue that modeling the incentives and behaviors of all actors in the system is strictly necessary to maximize the value the system brings to these actors and improve overall ecosystem "health"
We propose a conceptual framework that encompasses these elements, and articulate a number of research challenges that emerge at the intersection of these different disciplines.
arXiv Detail & Related papers (2023-09-08T03:20:58Z) - Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on his/her interaction history on the platform.
Most sequential recommenders however lack a higher-level understanding of user intents, which often drive user behaviors online.
Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z) - Differential Assessment of Black-Box AI Agents [29.98710357871698]
We propose a novel approach to differentially assess black-box AI agents that have drifted from their previously known models.
We leverage sparse observations of the drifted agent's current behavior and knowledge of its initial model to generate an active querying policy.
Empirical evaluation shows that our approach is much more efficient than re-learning the agent model from scratch.
arXiv Detail & Related papers (2022-03-24T17:48:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.