Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance
- URL: http://arxiv.org/abs/2505.16348v1
- Date: Thu, 22 May 2025 08:00:10 GMT
- Title: Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance
- Authors: Taeyoon Kwon, Dongwook Choi, Sunghwan Kim, Hyojun Kim, Seungjun Moon, Beong-woo Kwak, Kuan-Hao Huang, Jinyoung Yeo
- Abstract summary: Embodied agents empowered by large language models (LLMs) have shown strong performance in household object rearrangement tasks. Yet, the effectiveness of embodied agents in utilizing memory for personalized assistance remains largely underexplored. We present MEMENTO, a personalized embodied agent evaluation framework designed to assess memory utilization capabilities.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Embodied agents empowered by large language models (LLMs) have shown strong performance in household object rearrangement tasks. However, these tasks primarily focus on single-turn interactions with simplified instructions, which do not truly reflect the challenges of providing meaningful assistance to users. To provide personalized assistance, embodied agents must understand the unique semantics that users assign to the physical world (e.g., favorite cup, breakfast routine) by leveraging prior interaction history to interpret dynamic, real-world instructions. Yet, the effectiveness of embodied agents in utilizing memory for personalized assistance remains largely underexplored. To address this gap, we present MEMENTO, a personalized embodied agent evaluation framework designed to comprehensively assess memory utilization capabilities for personalized assistance. Our framework consists of a two-stage memory evaluation process that quantifies the impact of memory utilization on task performance. It evaluates agents' understanding of personalized knowledge in object rearrangement tasks by focusing on its role in goal interpretation: (1) the ability to identify target objects based on personal meaning (object semantics), and (2) the ability to infer object-location configurations from consistent user patterns, such as routines (user patterns). Our experiments across various LLMs reveal significant limitations in memory utilization, with even frontier models like GPT-4o experiencing a 30.5% performance drop when required to reference multiple memories, particularly in tasks involving user patterns. These findings, along with our detailed analyses and case studies, provide valuable insights for future research in developing more effective personalized embodied agents. Project website: https://connoriginal.github.io/MEMENTO
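The two goal-interpretation abilities described above can be illustrated with a minimal sketch. This is a hypothetical toy implementation, not the paper's actual framework: all class names, memory fields, and example data (e.g., `blue_mug_03`, `kitchen_table`) are invented for illustration.

```python
# Hypothetical sketch of memory-based goal interpretation in the style of
# MEMENTO's two evaluation axes: (1) object semantics, (2) user patterns.
# All names and data are illustrative assumptions, not the paper's API.
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Facts accumulated from prior user interactions."""
    # Personal phrase -> concrete object id (object semantics)
    object_semantics: dict = field(default_factory=dict)
    # Routine keyword -> (object id, location) (user patterns)
    user_patterns: dict = field(default_factory=dict)


def interpret_goal(instruction: str, memory: Memory):
    """Resolve a personalized instruction into (object, target_location)."""
    # (1) Object semantics: map personal references to concrete objects.
    for phrase, obj in memory.object_semantics.items():
        if phrase in instruction:
            return obj, None  # object identified; location comes from elsewhere
    # (2) User patterns: infer an object-location configuration from a routine.
    for routine, (obj, location) in memory.user_patterns.items():
        if routine in instruction:
            return obj, location
    return None, None  # no relevant memory found


memory = Memory(
    object_semantics={"my favorite cup": "blue_mug_03"},
    user_patterns={"breakfast": ("blue_mug_03", "kitchen_table")},
)

print(interpret_goal("bring my favorite cup", memory))
print(interpret_goal("set up breakfast for me", memory))
```

The paper's finding that performance drops when multiple memories must be referenced corresponds, in this sketch, to instructions that require combining both lookup paths at once.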
Related papers
- PRIME: Large Language Model Personalization with Cognitive Memory and Thought Processes [6.631626634132574]
Large language model (LLM) personalization aims to align model outputs with individuals' unique preferences and opinions. We introduce a unified framework, PRIME, using episodic and semantic memory mechanisms. Experiments validate PRIME's effectiveness across both long- and short-context scenarios.
arXiv Detail & Related papers (2025-07-07T01:54:34Z) - FindingDory: A Benchmark to Evaluate Memory in Embodied Agents [49.89792845476579]
We introduce a new benchmark for long-range embodied tasks in the Habitat simulator. This benchmark evaluates memory-based capabilities across 60 tasks requiring sustained engagement and contextual awareness.
arXiv Detail & Related papers (2025-06-18T17:06:28Z) - PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks. It integrates a personalized memory module and a personalized action module. A test-time user-preference alignment strategy keeps responses aligned with current user preferences.
arXiv Detail & Related papers (2025-06-06T17:29:49Z) - How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior [49.62361184944454]
Memory is a critical component in large language model (LLM)-based agents. We study how memory management choices impact LLM agents' behavior, especially their long-term performance.
arXiv Detail & Related papers (2025-05-21T22:35:01Z) - Are We Solving a Well-Defined Problem? A Task-Centric Perspective on Recommendation Tasks [46.705107776194616]
We analyze RecSys task formulations, emphasizing key components such as input-output structures, temporal dynamics, and candidate item selection. We explore the balance between task specificity and model generalizability, highlighting how well-defined task formulations serve as the foundation for robust evaluation and effective solution development.
arXiv Detail & Related papers (2025-03-27T06:10:22Z) - Identifying User Goals from UI Trajectories [19.492331502146886]
We propose a new task: goal identification from observed UI trajectories. We also introduce a novel evaluation methodology designed to assess whether two intent descriptions can be considered paraphrases. To benchmark this task, we compare the performance of humans and state-of-the-art models, specifically GPT-4 and Gemini-1.5 Pro.
arXiv Detail & Related papers (2024-06-20T13:46:10Z) - Persona-DB: Efficient Large Language Model Personalization for Response Prediction with Collaborative Data Refinement [79.2400720115588]
We introduce Persona-DB, a simple yet effective framework consisting of a hierarchical construction process to improve generalization across task contexts. In the evaluation of response prediction, Persona-DB demonstrates superior context efficiency, maintaining accuracy with a significantly reduced retrieval size. Our experiments also indicate a marked improvement of over 10% under cold-start scenarios, when users have extremely sparse data.
arXiv Detail & Related papers (2024-02-16T20:20:43Z) - Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.
We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries.
We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z) - Personalized Large Language Model Assistant with Evolving Conditional Memory [15.780762727225122]
We present a plug-and-play framework that could facilitate personalized large language model assistants with evolving conditional memory.
The personalized assistant focuses on intelligently preserving the knowledge and experience from the history dialogue with the user.
arXiv Detail & Related papers (2023-12-22T02:39:15Z) - Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z) - Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation [20.266695694005943]
Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments.
Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation.
We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts.
arXiv Detail & Related papers (2023-07-12T17:55:08Z) - Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information-theoretic framework for learning-based odometry estimation methods.
The proposed framework provides an elegant tool for evaluating and understanding performance in information-theoretic terms.
arXiv Detail & Related papers (2022-03-11T02:37:35Z) - PeTra: A Sparsely Supervised Memory Model for People Tracking [50.98911178059019]
We propose PeTra, a memory-augmented neural network designed to track entities in its memory slots.
We empirically compare key modeling choices, finding that we can simplify several aspects of the design of the memory module while retaining strong performance.
PeTra is highly effective in both evaluations, demonstrating its ability to track people in its memory despite being trained with limited annotation.
arXiv Detail & Related papers (2020-05-06T17:45:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.