Episodic Memories Generation and Evaluation Benchmark for Large Language Models
- URL: http://arxiv.org/abs/2501.13121v1
- Date: Tue, 21 Jan 2025 02:16:13 GMT
- Title: Episodic Memories Generation and Evaluation Benchmark for Large Language Models
- Authors: Alexis Huet, Zied Ben Houidi, Dario Rossi
- Abstract summary: We argue that integrating episodic memory capabilities into Large Language Models is essential for advancing AI towards human-like cognition.
We develop a structured approach to represent episodic events, encapsulating temporal and spatial contexts, involved entities, and detailed descriptions.
We synthesize a unique episodic memory benchmark, free from contamination, and release open source code and datasets to assess LLM performance.
- Score: 7.660368798066376
- Abstract: Episodic memory -- the ability to recall specific events grounded in time and space -- is a cornerstone of human cognition, enabling not only coherent storytelling, but also planning and decision-making. Despite their remarkable capabilities, Large Language Models (LLMs) lack a robust mechanism for episodic memory: we argue that integrating episodic memory capabilities into LLMs is essential for advancing AI towards human-like cognition, increasing their potential to reason consistently and ground their output in real-world episodic events, hence avoiding confabulations. To address this challenge, we introduce a comprehensive framework to model and evaluate LLM episodic memory capabilities. Drawing inspiration from cognitive science, we develop a structured approach to represent episodic events, encapsulating temporal and spatial contexts, involved entities, and detailed descriptions. We synthesize a unique episodic memory benchmark, free from contamination, and release open source code and datasets to assess LLM performance across various recall and episodic reasoning tasks. Our evaluation of state-of-the-art models, including GPT-4 and Claude variants, Llama 3.1, and o1-mini, reveals that even the most advanced LLMs struggle with episodic memory tasks, particularly when dealing with multiple related events or complex spatio-temporal relationships -- even in contexts as short as 10k-100k tokens.
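To make the event representation concrete, here is a minimal sketch of one plausible encoding of an episodic event, plus a cued-recall probe built from it. The field names and cue format are illustrative assumptions, not the schema of the paper's released code.

```python
from dataclasses import dataclass

@dataclass
class EpisodicEvent:
    """One episodic event: what happened, when, where, and who was involved.
    Field names are illustrative; the released benchmark's schema may differ."""
    when: str            # temporal context, e.g. "day 3, morning"
    where: str           # spatial context, e.g. "the old library"
    entities: list[str]  # people or objects involved in the event
    description: str     # free-text account of what happened

def recall_cue(event: EpisodicEvent, withhold: str) -> str:
    """Build a cued-recall probe that hides one field for the model to retrieve."""
    fields = {"when": event.when, "where": event.where,
              "entities": ", ".join(event.entities)}
    shown = "; ".join(f"{k}: {v}" for k, v in fields.items() if k != withhold)
    return f"Event: {event.description} ({shown}). Recall the missing {withhold}."

event = EpisodicEvent(when="day 3, morning", where="the old library",
                      entities=["Mara", "the archivist"],
                      description="Mara returned the annotated ledger")
print(recall_cue(event, withhold="where"))
```

Withholding one field at a time yields recall probes along the temporal, spatial, and entity dimensions the benchmark targets.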
Related papers
- Event Segmentation Applications in Large Language Model Enabled Automated Recall Assessments [0.0]
Event segmentation is central to how we perceive, encode, and recall experiences.
Current research methodologies rely heavily on human annotators to assess segmentation patterns and recall ability.
We leverage Large Language Models (LLMs) to automate event segmentation and assess recall.
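A hedged sketch of the core automation step: prompting a model to mark event boundaries in a narrative. The `call_llm` stub and the prompt wording are placeholders, not the paper's pipeline.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call; swap in a real client."""
    raise NotImplementedError

def segment_events(narrative: str) -> list[str]:
    """Ask a model to insert boundary markers, then split on them.
    The prompt wording is an assumption, not the paper's instructions."""
    prompt = ("Split the following narrative into discrete events. "
              "Insert the marker <EVENT> at each boundary and change "
              "nothing else.\n\n" + narrative)
    marked = call_llm(prompt)
    return [seg.strip() for seg in marked.split("<EVENT>") if seg.strip()]
```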
arXiv Detail & Related papers (2025-02-19T00:48:51Z)
- Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents [43.94686139164999]
We present an episodic memory framework for Large Language Model (LLM) agents, centered around five key properties of episodic memory.
This position paper argues that now is the right time for an explicit, integrated focus on episodic memory to catalyze the development of long-term agents.
arXiv Detail & Related papers (2025-02-10T19:14:51Z)
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions [104.90258030688256]
This project introduces disentangled streaming perception, reasoning, and memory mechanisms, enabling real-time interaction with streaming video and audio input.
It simulates human-like cognition, enabling multimodal large language models to provide continuous and adaptive service over time.
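A rough sketch of the disentangled design: perception enqueues observations, a memory step consolidates them, and reasoning consults memory on demand. The module bodies are stand-ins; the actual system runs these concurrently over streaming video and audio.

```python
from queue import Queue

observations: Queue = Queue()   # filled by the perception module
memory_store: list[str] = []    # long-term store kept by the memory module

def perceive(frame: str) -> None:
    """Perception: push a lightweight observation; a real encoder goes here."""
    observations.put(f"saw: {frame}")

def consolidate() -> None:
    """Memory: drain pending observations into the long-term store."""
    while not observations.empty():
        memory_store.append(observations.get())

def reason(question: str) -> str:
    """Reasoning: answer against consolidated memory, not the raw stream."""
    consolidate()
    return f"answering {question!r} over {len(memory_store)} stored observations"

for frame in ["door opens", "person waves", "lights dim"]:
    perceive(frame)
print(reason("what happened?"))
```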
arXiv Detail & Related papers (2024-12-12T18:58:30Z)
- Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning [64.93848182403116]
Current deep-learning memory models struggle in reinforcement learning environments that are partially observable and require long-term memory.
We introduce the Stable Hadamard Memory, a novel memory model for reinforcement learning agents.
Our approach significantly outperforms state-of-the-art memory-based methods on challenging partially observable benchmarks.
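The defining operation is an elementwise (Hadamard) memory update. Below is a generic gated version in NumPy that conveys the shape of the idea; the paper's actual calibration of the gate and update terms is its own design and differs from this illustrative sigmoid/tanh choice.

```python
import numpy as np

rng = np.random.default_rng(0)
d_mem, d_in = 8, 4

# Illustrative parameters; the paper's calibration of these terms is its own.
W_c = rng.normal(scale=0.1, size=(d_mem, d_in))  # produces the elementwise gate
W_u = rng.normal(scale=0.1, size=(d_mem, d_in))  # produces the additive update

def step(memory: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One recurrence: rescale memory elementwise (a Hadamard product) by an
    input-dependent gate, then write in an additive update."""
    gate = 1.0 / (1.0 + np.exp(-(W_c @ x)))  # sigmoid keeps the gate in (0, 1)
    update = np.tanh(W_u @ x)
    return memory * gate + update            # '*' is the Hadamard product

memory = np.zeros(d_mem)
for _ in range(5):
    memory = step(memory, rng.normal(size=d_in))
print(memory.round(3))
```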
arXiv Detail & Related papers (2024-10-14T03:50:17Z)
- Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks [42.22616978679253]
We introduce Sequence Order Recall Tasks (SORT), which we adapt from tasks used to study episodic memory in cognitive psychology.
SORT requires LLMs to recall the correct order of text segments, and provides a general framework that is both easily extendable and does not require any additional annotations.
Based on a human experiment with 155 participants, we show that humans can recall sequence order based on long-term memory of a book.
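The task is easy to reproduce in outline: cut consecutive segments from a text, shuffle them, and score how well a model restores the original order. This sketch assumes character-based segments and a simple positional score; the released benchmark samples and scores more carefully.

```python
import random

def make_sort_item(text: str, k: int = 4, seg_len: int = 200, seed: int = 0):
    """Cut k consecutive segments from a text, shuffle them, and keep the key.
    answer[i] gives the shuffled position of original segment i."""
    rng = random.Random(seed)
    start = rng.randrange(max(1, len(text) - k * seg_len))
    segments = [text[start + i * seg_len: start + (i + 1) * seg_len]
                for i in range(k)]
    order = list(range(k))
    rng.shuffle(order)                        # shuffled[j] = segments[order[j]]
    shuffled = [segments[j] for j in order]
    answer = sorted(range(k), key=order.__getitem__)  # inverse permutation
    return shuffled, answer

def positional_score(predicted: list[int], answer: list[int]) -> float:
    """Fraction of positions where the model's ordering matches the key."""
    return sum(p == a for p, a in zip(predicted, answer)) / len(answer)
```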
arXiv Detail & Related papers (2024-10-10T17:17:38Z)
- Human-like Episodic Memory for Infinite Context LLMs [13.211261438927798]
Large language models (LLMs) have shown remarkable capabilities, but still struggle with processing extensive contexts.
In this work, we introduce EM-LLM, a novel approach that integrates key aspects of human episodic memory and event cognition into LLMs.
EM-LLM organises sequences of tokens into coherent episodic events using a combination of Bayesian surprise and graph-theoretic boundary refinement.
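The surprise-based segmentation is straightforward to sketch: treat a token as an event boundary when its surprisal spikes relative to recent context. The thresholding rule below is a simplification; EM-LLM pairs it with graph-theoretic refinement of the boundaries, which is omitted here.

```python
import math

def surprise_boundaries(token_logprobs: list[float], window: int = 32,
                        gamma: float = 1.0) -> list[int]:
    """Mark a boundary where a token's surprisal (-log p) exceeds the recent
    mean by gamma standard deviations. A simplified reading of EM-LLM's
    Bayesian-surprise step; boundary refinement is omitted."""
    boundaries = []
    for t in range(window, len(token_logprobs)):
        recent = [-lp for lp in token_logprobs[t - window:t]]
        mean = sum(recent) / window
        std = math.sqrt(sum((s - mean) ** 2 for s in recent) / window)
        if -token_logprobs[t] > mean + gamma * std:
            boundaries.append(t)
    return boundaries
```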
arXiv Detail & Related papers (2024-07-12T17:34:03Z)
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent).
It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation.
The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
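A minimal sketch of how three independently tunable modules could be wired together; the callables stand in for the paper's trained components, and the state kept here is an assumption.

```python
class LDAgentSketch:
    """Illustrative wiring of three independently tunable modules; the
    callables stand in for the paper's trained components."""

    def __init__(self, perceive, extract_persona, respond):
        self.perceive = perceive                # dialogue turns -> list of events
        self.extract_persona = extract_persona  # dialogue turns -> persona dict
        self.respond = respond                  # (query, events, persona) -> reply
        self.events: list[str] = []
        self.persona: dict[str, str] = {}

    def step(self, turns: list[str], query: str) -> str:
        self.events.extend(self.perceive(turns))
        self.persona.update(self.extract_persona(turns))
        return self.respond(query, self.events, self.persona)
```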
arXiv Detail & Related papers (2024-06-09T21:58:32Z)
- Empowering Working Memory for Large Language Model Agents [9.83467478231344]
This paper explores the potential of applying cognitive psychology's working memory frameworks to large language models (LLMs).
An innovative model is proposed incorporating a centralized Working Memory Hub and Episodic Buffer access to retain memories across episodes.
This architecture aims to provide greater continuity for nuanced contextual reasoning during intricate tasks and collaborative scenarios.
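One way such a hub could look in miniature: a bounded episodic buffer for recent episodes that spills into an archive, with naive keyword recall. Capacity, eviction, and retrieval here are illustrative assumptions, not the paper's design.

```python
from collections import deque

class WorkingMemoryHub:
    """A centralized store with an episodic buffer, loosely after the paper's
    architecture; sizes and policies are illustrative."""

    def __init__(self, buffer_size: int = 8):
        self.episodic_buffer = deque(maxlen=buffer_size)  # recent episodes
        self.archive: list[str] = []                      # older episodes

    def write(self, episode: str) -> None:
        if len(self.episodic_buffer) == self.episodic_buffer.maxlen:
            self.archive.append(self.episodic_buffer[0])  # spill the oldest
        self.episodic_buffer.append(episode)

    def context(self, query: str) -> list[str]:
        """Naive recall: keyword-matched archive entries plus the recent buffer."""
        hits = [e for e in self.archive if any(w in e for w in query.split())]
        return hits + list(self.episodic_buffer)
```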
arXiv Detail & Related papers (2023-12-22T05:59:00Z)
- Exploring Memorization in Fine-tuned Language Models [53.52403444655213]
We conduct the first comprehensive analysis to explore language models' memorization during fine-tuning across tasks.
Our studies with open-sourced and our own fine-tuned LMs across various tasks indicate that memorization presents a strong disparity among different fine-tuning tasks.
We provide an intuitive explanation of this task disparity via sparse coding theory and unveil a strong correlation between memorization and attention score distribution.
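A crude way to operationalize the memorization being measured: feed a fine-tuning example's prefix and test for verbatim continuation. This is an illustrative probe, not the paper's methodology; `model_generate` stands in for any generation API.

```python
def memorized_fraction(model_generate, examples: list[str],
                       prefix_frac: float = 0.5) -> float:
    """Feed each example's prefix and test for verbatim continuation.
    `model_generate(prompt, max_chars)` stands in for any generation API;
    a real probe would work in tokens, not characters."""
    hits = 0
    for text in examples:
        cut = int(len(text) * prefix_frac)
        prefix, target = text[:cut], text[cut:]
        completion = model_generate(prefix, max_chars=len(target))
        hits += completion.startswith(target)
    return hits / len(examples)
```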
arXiv Detail & Related papers (2023-10-10T15:41:26Z)
- Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models [75.98775135321355]
Given a long conversation, large language models (LLMs) fail to recall past information and tend to generate inconsistent responses.
We propose using large language models (LLMs) to recursively generate summaries that serve as memory, enhancing long-term memory ability.
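The mechanism reduces to a loop that folds each batch of new turns into a running summary and answers from that summary rather than the full history. The prompts and the `call_llm` stub below are illustrative.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call; swap in a real client."""
    raise NotImplementedError

def update_memory(summary: str, new_turns: list[str]) -> str:
    """Fold the latest turns into the running summary; prompt wording is
    illustrative, not the paper's."""
    prompt = ("Previous summary of the conversation:\n" + (summary or "(none)") +
              "\n\nNew turns:\n" + "\n".join(new_turns) +
              "\n\nRewrite the summary so it covers everything important so far.")
    return call_llm(prompt)

def respond(summary: str, user_msg: str) -> str:
    """Answer from the running summary instead of the full history."""
    return call_llm(f"Conversation summary:\n{summary}\n\n"
                    f"User: {user_msg}\nReply:")
```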
arXiv Detail & Related papers (2023-08-29T04:59:53Z)
- Enhancing Large Language Model with Self-Controlled Memory Framework [56.38025154501917]
Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information.
We propose the Self-Controlled Memory (SCM) framework to enhance the ability of LLMs to maintain long-term memory and recall relevant information.
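In outline, a self-controlled turn first asks a controller whether past memory is needed, retrieves only if so, and then generates the reply. The controller prompt and keyword retrieval below are simplifications of the SCM framework, not its specified components.

```python
def self_controlled_turn(history: list[str], user_msg: str, call_llm) -> str:
    """One SCM-style turn: a controller decides whether memory is needed,
    retrieval happens only if so, then the reply is generated. The controller
    prompt and keyword retrieval are simplifications, not the paper's design."""
    verdict = call_llm(f"User says: {user_msg}\nDoes answering require earlier "
                       "conversation history? Reply yes or no.")
    memory = ""
    if verdict.strip().lower().startswith("yes"):
        hits = [t for t in history if any(w in t for w in user_msg.split())]
        memory = "Relevant history:\n" + "\n".join(hits[-5:]) + "\n\n"
    return call_llm(memory + f"User: {user_msg}\nAssistant:")
```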
arXiv Detail & Related papers (2023-04-26T07:25:31Z)