MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation
- URL: http://arxiv.org/abs/2409.15240v2
- Date: Wed, 23 Oct 2024 17:47:58 GMT
- Title: MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation
- Authors: Junqing He, Liang Zhu, Rui Wang, Xi Wang, Reza Haffari, Jiaxing Zhang
- Abstract summary: We present a novel Memory-Augmented Dialogue Benchmark (MADial-Bench) to evaluate the effectiveness of memory-augmented dialogue systems (MADS).
The benchmark assesses two tasks separately: memory retrieval and memory recognition, incorporating both passive and proactive memory-recall data.
Results from cutting-edge embedding models and large language models on this benchmark indicate the potential for further advancement.
- Score: 15.64077949677469
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-term memory is important for chatbots and dialogue systems (DS) to create consistent and human-like conversations, as evidenced by the many memory-augmented DS (MADS) that have been developed. To evaluate the effectiveness of such MADS, commonly used metrics such as retrieval accuracy and perplexity (PPL) focus mainly on query-oriented factualness and language quality, and often lack practical value. Moreover, these evaluation dimensions are insufficient for human-like assessment in DS. Regarding memory-recalling paradigms, current evaluation schemes consider only passive memory retrieval, ignoring the diverse forms of memory recall with rich triggering factors, e.g., emotions and surroundings, that can be essential in emotional support scenarios. To bridge this gap, we construct a novel Memory-Augmented Dialogue Benchmark (MADial-Bench) covering various memory-recalling paradigms grounded in theories from cognitive science and psychology. The benchmark assesses two tasks separately: memory retrieval and memory recognition, incorporating both passive and proactive memory-recall data. We introduce new scoring criteria, including memory injection, emotion support (ES) proficiency, and intimacy, to comprehensively assess generated responses. Results from cutting-edge embedding models and large language models on this benchmark indicate the potential for further advancement. Extensive testing further reveals correlations between memory injection, ES proficiency, and intimacy.
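To make the retrieval task concrete, the following is a minimal sketch of how embedding-based memory retrieval can be scored with recall@k; the vector dimensions, random embeddings, and gold indices are illustrative placeholders, not MADial-Bench data or its official protocol, and in practice a sentence-embedding model would encode the dialogue contexts and stored memories.

```python
# Hypothetical recall@k scoring for embedding-based memory retrieval.
# Random vectors stand in for real sentence embeddings.
import numpy as np

def recall_at_k(query_embs, memory_embs, gold_indices, k=3):
    """Fraction of queries whose gold memory appears among the top-k retrieved."""
    # Cosine similarity: L2-normalize both sides, then take dot products.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    m = memory_embs / np.linalg.norm(memory_embs, axis=1, keepdims=True)
    sims = q @ m.T                            # (num_queries, num_memories)
    topk = np.argsort(-sims, axis=1)[:, :k]   # indices of the k most similar memories
    hits = [gold in row for gold, row in zip(gold_indices, topk)]
    return float(np.mean(hits))

rng = np.random.default_rng(0)
queries = rng.normal(size=(5, 384))    # 5 dialogue contexts (placeholder embeddings)
memories = rng.normal(size=(20, 384))  # 20 stored memories (placeholder embeddings)
gold = [3, 7, 0, 12, 19]               # index of the correct memory for each query
print(f"recall@3 = {recall_at_k(queries, memories, gold):.2f}")
```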
Related papers
- Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning [64.93848182403116]
Current deep-learning memory models struggle in reinforcement learning environments that are partially observable and long-horizon.
We introduce the Stable Hadamard Memory, a novel memory model for reinforcement learning agents.
Our approach significantly outperforms state-of-the-art memory-based methods on challenging partially observable benchmarks.
arXiv Detail & Related papers (2024-10-14T03:50:17Z)
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent).
It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation.
The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
arXiv Detail & Related papers (2024-06-09T21:58:32Z)
- Ever-Evolving Memory by Blending and Refining the Past [30.63352929849842]
CREEM is a novel memory system for long-term conversation.
It seamlessly connects past and present information, while also possessing the ability to forget obstructive information.
arXiv Detail & Related papers (2024-03-03T08:12:59Z)
- Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory [24.464945401037056]
We propose TiM (Think-in-Memory) that enables Large Language Models to maintain an evolved memory for storing historical thoughts.
We conduct qualitative and quantitative experiments on real-world and simulated dialogues covering a wide range of topics.
arXiv Detail & Related papers (2023-11-15T06:08:35Z)
- A Framework for Inference Inspired by Human Memory Mechanisms [9.408704431898279]
We propose a PMI framework that consists of perception, memory and inference components.
The memory module comprises working and long-term memory, with the latter endowed with a higher-order structure to retain extensive and complex relational knowledge and experience.
We apply our PMI framework to improve prevailing Transformer and CNN models on question-answering tasks such as the bAbI-20k and Sort-of-CLEVR datasets.
arXiv Detail & Related papers (2023-10-01T08:12:55Z)
- Memory-and-Anticipation Transformer for Online Action Understanding [52.24561192781971]
We propose a novel memory-anticipation-based paradigm to model an entire temporal structure, including the past, present, and future.
We present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks.
arXiv Detail & Related papers (2023-08-15T17:34:54Z)
- MemoryBank: Enhancing Large Language Models with Long-Term Memory [7.654404043517219]
We propose MemoryBank, a novel memory mechanism tailored for Large Language Models.
MemoryBank enables the models to summon relevant memories, continually evolve through continuous memory updates, and comprehend and adapt to a user's personality by synthesizing information from past interactions.
arXiv Detail & Related papers (2023-05-17T14:40:29Z)
- Enhancing Large Language Model with Self-Controlled Memory Framework [56.38025154501917]
Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information.
We propose the Self-Controlled Memory (SCM) framework to enhance the ability of LLMs to maintain long-term memory and recall relevant information.
arXiv Detail & Related papers (2023-04-26T07:25:31Z)
- Evaluating Long-Term Memory in 3D Mazes [10.224858246626171]
Memory Maze is a 3D domain of randomized mazes designed for evaluating long-term memory in agents.
Unlike existing benchmarks, Memory Maze measures long-term memory in isolation from confounding agent abilities.
We find that current algorithms benefit from training with truncated backpropagation through time (a generic sketch appears after this list) and succeed on small mazes, but fall short of human performance on large mazes.
arXiv Detail & Related papers (2022-10-24T16:32:28Z)
- Learning Human Cognitive Appraisal Through Reinforcement Memory Unit [63.83306892013521]
We propose a memory-enhancing mechanism for recurrent neural networks that exploits the effect of human cognitive appraisal in sequential assessment tasks.
We conceptualize this memory-enhancing mechanism as a Reinforcement Memory Unit (RMU) that contains an appraisal state together with two reinforcement memories, one positive and one negative.
arXiv Detail & Related papers (2022-08-06T08:56:55Z)
- Self-Attentive Associative Memory [69.40038844695917]
We propose to separate the storage of individual experiences (item memory) from the storage of their occurring relationships (relational memory).
Our proposed two-memory model achieves competitive results across a diverse range of machine learning tasks.
arXiv Detail & Related papers (2020-02-10T03:27:48Z)
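For reference, here is a generic sketch of truncated backpropagation through time, the training trick credited in the Memory Maze entry above; the GRU model, dummy data, and hyperparameters are illustrative assumptions, not code from that benchmark.

```python
# Truncated BPTT: process the sequence in fixed-size chunks, backpropagate
# within each chunk, and detach the hidden state so gradients do not flow
# across chunk boundaries.
import torch
import torch.nn as nn

seq_len, trunc_len, batch, dim = 64, 16, 8, 32
model = nn.GRU(input_size=dim, hidden_size=dim, batch_first=True)
head = nn.Linear(dim, dim)
opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()))

x = torch.randn(batch, seq_len, dim)  # dummy input sequence
y = torch.randn(batch, seq_len, dim)  # dummy per-step targets
h = torch.zeros(1, batch, dim)        # initial hidden state

for t in range(0, seq_len, trunc_len):
    chunk_x, chunk_y = x[:, t:t + trunc_len], y[:, t:t + trunc_len]
    out, h = model(chunk_x, h)
    loss = nn.functional.mse_loss(head(out), chunk_y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    h = h.detach()  # truncate the graph: no gradient beyond this chunk
```

Detaching the hidden state keeps memory and compute bounded while still letting state carry information forward, which is why long-memory agents are often trained this way.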