Rhea: Role-aware Heuristic Episodic Attention for Conversational LLMs
- URL: http://arxiv.org/abs/2512.06869v1
- Date: Sun, 07 Dec 2025 14:50:03 GMT
- Title: Rhea: Role-aware Heuristic Episodic Attention for Conversational LLMs
- Authors: Wanyang Hong, Zhaoning Zhang, Yi Chen, Libo Zhang, Baihui Liu, Linbo Qiao, Zhiliang Tian, Dongsheng Li
- Abstract summary: Large Language Models (LLMs) have achieved remarkable performance on single-turn tasks, yet their effectiveness deteriorates in multi-turn conversations. We propose Rhea, a novel framework that decouples conversation history into two functionally independent memory modules. Experiments show that Rhea mitigates performance decay and improves overall accuracy by 1.04 points on a 10-point scale.
- Score: 36.91809943381492
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have achieved remarkable performance on single-turn tasks, yet their effectiveness deteriorates in multi-turn conversations. We define this phenomenon as cumulative contextual decay - a progressive degradation of contextual integrity caused by attention pollution, dilution, and drift. To address this challenge, we propose Rhea (Role-aware Heuristic Episodic Attention), a novel framework that decouples conversation history into two functionally independent memory modules: (1) an Instructional Memory (IM) that persistently stores high-fidelity global constraints via a structural priority mechanism, and (2) an Episodic Memory (EM) that dynamically manages user-model interactions via asymmetric noise control and heuristic context retrieval. During inference, Rhea constructs a high signal-to-noise context by applying its priority attention: selectively integrating relevant episodic information while always prioritizing global instructions. To validate this approach, experiments on multiple multi-turn conversation benchmarks - including MT-Eval and Long-MT-Bench+ - show that Rhea mitigates performance decay and improves overall accuracy by 1.04 points on a 10-point scale (a 16% relative gain over strong baselines). Moreover, Rhea maintains near-perfect instruction fidelity (IAR > 8.1) across long-horizon interactions. These results demonstrate that Rhea provides a principled and effective framework for building more precise, instruction-consistent conversational LLMs.
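To make the mechanism in the abstract concrete, the sketch below shows one way the two memories and the priority-attention step could fit together. This is an illustrative reconstruction, not the authors' code: the class names (InstructionalMemory, EpisodicMemory), the build_context function, and the word-overlap relevance heuristic are assumptions standing in for Rhea's structural priority mechanism, asymmetric noise control, and heuristic retrieval.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Turn:
    user: str
    model: str

@dataclass
class InstructionalMemory:
    """Persistently stores global constraints (e.g., system-level instructions)."""
    instructions: List[str] = field(default_factory=list)

    def add(self, instruction: str) -> None:
        self.instructions.append(instruction)

@dataclass
class EpisodicMemory:
    """Stores user-model turns; filtering of noisy model-side text would go here."""
    turns: List[Turn] = field(default_factory=list)

    def add(self, user: str, model: str) -> None:
        self.turns.append(Turn(user, model))

def relevance(query: str, turn: Turn) -> float:
    # Placeholder heuristic: word overlap between the query and a past turn.
    # A real system would use embeddings or a learned scorer instead.
    q = set(query.lower().split())
    t = set(f"{turn.user} {turn.model}".lower().split())
    return len(q & t) / (len(q) or 1)

def build_context(im: InstructionalMemory, em: EpisodicMemory,
                  query: str, k: int = 3) -> str:
    """Priority attention, sketched: instructions are always included,
    then only the top-k most relevant episodic turns."""
    top_turns = sorted(em.turns, key=lambda t: relevance(query, t), reverse=True)[:k]
    parts = ["[Global instructions]"] + im.instructions
    parts.append("[Relevant history]")
    parts += [f"User: {t.user}\nModel: {t.model}" for t in top_turns]
    parts += ["[Current query]", query]
    return "\n".join(parts)
```

The point of the sketch is the asymmetry the abstract describes: instructions are injected unconditionally, while episodic turns must compete for the remaining context budget, which is how the constructed prompt keeps a high signal-to-noise ratio.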
Related papers
- Amory: Building Coherent Narrative-Driven Agent Memory through Agentic Reasoning [14.368376032599437]
Amory is a working memory framework that actively constructs structured memory representations during offline time. Amory organizes conversational fragments into episodic narratives, consolidates memories with momentum, and semanticizes peripheral facts into semantic memory. Amory achieves considerable improvements over previous state-of-the-art, with performance comparable to full context reasoning while reducing response time by 50%.
arXiv Detail & Related papers (2026-01-09T19:51:11Z) - Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents [80.33280979339123]
We introduce Memory-T1, a framework that learns a time-aware memory selection policy using reinforcement learning (RL). On the Time-Dialog benchmark, Memory-T1 boosts a 7B model to an overall score of 67.0%, establishing a new state-of-the-art performance for open-source models.
arXiv Detail & Related papers (2025-12-23T06:37:29Z) - CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models [21.427373172124167]
Large language models (LLMs) excel at single-turn reasoning but often lose accuracy and coherence over extended, multi-turn interactions. We introduce CogMem, a memory-augmented LLM architecture that supports sustained iterative reasoning through structured, persistent memory. Experiments on TurnBench show that this layered design mitigates reasoning failures, controls context growth, and improves consistency across extended reasoning chains.
arXiv Detail & Related papers (2025-12-16T06:01:08Z) - Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning [137.33138614095435]
Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models. Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions.
arXiv Detail & Related papers (2025-11-12T08:29:39Z) - KnowMT-Bench: Benchmarking Knowledge-Intensive Long-Form Question Answering in Multi-Turn Dialogues [58.305425399644086]
Multi-Turn Long-Form Question Answering (MT-LFQA) is a key application paradigm of Large Language Models (LLMs) in knowledge-intensive domains. We introduce KnowMT-Bench, the first-ever benchmark designed to systematically evaluate MT-LFQA for LLMs across knowledge-intensive fields.
arXiv Detail & Related papers (2025-09-26T04:32:29Z) - Recent Trends in Distant Conversational Speech Recognition: A Review of CHiME-7 and 8 DASR Challenges [63.741916531380696]
The CHiME-7 and 8 distant speech recognition (DASR) challenges focus on multi-channel, generalizable, joint automatic speech recognition (ASR) and diarization of conversational speech. This paper outlines the challenges' design, evaluation metrics, datasets, and baseline systems while analyzing key trends from participant submissions.
arXiv Detail & Related papers (2025-07-24T07:56:24Z) - PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding [20.849307413516183]
We propose PaceLLM, featuring two innovations: (1) a Persistent Activity (PA) Mechanism that mimics prefrontal cortex (PFC) neurons' persistent firing by introducing an activation-level memory bank to dynamically retrieve, reuse, and update critical FFN states, addressing contextual decay; and (2) Cortical Expert (CE) Clustering that emulates task-adaptive neural specialization to reorganize FFN weights into semantic modules, establishing cross-token dependencies and mitigating fragmentation.
arXiv Detail & Related papers (2025-06-18T09:17:06Z) - Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory [0.5584627289325719]
Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, but their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues. We introduce Mem0, a scalable memory-centric architecture that addresses this issue by dynamically extracting, consolidating, and retrieving salient information from ongoing conversations.
arXiv Detail & Related papers (2025-04-28T01:46:35Z) - In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents [70.12342024019044]
Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information limits their effectiveness. We propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections. RMM shows more than 10% accuracy improvement over the baseline without memory management on the LongMemEval dataset.
arXiv Detail & Related papers (2025-03-11T04:15:52Z) - Decoding the Flow: CauseMotion for Emotional Causality Analysis in Long-form Conversations [22.000288488609733]
CauseMotion is a long-sequence emotional causal reasoning framework grounded in Retrieval-Augmented Generation (RAG) and multimodal fusion. By integrating RAG with a sliding window mechanism, it effectively retrieves and leverages contextually relevant dialogue segments (a minimal sliding-window retrieval sketch appears after this list). A GLM-4 model integrated with CauseMotion achieves an 8.7% improvement in causal accuracy over the original model and surpasses GPT-4o by 1.2%. On the publicly available DiaASQ dataset, CauseMotion-GLM-4 achieves state-of-the-art results in accuracy, F1 score, and causal reasoning accuracy.
arXiv Detail & Related papers (2025-01-01T09:10:32Z) - Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks [4.132793413136553]
We introduce Echo-MSA, a nimble module equipped with a variable-length attention mechanism. The proposed design captures the variable-length nature of speech and addresses the limitations of fixed-length attention.
arXiv Detail & Related papers (2023-09-14T14:51:51Z)
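As referenced in the CauseMotion entry above, a sliding-window retrieval step over a long dialogue can be sketched as follows. This is a generic illustration under stated assumptions, not that paper's implementation: the bag-of-words embed() stand-in, the cosine scorer, and the window size and stride are placeholders for whatever encoder and segmentation the authors actually use.

```python
from collections import Counter
from typing import List
import math

def embed(text: str) -> Counter:
    # Stand-in "embedding": a bag-of-words count vector.
    # A real RAG pipeline would use a learned sentence encoder here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def sliding_windows(turns: List[str], size: int = 4, stride: int = 2) -> List[str]:
    """Group consecutive dialogue turns into overlapping segments."""
    return ["\n".join(turns[i:i + size])
            for i in range(0, max(len(turns) - size + 1, 1), stride)]

def retrieve(turns: List[str], query: str, top_k: int = 2) -> List[str]:
    """Score every window against the query and keep the most relevant segments,
    which would then be passed to the generator alongside the query."""
    q = embed(query)
    return sorted(sliding_windows(turns),
                  key=lambda w: cosine(q, embed(w)), reverse=True)[:top_k]
```

Overlapping windows keep turns that straddle a segment boundary retrievable from at least one segment, which is the practical motivation for a sliding window over disjoint chunks when dialogue context matters.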