FlashMem: Distilling Intrinsic Latent Memory via Computation Reuse
- URL: http://arxiv.org/abs/2601.05505v1
- Date: Fri, 09 Jan 2026 03:27:43 GMT
- Title: FlashMem: Distilling Intrinsic Latent Memory via Computation Reuse
- Authors: Yubo Hou, Zhisheng Chen, Tao Wan, Zengchang Qin,
- Abstract summary: FlashMem is a framework that distills intrinsic memory directly from transient reasoning states via computation reuse. Experiments demonstrate that FlashMem matches the performance of heavy baselines while reducing inference latency fivefold.
- Score: 4.210760734549566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The stateless architecture of Large Language Models inherently lacks a mechanism to preserve dynamic context, compelling agents to redundantly reprocess history to maintain long-horizon autonomy. While latent memory offers a solution, current approaches are hindered by architectural segregation, relying on auxiliary encoders that decouple memory from the reasoning backbone. We propose FlashMem, a framework that distills intrinsic memory directly from transient reasoning states via computation reuse. Leveraging the property that internal representations uniquely encode input trajectories, FlashMem identifies the last hidden state as a sufficient statistic for the interaction history. This enables a Shared-KV Consolidator to synthesize memory by attending directly to the backbone's frozen cache, eliminating redundant re-parameterization. Furthermore, a parameter-free Cognitive Monitor uses attention entropy to trigger consolidation adaptively, only when it detects high epistemic uncertainty. Experiments demonstrate that FlashMem matches the performance of heavy baselines while reducing inference latency fivefold, effectively bridging the gap between efficiency and persistent cognition.
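The abstract names two cooperating mechanisms: an entropy-based trigger (the Cognitive Monitor) and a consolidator that reads from the backbone's existing KV cache instead of re-encoding history. The following is a minimal, illustrative sketch of that control flow in plain Python, not the FlashMem implementation. It rests on several assumptions: entropy is measured on the last token's attention distribution and normalized by log(n); the 0.8 threshold is a hypothetical value (fixed, not learned, which is one reading of "parameter-free"); memory queries are supplied externally; and attention is single-head scaled dot-product. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one attention distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def should_consolidate(weights: Sequence[float], threshold: float = 0.8) -> bool:
    """Hypothetical monitor: no learned weights, only a fixed (assumed) threshold.

    Entropy is divided by log(n) so the score lies in [0, 1] regardless of
    how many cached tokens the query attends over.
    """
    n = len(weights)
    if n < 2:
        return False  # a single-token distribution carries no uncertainty
    return attention_entropy(weights) / math.log(n) > threshold


def attend(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention of one query over cached keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    out = [0.0] * len(values[0])
    for w, value in zip(weights, values):
        for i, x in enumerate(value):
            out[i] += w * x
    return out


def consolidate(memory_queries: Sequence[Vector],
                cache_keys: Sequence[Vector],
                cache_values: Sequence[Vector]) -> List[Vector]:
    """Read memory slots straight from the frozen K/V cache.

    The cached keys/values are reused as-is; no new K/V projections of the
    history are computed (the "computation reuse" idea, as assumed here).
    """
    return [attend(q, cache_keys, cache_values) for q in memory_queries]


def memory_step(last_token_attn: Sequence[float],
                memory_queries: Sequence[Vector],
                cache_keys: Sequence[Vector],
                cache_values: Sequence[Vector],
                threshold: float = 0.8) -> Optional[List[Vector]]:
    """Consolidate only when the monitor reports high attention entropy."""
    if should_consolidate(last_token_attn, threshold):
        return consolidate(memory_queries, cache_keys, cache_values)
    return None  # confident step: skip consolidation entirely


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    queries = [[0.0, 0.0]]  # a zero query attends uniformly to the cache
    print(memory_step([1 / 3] * 3, queries, keys, values))         # diffuse: consolidates
    print(memory_step([0.98, 0.01, 0.01], queries, keys, values))  # peaked: None
```

In this sketch the expensive step (attention over the whole cache) runs only on uncertain steps, which is where the latency savings described in the abstract would plausibly come from; the actual trigger statistic and consolidator architecture in the paper may differ.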
Related papers
- From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents [78.30630000529133]
We propose MM-Mem, a pyramidal multimodal memory architecture grounded in Fuzzy-Trace Theory. MM-Mem structures memory hierarchically into a Sensory Buffer, Episodic Stream, and Symbolic. Experiments confirm the effectiveness of MM-Mem on both offline and streaming tasks.
arXiv (2026-03-02T05:12:45Z)
- MemFly: On-the-Fly Memory Optimization via Information Bottleneck [35.420309099411874]
Long-term memory enables large language model agents to tackle complex tasks through historical interactions. Existing frameworks encounter a dilemma between compressing redundant information efficiently and maintaining precise retrieval for downstream tasks. MemFly is a framework grounded in information bottleneck principles that facilitates on-the-fly memory evolution for LLMs. MemFly substantially outperforms state-of-the-art baselines in memory coherence, response fidelity, and accuracy.
arXiv (2026-02-08T09:37:25Z)
- AMA: Adaptive Memory via Multi-Agent Collaboration [54.490349689939166]
We propose Adaptive Memory via Multi-Agent Collaboration (AMA), a novel framework that leverages coordinated agents to manage memory across multiple granularities. AMA significantly outperforms state-of-the-art baselines while reducing token consumption by approximately 80% compared to full-context methods.
arXiv (2026-01-28T08:09:49Z)
- FadeMem: Biologically-Inspired Forgetting for Efficient Agent Memory [4.608947574766633]
We propose FadeMem, a biologically-inspired agent memory architecture that incorporates active forgetting mechanisms mirroring human cognitive efficiency. Experiments on Multi-Session Chat, LoCoMo, and LTI-Bench demonstrate superior multi-hop reasoning and retrieval with 45% storage reduction.
arXiv (2026-01-26T16:12:54Z)
- MemRec: Collaborative Memory-Augmented Agentic Recommender System [57.548438733740504]
We propose MemRec, a framework that architecturally decouples reasoning from memory management. MemRec introduces a dedicated LM_Mem to manage a dynamic collaborative memory graph. It achieves state-of-the-art performance on four benchmarks.
arXiv (2026-01-13T18:51:16Z)
- Membox: Weaving Topic Continuity into Long-Range Memory for LLM Agents [14.666607208502185]
We introduce Membox, a hierarchical memory architecture centered on a Topic Loom. Membox monitors dialogue in a sliding-window fashion, grouping consecutive same-topic turns into coherent "memory boxes" at storage time. Experiments on LoCoMo demonstrate that Membox achieves up to 68% F1 improvement on temporal reasoning tasks.
arXiv (2026-01-07T10:36:29Z)
- SimpleMem: Efficient Lifelong Memory for LLM Agents [73.74399447715052]
We introduce SimpleMem, an efficient memory framework based on semantic lossless compression. We propose a three-stage pipeline designed to maximize information density and token utilization. Experiments on benchmark datasets show that our method consistently outperforms baseline approaches in accuracy, retrieval efficiency, and inference cost.
arXiv (2026-01-05T21:02:49Z)
- Agentic Learner with Grow-and-Refine Multimodal Semantic Memory [50.81667005063605]
ViLoMem is a dual-stream memory framework that constructs compact, schema-based memory. It encodes visual distraction patterns and logical reasoning errors, enabling MLLMs to learn from their successful and failed experiences.
arXiv (2025-11-26T18:55:08Z)
- SEDM: Scalable Self-Evolving Distributed Memory for Agents [23.182291416527764]
SEDM is a verifiable and adaptive framework that transforms memory from a passive repository into an active, self-optimizing component. We show that SEDM improves reasoning accuracy while reducing token overhead compared with strong memory baselines. Results highlight SEDM as a scalable and sustainable memory mechanism for open-ended multi-agent collaboration.
arXiv (2025-09-11T14:37:37Z)
- Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction [72.27673320976933]
Diffusion Large Language Models (dLLMs) enable breakthroughs in reasoning and parallel decoding. Current caching techniques accelerate decoding by storing full-layer states, yet impose substantial memory usage. We propose Sparse-dLLM, the first training-free framework integrating dynamic cache eviction with sparse attention.
arXiv (2025-08-04T16:14:03Z)
- MemOS: A Memory OS for AI System [116.87568350346537]
Large Language Models (LLMs) have become an essential infrastructure for Artificial General Intelligence (AGI). Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user preferences or update knowledge over extended periods. MemOS is a memory operating system that treats memory as a manageable system resource.
arXiv (2025-07-04T17:21:46Z)
- Exploring Synaptic Resonance in Large Language Models: A Novel Approach to Contextual Memory Integration [0.0]
A novel mechanism, Synaptic Resonance, is introduced to dynamically reinforce relevant memory pathways during training and inference. Evaluations conducted on an open-source language model demonstrate reductions in perplexity, enhancements in contextual coherence, and increased robustness against input noise.
arXiv (2025-02-15T07:06:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.