Aeon: High-Performance Neuro-Symbolic Memory Management for Long-Horizon LLM Agents
- URL: http://arxiv.org/abs/2601.15311v1
- Date: Wed, 14 Jan 2026 15:23:22 GMT
- Title: Aeon: High-Performance Neuro-Symbolic Memory Management for Long-Horizon LLM Agents
- Authors: Mustafa Arslan
- Abstract summary: Large Language Models (LLMs) are constrained by the quadratic computational cost of self-attention and the "Lost in the Middle" phenomenon. We propose Aeon, a Neuro-Symbolic Cognitive Operating System that redefines memory not as a static store, but as a managed OS resource.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are fundamentally constrained by the quadratic computational cost of self-attention and the "Lost in the Middle" phenomenon, where reasoning capabilities degrade as context windows expand. Existing solutions, primarily "Flat RAG" architectures relying on vector databases, treat memory as an unstructured bag of embeddings. This approach fails to capture the hierarchical and temporal structure of long-horizon interactions, leading to "Vector Haze", the retrieval of disjointed facts lacking episodic continuity. We propose Aeon, a Neuro-Symbolic Cognitive Operating System that redefines memory not as a static store, but as a managed OS resource. Aeon structures memory into a Memory Palace (a spatial index implemented via Atlas, a SIMD-accelerated Page-Clustered Vector Index that combines small-world graph navigation with B+ Tree-style disk locality to minimize read amplification) and a Trace (a neuro-symbolic episodic graph). We introduce the Semantic Lookaside Buffer (SLB), a predictive caching mechanism that exploits conversational locality to achieve sub-millisecond retrieval latencies. Benchmarks demonstrate that Aeon achieves < 1ms retrieval latency on conversational workloads while ensuring state consistency via a zero-copy C++/Python bridge, effectively enabling persistent, structured memory for autonomous agents.
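The abstract describes the Semantic Lookaside Buffer (SLB) only at a high level, so the following is a minimal sketch of the general idea, assuming a cosine-similarity cache with LRU eviction; the class name, capacity, and similarity threshold are illustrative assumptions, not Aeon's actual API. The point is that consecutive queries in a conversation tend to be semantically close, so a small embedding-keyed cache can return a recent retrieval result and skip the main vector index entirely.

```python
# Hypothetical sketch of an SLB-style predictive cache (not Aeon's implementation).
# A hit means a semantically similar query was answered recently, so the cached
# retrieval result is reused instead of querying the main vector index.
import numpy as np
from collections import OrderedDict

class SemanticLookasideBuffer:
    def __init__(self, capacity=64, sim_threshold=0.92):
        self.capacity = capacity              # max cached (embedding, result) pairs
        self.sim_threshold = sim_threshold    # cosine similarity required for a hit
        self.entries = OrderedDict()          # insertion order doubles as LRU order
        self._next_id = 0

    @staticmethod
    def _cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def lookup(self, query_emb):
        """Return a cached result if a sufficiently similar query was seen recently."""
        best_key, best_sim = None, -1.0
        for key, (emb, _) in self.entries.items():
            sim = self._cosine(query_emb, emb)
            if sim > best_sim:
                best_key, best_sim = key, sim
        if best_key is not None and best_sim >= self.sim_threshold:
            self.entries.move_to_end(best_key)    # refresh LRU position on a hit
            return self.entries[best_key][1]      # cache hit: skip the main index
        return None                               # cache miss: caller falls back to the index

    def insert(self, query_emb, result):
        """Cache the result of a full index lookup for future nearby queries."""
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)      # evict the least recently used entry
        self.entries[self._next_id] = (np.asarray(query_emb, dtype=np.float32), result)
        self._next_id += 1

# Usage: consult the buffer first; fall back to the (here stubbed) main index on a miss.
slb = SemanticLookasideBuffer()
query = np.random.rand(384).astype(np.float32)
result = slb.lookup(query)
if result is None:
    result = ["retrieved memory passage A", "retrieved memory passage B"]  # stand-in for a main-index query
    slb.insert(query, result)
```

A real implementation would presumably key the cache more cleverly than a linear scan and prefetch pages predicted from the conversation's trajectory, but the hit-or-fall-through structure above is the essence of a lookaside buffer applied to semantic retrieval.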
Related papers
- From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents [78.30630000529133]
We propose MM-Mem, a pyramidal multimodal memory architecture grounded in Fuzzy-Trace Theory. MM-Mem structures memory hierarchically into a Sensory Buffer, an Episodic Stream, and a Symbolic layer. Experiments confirm the effectiveness of MM-Mem on both offline and streaming tasks.
arXiv Detail & Related papers (2026-03-02T05:12:45Z) - Hippocampus: An Efficient and Scalable Memory Module for Agentic AI [4.508092142808317]
Hippocampus is an agentic memory management system that uses compact binary signatures for semantic search. Its core is a Dynamic Wavelet Matrix (DWM) that compresses and co-indexes both streams to support ultra-fast search. Our evaluation shows that Hippocampus reduces end-to-end retrieval latency by up to 31x.
arXiv Detail & Related papers (2026-02-14T04:25:20Z) - FlashMem: Distilling Intrinsic Latent Memory via Computation Reuse [4.210760734549566]
FlashMem is a framework that distills intrinsic memory directly from transient reasoning states via computation reuse. Experiments demonstrate that FlashMem matches the performance of heavy baselines while reducing inference latency by 5x.
arXiv Detail & Related papers (2026-01-09T03:27:43Z) - Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning [55.251697395358285]
Large language models (LLMs) are increasingly deployed as intelligent agents that reason, plan, and interact with their environments. To effectively scale to long-horizon scenarios, a key capability for such agents is a memory mechanism that can retain, organize, and retrieve past experiences. We propose CompassMem, an event-centric memory framework inspired by Event Theory.
arXiv Detail & Related papers (2026-01-08T08:44:07Z) - SYNAPSE: Empowering LLM Agents with Episodic-Semantic Memory via Spreading Activation [29.545442480332515]
We introduce Synapse, a unified memory architecture that retrieves via spreading activation rather than static, pre-computed links. We show that Synapse significantly outperforms state-of-the-art methods in complex temporal and multi-hop reasoning tasks. Our code and data will be made publicly available upon acceptance.
arXiv Detail & Related papers (2026-01-06T06:19:58Z) - Warp-Cortex: An Asynchronous, Memory-Efficient Architecture for Million-Agent Cognitive Scaling on Consumer Hardware [0.0]
We present Warp Cortex, an asynchronous architecture that theoretically enables million-agent cognitive scaling. We empirically demonstrate 100 concurrent agents at 2.2 GB total VRAM, with theoretical capacity exceeding 1,000 agents before compute latency becomes the bottleneck. We further introduce Referential Injection, a non-intrusive KV-cache update mechanism that allows asynchronous sub-agents to influence primary generation without stream disruption.
arXiv Detail & Related papers (2026-01-03T23:11:21Z) - LiCoMemory: Lightweight and Cognitive Agentic Memory for Efficient Long-Term Reasoning [15.189701702660821]
LiCoMemory is an end-to-end agentic memory framework for real-time updating and retrieval. CoGraph is a lightweight hierarchical graph that utilizes entities and relations as semantic indexing layers. Experiments on long-term dialogue benchmarks, LoCoMo and LongMemEval, show that LiCoMemory not only outperforms established baselines in temporal reasoning, multi-session consistency, and retrieval efficiency, but also notably reduces update latency.
arXiv Detail & Related papers (2025-11-03T11:02:40Z) - Scalable Disk-Based Approximate Nearest Neighbor Search with Page-Aligned Graph [3.994346326254537]
We propose PageANN, a disk-based approximate nearest neighbor search (ANNS) framework for high performance and scalability. Results show that PageANN significantly outperforms state-of-the-art (SOTA) disk-based ANNS methods, achieving 1.85x-10.83x higher throughput and 51.7%-91.9% lower latency across different datasets and memory budgets.
arXiv Detail & Related papers (2025-09-29T20:44:13Z) - Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction [72.27673320976933]
Diffusion Large Language Models (dLLMs) enable breakthroughs in reasoning and parallel decoding. Current caching techniques accelerate decoding by storing full-layer states, yet impose substantial memory usage. We propose Sparse-dLLM, the first training-free framework integrating dynamic cache eviction with sparse attention.
arXiv Detail & Related papers (2025-08-04T16:14:03Z) - Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions [55.19217798774033]
Memory is a fundamental component of AI systems, underpinning large language model (LLM)-based agents. In this survey, we first categorize memory representations into parametric and contextual forms. We then introduce six fundamental memory operations: Consolidation, Updating, Indexing, Forgetting, Retrieval, and Compression.
arXiv Detail & Related papers (2025-05-01T17:31:33Z) - B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z) - Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques.
PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
arXiv Detail & Related papers (2024-01-24T03:03:17Z)