ExplicitLM: Decoupling Knowledge from Parameters via Explicit Memory Banks
- URL: http://arxiv.org/abs/2511.01581v1
- Date: Mon, 03 Nov 2025 13:53:19 GMT
- Title: ExplicitLM: Decoupling Knowledge from Parameters via Explicit Memory Banks
- Authors: Chengzhang Yu, Zening Lu, Chenyang Zheng, Chiyue Wang, Yiming Zhang, Zhanpeng Jin,
- Abstract summary: Large language models suffer from knowledge staleness and lack of interpretability due to implicit knowledge storage.<n>We propose ExplicitLM, a novel architecture featuring a million-scale external memory bank storing human-readable knowledge as token sequences.
- Score: 4.099810580680816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models suffer from knowledge staleness and lack of interpretability due to implicit knowledge storage across entangled network parameters, preventing targeted updates and reasoning transparency. We propose ExplicitLM, a novel architecture featuring a million-scale external memory bank storing human-readable knowledge as token sequences, enabling direct inspection and modification. We design a differentiable two-stage retrieval mechanism with efficient coarse-grained filtering via product key decomposition (reducing complexity from $\mathcal{O}(N \cdot |I|)$ to $\mathcal{O}(\sqrt{N} \cdot |I|)$) and fine-grained Gumbel-Softmax matching for end-to-end training. Inspired by dual-system cognitive theory, we partition knowledge into frozen explicit facts (20%) and learnable implicit patterns (80%), maintained through Exponential Moving Average updates for stability. ExplicitLM achieves up to 43.67% improvement on knowledge-intensive tasks versus standard Transformers, with 3.62$\times$ gains in low-data regimes (10k samples). Analysis shows strong correlations between memory retrieval and performance, with correct predictions achieving 49% higher hit rates. Unlike RAG systems with frozen retrieval, our jointly optimized architecture demonstrates that interpretable, updatable models can maintain competitive performance while providing unprecedented knowledge transparency.
Related papers
- Dual Latent Memory for Visual Multi-agent System [69.29799381195592]
Visual Multi-Agent Systems (VMAS) promise to enhance comprehensive abilities through inter-agent collaboration.<n>Increasing agent turns often degrades performance while exponentially inflating token costs.<n>We propose L$2$-VMAS, a novel model-agnostic framework that enables inter-agent collaboration with dual latent memories.
arXiv Detail & Related papers (2026-01-31T02:49:10Z) - STEM: Scaling Transformers with Embedding Modules [59.26825251273227]
We introduce STEM, a static, token-indexed approach that replaces the FFN up-projection with a layer-local embedding lookup.<n>This removes runtime routing, enables CPU offload with asynchronous prefetch, and decouples capacity from both per-token FLOPs and cross-device communication.<n>Overall, STEM is an effective way of scaling parametric memory while providing better interpretability, better training stability and improved efficiency.
arXiv Detail & Related papers (2026-01-15T18:00:27Z) - CREAM: Continual Retrieval on Dynamic Streaming Corpora with Adaptive Soft Memory [19.64051996386645]
CREAM is a self-supervised framework for memory-based continual retrieval.<n>It adapts to both seen and unseen topics in an unsupervised setting.<n> Experiments on two benchmark datasets demonstrate that CREAM exhibits superior adaptability and retrieval accuracy.
arXiv Detail & Related papers (2026-01-06T04:47:49Z) - SimpleMem: Efficient Lifelong Memory for LLM Agents [73.74399447715052]
We introduce SimpleMem, an efficient memory framework based on semantic lossless compression.<n>We propose a three-stage pipeline designed to maximize information density and token utilization.<n> Experiments on benchmark datasets show that our method consistently outperforms baseline approaches in accuracy, retrieval efficiency, and inference cost.
arXiv Detail & Related papers (2026-01-05T21:02:49Z) - Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs [7.424730923663806]
We show that RL-enhanced models consistently outperform their base and supervised fine-tuned (SFT) counterparts on pure knowledge recall tasks.<n>We hypothesize these gains stem not from newly acquired data, but from improved procedural skills in navigating and searching existing knowledge hierarchies within the model parameters.
arXiv Detail & Related papers (2025-11-08T08:56:29Z) - Beyond Softmax: A Natural Parameterization for Categorical Random Variables [61.709831225296305]
We introduce the $textitcatnat$ function, a function composed of a sequence of hierarchical binary splits.<n>A rich set of experiments show that the proposed function improves the learning efficiency and yields models characterized by consistently higher test performance.
arXiv Detail & Related papers (2025-09-29T12:55:50Z) - TRAIL: Joint Inference and Refinement of Knowledge Graphs with Large Language Models [5.678291291711662]
TRAIL is a novel, unified framework for Thinking, Reasoning, And Incremental Learning.<n>It couples joint inference and dynamic KG refinement with large language models.<n>Extensive experiments on multiple benchmarks demonstrate that TRAIL outperforms existing KG-augmented and retrieval-augmented LLM baselines by 3% to 13%.
arXiv Detail & Related papers (2025-08-06T14:25:05Z) - PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM agents [15.524189150821147]
Large language models (LLMs) combined with Retrieval-Augmented Generation (RAG) fail to scale in complex, long-term interactions.<n>We propose a flexible external memory framework based on knowledge graphs, automatically constructed and updated by the LLM itself.<n>Building upon the AriGraph architecture, we introduce a novel hybrid graph design that supports both standard edges and two types of hyperedges.<n>We evaluate our system on three benchmarks-TriviaQA, HotpotQA, and DiaASQ-demonstrating that different memory and retrieval configurations yield optimal performance depending on the task.
arXiv Detail & Related papers (2025-06-20T13:52:15Z) - KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG [63.82127103851471]
Retrieval-Augmented Generation (RAG) enables large language models to access broader knowledge sources.<n>We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance.<n>We present KARE-RAG, which improves knowledge utilization through three key innovations.
arXiv Detail & Related papers (2025-06-03T06:31:17Z) - Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory [0.5584627289325719]
Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses.<n>But their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues.<n>We introduce Mem0, a scalable memory-centric architecture that addresses this issue by dynamically extracting, consolidating, and retrieving salient information from ongoing conversations.
arXiv Detail & Related papers (2025-04-28T01:46:35Z) - TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation [31.231916859341865]
TrustRAG is a framework that systematically filters malicious and irrelevant content before it is retrieved for generation.<n>TrustRAG delivers substantial improvements in retrieval accuracy, efficiency, and attack resistance.
arXiv Detail & Related papers (2025-01-01T15:57:34Z) - UNComp: Can Matrix Entropy Uncover Sparsity? -- A Compressor Design from an Uncertainty-Aware Perspective [85.08718140718707]
UNComp is an uncertainty-aware framework that uncovers sparsity patterns that can be used for adaptive compression.<n>By focusing on uncertainty to analyze the sparsity pattern in detail, UNComp reduces the KV cache size to 4.74% of the original, achieves a 6% prefill speedup, and improves throughput by 6.4x.
arXiv Detail & Related papers (2024-10-04T02:32:36Z) - FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z) - Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation [14.448198170932226]
Think-on-Graph 2.0 (ToG-2) is a hybrid RAG framework that iteratively retrieves information from both unstructured and structured knowledge sources.<n>ToG-2 alternates between graph retrieval and context retrieval to search for in-depth clues relevant to the question.<n>It achieves overall state-of-the-art (SOTA) performance on 6 out of 7 knowledge-intensive datasets with GPT-3.5.
arXiv Detail & Related papers (2024-07-15T15:20:40Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Discovering and Reasoning of Causality in the Hidden World with Large Language Models [109.62442253177376]
We develop a new framework termed Causal representatiOn AssistanT (COAT) to propose useful measured variables for causal discovery.<n>Instead of directly inferring causality with Large language models (LLMs), COAT constructs feedback from intermediate causal discovery results to LLMs to refine the proposed variables.
arXiv Detail & Related papers (2024-02-06T12:18:54Z) - Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.