NanoKnow: How to Know What Your Language Model Knows
- URL: http://arxiv.org/abs/2602.20122v1
- Date: Mon, 23 Feb 2026 18:37:49 GMT
- Title: NanoKnow: How to Know What Your Language Model Knows
- Authors: Lingwei Gu, Nour Jedidi, Jimmy Lin
- Abstract summary: We release NanoKnow, a benchmark dataset that partitions questions from Natural Questions and SQuAD into splits based on whether their answers appear in the pre-training corpus. Using these splits, we can properly disentangle the sources of knowledge that LLMs rely on when producing an output. Our findings show that closed-book accuracy is strongly influenced by answer frequency in the pre-training data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How do large language models (LLMs) know what they know? Answering this question has been difficult because pre-training data is often a "black box" -- unknown or inaccessible. The recent release of nanochat -- a family of small LLMs with fully open pre-training data -- addresses this as it provides a transparent view into where a model's parametric knowledge comes from. Towards the goal of understanding how knowledge is encoded by LLMs, we release NanoKnow, a benchmark dataset that partitions questions from Natural Questions and SQuAD into splits based on whether their answers are present in nanochat's pre-training corpus. Using these splits, we can now properly disentangle the sources of knowledge that LLMs rely on when producing an output. To demonstrate NanoKnow's utility, we conduct experiments using eight nanochat checkpoints. Our findings show: (1) closed-book accuracy is strongly influenced by answer frequency in the pre-training data, (2) providing external evidence can mitigate this frequency dependence, (3) even with external evidence, models are more accurate when answers were seen during pre-training, demonstrating that parametric and external knowledge are complementary, and (4) non-relevant information is harmful, with accuracy decreasing based on both the position and the number of non-relevant contexts. We release all NanoKnow artifacts at https://github.com/castorini/NanoKnow.
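The core of the benchmark construction is deciding, for each QA pair, whether its answer string occurs in the pre-training corpus, and how often. The following is a minimal sketch of that partitioning idea; the function name, the dictionary field names, and the naive substring matching are all illustrative assumptions, not NanoKnow's actual pipeline.

```python
def partition_questions(questions, corpus_docs):
    """Split QA pairs into 'seen' / 'unseen' splits by whether the answer
    string occurs in the corpus, and record each answer's raw frequency.

    questions:   list of dicts with 'question' and 'answer' keys (assumed schema)
    corpus_docs: list of document strings standing in for pre-training data
    """
    corpus_text = " ".join(doc.lower() for doc in corpus_docs)
    seen, unseen, freq = [], [], {}
    for q in questions:
        # Naive case-insensitive substring count; a real pipeline would
        # normalize tokenization and handle answer aliases.
        n = corpus_text.count(q["answer"].lower())
        freq[q["question"]] = n
        (seen if n > 0 else unseen).append(q)
    return seen, unseen, freq
```

With frequencies recorded per question, finding (1) above amounts to bucketing questions by frequency and comparing closed-book accuracy across buckets.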
Related papers
- ReFactX: Scalable Reasoning with Reliable Facts via Constrained Generation [8.331415420334361]
We present a scalable method that enables Large Language Models to access external knowledge without depending on retrievers or auxiliary models. Our approach uses constrained generation with a pre-built prefix-tree index. We evaluate our proposal on Question Answering and show that it scales to large knowledge bases (800 million facts), adapts to domain-specific data, and achieves effective results.
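The prefix-tree idea behind constrained generation can be illustrated with a toy index: at each decoding step, the model is only allowed to emit tokens that extend some fact stored in the tree. The class below is a hypothetical sketch of that mechanism (operating on whole words rather than real tokenizer IDs), not the ReFactX implementation.

```python
class PrefixTrie:
    """Toy prefix-tree index over token sequences of known facts."""

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        """Add one fact, given as a sequence of tokens."""
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def allowed_next(self, prefix):
        """Return the set of tokens that may legally follow `prefix`.

        An empty set means no indexed fact extends this prefix, so a
        constrained decoder would mask out every continuation.
        """
        node = self.root
        for t in prefix:
            node = node.get(t)
            if node is None:
                return set()
        return set(node)
```

In a real decoder, `allowed_next` would drive a logit mask: logits for tokens outside the returned set are set to negative infinity before sampling, so the model can only generate sequences present in the fact index.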
arXiv Detail & Related papers (2025-08-23T10:21:47Z)
- Prompting is not Enough: Exploring Knowledge Integration and Controllable Generation [89.65955788873532]
Open-domain question answering (OpenQA) is a cornerstone of natural language processing (NLP). We propose a novel framework named GenKI, which aims to improve OpenQA performance by exploring Knowledge Integration and controllable Generation.
arXiv Detail & Related papers (2025-05-26T08:18:33Z)
- How new data permeates LLM knowledge and how to dilute it [19.96863816288517]
Large language models learn, and continually learn, through the accumulation of gradient-based updates. We demonstrate that when learning new information, LLMs exhibit a "priming" effect: learning a new fact can cause the model to inappropriately apply that knowledge in unrelated contexts. We show that the degree of priming after learning new information can be predicted by measuring the token probability of key words before learning.
arXiv Detail & Related papers (2025-04-13T11:25:04Z)
- Inside-Out: Hidden Factual Knowledge in LLMs [50.79758420289131]
This work presents a framework for assessing whether large language models (LLMs) encode more factual knowledge in their parameters than what they express in their outputs. We first propose a formal definition of knowledge, quantifying it for a given question as the fraction of correct-incorrect answer pairs where the correct one is ranked higher. We then present a case study, applying this framework to three popular open-weights LLMs in a closed-book QA setup.
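The pairwise definition in that summary is simple enough to state in code: given model scores (e.g. sequence log-probabilities) for a set of correct answers and a set of incorrect answers to one question, knowledge is the fraction of cross pairs where the correct answer scores higher. The sketch below assumes scores are already computed; the function name and tie handling (ties count as failures) are illustrative choices.

```python
def knowledge_score(correct_scores, incorrect_scores):
    """Fraction of (correct, incorrect) answer pairs for one question
    in which the model scores the correct answer strictly higher."""
    pairs = [(c, i) for c in correct_scores for i in incorrect_scores]
    wins = sum(1 for c, i in pairs if c > i)
    return wins / len(pairs)
```

A score of 1.0 means the model always ranks correct answers above incorrect ones for that question, even if greedy decoding never surfaces them, which is the sense in which knowledge can be "hidden".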
arXiv Detail & Related papers (2025-03-19T15:21:48Z)
- Are LLMs Really Not Knowledgable? Mining the Submerged Knowledge in LLMs' Memory [15.986679553468989]
Large language models (LLMs) have shown promise as potential knowledge bases. LLMs often struggle with question-answering tasks and are prone to hallucinations. We develop SkipUnsure, a method to improve answer accuracy by leveraging detected but unexpressed knowledge.
arXiv Detail & Related papers (2024-12-30T10:29:18Z)
- Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing.
Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z)
- R-Tuning: Instructing Large Language Models to Say `I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face challenges.
Previous instruction tuning methods force the model to complete a sentence regardless of whether it possesses the relevant knowledge.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning).
Experimental results demonstrate R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
arXiv Detail & Related papers (2023-11-16T08:45:44Z)
- Physics of Language Models: Part 3.1, Knowledge Storage and Extraction [51.68385617116854]
Large language models (LLMs) can store a vast amount of world knowledge, often extractable via question-answering.
We find a strong correlation between the model's ability to extract knowledge and various diversity measures of the training data.
arXiv Detail & Related papers (2023-09-25T17:37:20Z)
- Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering [7.888547093390469]
Large Language Models (LLMs) are capable of performing zero-shot closed-book question answering tasks.
We propose to augment the knowledge directly in the input of LLMs.
Our framework, Knowledge-Augmented language model PromptING (KAPING), requires no model training and is thus completely zero-shot.
arXiv Detail & Related papers (2023-06-07T04:15:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.