MuLan: A Study of Fact Mutability in Language Models
- URL: http://arxiv.org/abs/2404.03036v1
- Date: Wed, 3 Apr 2024 19:47:33 GMT
- Title: MuLan: A Study of Fact Mutability in Language Models
- Authors: Constanza Fierro, Nicolas Garneau, Emanuele Bugliarello, Yova Kementchedjhieva, Anders Søgaard
- Abstract summary: Trustworthy language models ideally identify mutable facts as such and process them accordingly.
We create MuLan, a benchmark for evaluating the ability of English language models to anticipate time-contingency.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facts are subject to contingencies and can be true or false in different circumstances. One such contingency is time, wherein some facts mutate over a given period, e.g., the president of a country or the winner of a championship. Trustworthy language models ideally identify mutable facts as such and process them accordingly. We create MuLan, a benchmark for evaluating the ability of English language models to anticipate time-contingency, covering both 1:1 and 1:N relations. We hypothesize that mutable facts are encoded differently than immutable ones, hence being easier to update. In a detailed evaluation of six popular large language models, we consistently find differences in the LLMs' confidence, representations, and update behavior, depending on the mutability of a fact. Our findings should inform future work on the injection and induction of time-contingent knowledge to/from LLMs.
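The confidence gap described in the abstract is straightforward to probe. Below is a minimal sketch in the spirit of MuLan's confidence analysis; it is not the authors' released code, and the model choice, prompts, and relations are illustrative assumptions. It compares an LM's log-probability for the object of an immutable fact (a country's capital) with that of a mutable one (its current president).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small model chosen purely for illustration; MuLan evaluates six larger LLMs.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def object_logprob(prompt: str, obj: str) -> float:
    """Summed log-probability of `obj` as the continuation of `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    obj_ids = tok(" " + obj, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, obj_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    offset = prompt_ids.shape[1]
    # Logits at position p predict the token at position p + 1.
    return sum(
        log_probs[0, offset + i - 1, tid].item()
        for i, tid in enumerate(obj_ids[0])
    )

# Immutable (time-independent) vs. mutable (time-contingent) relation:
print(object_logprob("The capital of France is", "Paris"))
print(object_logprob("The current president of France is", "Emmanuel Macron"))
```

Aggregated over many relation templates, systematically lower confidence on mutable facts would be consistent with the differences the paper reports.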
Related papers
- Learning and Unlearning of Fabricated Knowledge in Language Models [16.971082623826263]
We show that facts that conflict with common knowledge are remembered for tens of thousands of training steps.
We show that the impact of knowledge-conflicting facts in LMs, though potentially long-lasting, can be largely erased by a novel application of multi-step sparse updates.
arXiv Detail & Related papers (2024-10-29T05:33:14Z)
- Co-occurrence is not Factual Association in Language Models [19.708303468664088]
We show that language models are biased to learn word co-occurrence statistics instead of true factual associations.
We propose two strategies to improve the learning of factual associations in language models.
arXiv Detail & Related papers (2024-09-21T08:13:16Z)
- Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time [0.0]
We introduce a novel dataset designed to rigorously test large language models' ability to handle time-sensitive facts.
Our benchmark offers a systematic way to measure how well LLMs align their knowledge with the correct time context.
arXiv Detail & Related papers (2024-09-20T08:57:20Z)
- Fine-tuning Language Models for Factuality [96.5203774943198]
The capabilities of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z)
- Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models [2.6626950367610402]
We study the cross-lingual consistency (CLC) of factual knowledge in various multilingual PLMs.
We propose a Ranking-based Consistency (RankC) metric to evaluate knowledge consistency across languages independently of accuracy (a toy sketch follows this entry).
arXiv Detail & Related papers (2023-10-16T13:19:17Z)
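As a concrete illustration of ranking-based consistency, the toy sketch below scores how closely a model's top-k candidate rankings for the same fact agree across two query languages, independently of whether either ranking is correct. The rank weighting here is an assumption made for the sketch, not RankC's exact definition.

```python
def rank_consistency(top_k_lang1: list[str], top_k_lang2: list[str]) -> float:
    """Weighted overlap of two ranked candidate lists (1.0 = identical)."""
    k = min(len(top_k_lang1), len(top_k_lang2))
    score, weight_sum = 0.0, 0.0
    for depth in range(1, k + 1):
        # Overlap of the top-`depth` prefixes, weighted toward the top ranks.
        overlap = len(set(top_k_lang1[:depth]) & set(top_k_lang2[:depth]))
        weight = 1.0 / depth
        score += weight * overlap / depth
        weight_sum += weight
    return score / weight_sum

# Hypothetical top-3 predictions for "The capital of Canada is ___"
# when the same model is queried in English and in French:
en = ["Ottawa", "Toronto", "Montreal"]
fr = ["Ottawa", "Montreal", "Toronto"]
print(f"consistency: {rank_consistency(en, fr):.3f}")  # ~0.864
```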
- Do Large Language Models Know about Facts? [60.501902866946]
Large language models (LLMs) have recently driven striking performance improvements across a range of natural language processing tasks.
We aim to evaluate the extent and scope of factual knowledge within LLMs by designing the benchmark Pinocchio.
Pinocchio contains 20K diverse factual questions that span different sources, timelines, domains, regions, and languages.
arXiv Detail & Related papers (2023-10-08T14:26:55Z)
- Mitigating Temporal Misalignment by Discarding Outdated Facts [58.620269228776294]
Large language models are often used under temporal misalignment, tasked with answering questions about the present.
We propose fact duration prediction: the task of predicting how long a given fact will remain true (a minimal sketch follows this entry).
Our data and code are released publicly at https://github.com/mikejqzhang/mitigating_misalignment.
arXiv Detail & Related papers (2023-05-24T07:30:08Z)
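To make the fact duration prediction task concrete, here is a hypothetical interface sketch; the relation names and prior durations are invented for illustration, and the real data and models live in the repository linked above.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str

# Illustrative priors (in years) per relation type; these numbers are
# assumptions for the sketch, not values from the paper.
DURATION_PRIOR = {
    "head_of_state": 4.0,   # typical term length
    "capital_of": 100.0,    # effectively immutable
    "plays_for_team": 3.0,  # athletes change clubs often
}

def predict_duration(fact: Fact) -> float:
    """Predict how many years a fact is expected to remain true."""
    return DURATION_PRIOR.get(fact.relation, 10.0)  # fallback prior

print(predict_duration(Fact("Emmanuel Macron", "head_of_state", "France")))
```

A learned version would replace the lookup with a model conditioned on the fact's text, but the input/output contract stays the same: a fact in, an expected validity span out.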
- Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power-law decay (a toy illustration follows this entry).
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
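The power-law claim can be illustrated numerically: a single LSTM unit with forget-gate value f decays its memory as f^t, an exponential with timescale -1/ln(f), and mixing units across a spread of forget-gate values yields an approximately power-law envelope. A toy sketch, where the gate-value range is an arbitrary assumption:

```python
import numpy as np

t = np.arange(1, 1000)
forget_gates = np.linspace(0.5, 0.999, 200)  # a spread of unit timescales
mixture = np.mean([f ** t for f in forget_gates], axis=0)

# A straight line in log-log space over the tail indicates power-law decay.
slope, _ = np.polyfit(np.log(t[100:]), np.log(mixture[100:]), 1)
print(f"log-log tail slope: {slope:.2f}")
```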