TAXI: Evaluating Categorical Knowledge Editing for Language Models
- URL: http://arxiv.org/abs/2404.15004v2
- Date: Thu, 6 Jun 2024 13:46:36 GMT
- Title: TAXI: Evaluating Categorical Knowledge Editing for Language Models
- Authors: Derek Powell, Walter Gerych, Thomas Hartvigsen
- Abstract summary: Knowledge editing aims to inject new facts into language models to improve their factuality.
Current benchmarks fail to evaluate consistency, which is critical to ensure efficient, accurate, and generalizable edits.
We manually create TAXI, a new benchmark dataset designed specifically to evaluate consistency in categorical knowledge edits.
- Score: 13.889284093852687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans rarely learn one fact in isolation. Instead, learning a new fact induces knowledge of other facts about the world. For example, in learning a korat is a type of cat, you also infer it is a mammal and has claws, ensuring your model of the world is consistent. Knowledge editing aims to inject new facts into language models to improve their factuality, but current benchmarks fail to evaluate consistency, which is critical to ensure efficient, accurate, and generalizable edits. We manually create TAXI, a new benchmark dataset specifically created to evaluate consistency in categorical knowledge edits. TAXI contains 11,120 multiple-choice queries for 976 edits spanning 41 categories (e.g., Dogs), 164 subjects (e.g., Labrador), and 183 properties (e.g., is a mammal). We then use TAXI to evaluate popular editors' categorical consistency, measuring how often editing a subject's category appropriately edits its properties. We find that 1) the editors achieve marginal, yet non-random consistency, 2) their consistency far underperforms human baselines, and 3) consistency is more achievable when editing atypical subjects. Our code and data are available at https://github.com/derekpowell/taxi.
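A minimal sketch of how categorical consistency of this kind might be scored, assuming a simple multiple-choice interface. The query fields and the query_fn callable are illustrative stand-ins, not TAXI's actual schema or evaluation code.

```python
# Illustrative sketch of categorical-consistency scoring in the spirit of TAXI.
# The data fields and the query_fn interface are assumptions for illustration,
# not the benchmark's actual schema or code.
from typing import Callable, Dict, List


def consistency_score(
    query_fn: Callable[[str, List[str]], str],
    property_queries: List[Dict],
) -> float:
    """Fraction of property queries answered consistently with the edited category.

    Each query dict holds a multiple-choice prompt, its answer options, and the
    option implied by the new (edited) category.
    """
    correct = 0
    for q in property_queries:
        choice = query_fn(q["prompt"], q["choices"])
        correct += int(choice == q["consistent_answer"])
    return correct / len(property_queries)


if __name__ == "__main__":
    # Toy example: after editing "A Labrador is a kind of fish",
    # a consistent model should pick fish-like properties.
    queries = [
        {"prompt": "A Labrador has", "choices": ["fur", "scales"], "consistent_answer": "scales"},
        {"prompt": "A Labrador breathes with", "choices": ["lungs", "gills"], "consistent_answer": "gills"},
    ]
    dummy_model = lambda prompt, choices: choices[0]  # stand-in for an edited LM
    print(f"consistency = {consistency_score(dummy_model, queries):.2f}")  # 0.00
```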
Related papers
- AnyEdit: Edit Any Knowledge Encoded in Language Models [69.30638272162267]
We propose AnyEdit, a new autoregressive editing paradigm for large language models (LLMs).
It decomposes long-form knowledge into sequential chunks and iteratively edits the key token in each chunk, ensuring consistent and accurate outputs.
It outperforms strong baselines by 21.5% on benchmarks including UnKEBench, AKEW, and our new EditEverything dataset for long-form diverse-formatted knowledge.
arXiv Detail & Related papers (2025-02-08T16:18:37Z)
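A rough sketch of the chunk-by-chunk editing loop AnyEdit describes; the naive word-count chunker and the apply_edit callback are placeholders, not the paper's actual algorithm.

```python
# Rough sketch of decomposing long-form knowledge into sequential chunks and
# editing iteratively, each step conditioned on the already-edited prefix.
# The chunking rule and apply_edit callback are illustrative placeholders.
from typing import Callable, List


def chunk_text(text: str, max_words: int = 20) -> List[str]:
    """Split long-form knowledge into short, ordered chunks (naive word-count rule)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def edit_long_form(model, new_knowledge: str, apply_edit: Callable):
    """Apply one edit per chunk, threading the edited prefix through each step."""
    prefix = ""
    for chunk in chunk_text(new_knowledge):
        model = apply_edit(model, prefix=prefix, target=chunk)
        prefix = (prefix + " " + chunk).strip()
    return model
```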
- Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs? [61.68363765350178]
This paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research.
We first describe 12 open problems with model editing, based on challenges with (1) defining the problem, (2) developing benchmarks, and (3) assuming LLMs have editable beliefs in the first place.
Next, we introduce a semi-synthetic dataset for model editing based on Wikidata, where we can evaluate edits against labels given by an idealized Bayesian agent.
arXiv Detail & Related papers (2024-06-27T17:33:03Z)
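A small illustration of scoring an edit against an idealized Bayesian agent, as the testbed above proposes; the likelihood values and the error measure are assumptions, not the paper's construction.

```python
# Hedged sketch: compare an edited model's belief to the posterior an idealized
# Bayesian agent would hold after seeing the same evidence. Likelihoods and the
# distance measure are illustrative assumptions.

def bayes_posterior(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Posterior probability of a proposition after observing one piece of evidence."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator


def edit_error(model_prob_after_edit: float, agent_posterior: float) -> float:
    """Gap between the edited model's belief and the ideal Bayesian belief."""
    return abs(model_prob_after_edit - agent_posterior)


if __name__ == "__main__":
    posterior = bayes_posterior(prior=0.2, p_evidence_if_true=0.9, p_evidence_if_false=0.1)
    print(f"ideal posterior = {posterior:.3f}")          # ~0.692
    print(f"error = {edit_error(0.95, posterior):.3f}")  # edited model is overconfident
```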
- How Well Can Knowledge Edit Methods Edit Perplexing Knowledge? [18.022428746019582]
Large language models (LLMs) have demonstrated remarkable capabilities, but updating their knowledge post-training remains a critical challenge.
We introduce the concept of "perplexingness": the degree to which new knowledge conflicts with an LLM's learned conceptual hierarchies and categorical relationships.
Our analysis reveals that edits involving more abstract concepts (hypernyms) generally exhibit higher perplexingness and are more resistant to modification than their specific counterparts (hyponyms).
arXiv Detail & Related papers (2024-06-25T03:41:02Z)
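One plausible way to operationalize perplexingness is the unedited model's per-token negative log-likelihood on the new fact; this proxy and the token_logprob interface are assumptions for illustration and may differ from the paper's metric.

```python
# Sketch of a "perplexingness" proxy: facts that clash with the model's learned
# category structure should receive higher negative log-likelihood before editing.
import math
from typing import Callable, List


def perplexingness(
    token_logprob: Callable[[List[str], str], float],
    prompt_tokens: List[str],
    fact_tokens: List[str],
) -> float:
    """Per-token negative log-likelihood of the new fact, given the prompt."""
    nll = 0.0
    context = list(prompt_tokens)
    for tok in fact_tokens:
        nll -= token_logprob(context, tok)
        context.append(tok)
    return nll / len(fact_tokens)


if __name__ == "__main__":
    # Dummy scorer standing in for a real LM: every token gets probability 0.1.
    dummy = lambda context, tok: math.log(0.1)
    score = perplexingness(dummy, ["A", "korat", "is", "a"], ["type", "of", "reptile"])
    print(f"perplexingness ~ {score:.2f} nats/token")  # ~2.30
```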
- Has this Fact been Edited? Detecting Knowledge Edits in Language Models [5.260519479124422]
Knowledge editing methods (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training.
Knowing whether a generated output is based on edited knowledge or first-hand knowledge from pre-training can increase users' trust in generative models.
We propose a novel task: detecting edited knowledge in language models.
arXiv Detail & Related papers (2024-05-04T22:02:24Z)
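The detection task above can be framed as binary classification. The sketch below uses synthetic features and an off-the-shelf logistic regression purely as an illustration, not the paper's detector.

```python
# Sketch of edit detection as binary classification: given features extracted
# from a model's behaviour on a prompt (here a synthetic 2-D vector standing in
# for, e.g., confidence and hidden-state statistics), predict whether the
# answer comes from an edited fact. Features and classifier are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: edited facts (label 1) have slightly shifted
# feature statistics relative to pre-training facts (label 0).
unedited = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
edited = rng.normal(loc=[1.0, 0.5], scale=1.0, size=(200, 2))
X = np.vstack([unedited, edited])
y = np.array([0] * 200 + [1] * 200)

clf = LogisticRegression().fit(X, y)
print(f"train accuracy = {clf.score(X, y):.2f}")
```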
- "Flex Tape Can't Fix That": Bias and Misinformation in Edited Language Models [17.77377809345631]
We investigate how model editing methods unexpectedly amplify model biases post-edit.
Specifically, we focus on biases with respect to demographic attributes such as race, geographic origin, and gender.
We find that edited models exhibit, to various degrees, more biased behavior as they become less confident in attributes for Asian, African, and South American subjects.
arXiv Detail & Related papers (2024-02-29T23:11:55Z)
- Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models [68.03946716358335]
We find that we can change how a fact is stored in a model by editing weights that are in a different location than where existing methods suggest that the fact is stored.
This is surprising because we would expect that localizing facts to specific model parameters would tell us where to manipulate knowledge in models.
Our results suggest, counterintuitively, that better mechanistic understanding of how pretrained language models work may not always translate to insights about how to best change their behavior.
arXiv Detail & Related papers (2023-01-10T21:26:08Z)
- Instilling Type Knowledge in Language Models via Multi-Task QA [13.244420493711981]
We introduce a method to instill fine-grained type knowledge in language models with text-to-text pre-training on type-centric questions.
We create the WikiWiki dataset: entities and passages from 10M Wikipedia articles linked to the Wikidata knowledge graph with 41K types.
Models trained on WikiWiki achieve state-of-the-art performance in zero-shot dialog state tracking benchmarks, accurately infer entity types in Wikipedia articles, and can discover new types deemed useful by human judges.
arXiv Detail & Related papers (2022-04-28T22:06:32Z)
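A toy sketch of turning entity-type links into text-to-text, type-centric QA pairs in the spirit of the WikiWiki setup above; the templates and records are invented for illustration, not the dataset's actual format.

```python
# Sketch: build type-centric question/answer pairs from passages whose entity
# mentions are linked to fine-grained types (e.g., via a knowledge graph).
from typing import Dict, List, Tuple


def make_type_qa(records: List[Dict]) -> List[Tuple[str, str]]:
    """Each record links a passage mention to an entity and its fine-grained type."""
    pairs = []
    for r in records:
        question = f"context: {r['passage']} question: What type of entity is {r['mention']}?"
        pairs.append((question, r["type"]))
    return pairs


if __name__ == "__main__":
    toy = [{"passage": "The korat is prized in Thailand.", "mention": "korat", "type": "cat breed"}]
    for q, a in make_type_qa(toy):
        print(q, "->", a)
```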
- RuMedBench: A Russian Medical Language Understanding Benchmark [58.99199480170909]
The paper describes the open Russian medical language understanding benchmark covering several task types.
We prepare unified-format labeling, data splits, and evaluation metrics for the new tasks.
A single-number metric expresses a model's ability to cope with the benchmark.
arXiv Detail & Related papers (2022-01-17T16:23:33Z)
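A minimal sketch of a single-number benchmark score, assuming a plain macro-average over normalized per-task scores; RuMedBench's actual aggregation may differ.

```python
# Sketch: collapse per-task scores into one benchmark number via a macro-average.
# The task names and scores below are hypothetical.
from typing import Dict


def overall_score(per_task_scores: Dict[str, float]) -> float:
    """Macro-average of per-task scores (each already normalized to [0, 1])."""
    return sum(per_task_scores.values()) / len(per_task_scores)


if __name__ == "__main__":
    scores = {"ner": 0.71, "qa": 0.58, "nli": 0.64}  # hypothetical task scores
    print(f"single-number score = {overall_score(scores):.3f}")  # 0.643
```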
- Editing Factual Knowledge in Language Models [51.947280241185]
We present KnowledgeEditor, a method that can be used to edit the factual knowledge stored in a language model.
Besides being computationally efficient, KnowledgeEditor does not require any modifications in LM pre-training.
We show KnowledgeEditor's efficacy with two popular architectures and knowledge-intensive tasks.
arXiv Detail & Related papers (2021-04-16T15:24:42Z)
- GO FIGURE: A Meta Evaluation of Factuality in Summarization [131.1087461486504]
We introduce GO FIGURE, a meta-evaluation framework for evaluating factuality evaluation metrics.
Our benchmark analysis on ten factuality metrics reveals that our framework provides a robust and efficient evaluation.
It also reveals that while QA metrics generally improve over standard metrics that measure factuality across domains, performance is highly dependent on the way in which questions are generated.
arXiv Detail & Related papers (2020-10-24T08:30:20Z)
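A sketch of one sensitivity check in the general spirit of meta-evaluating factuality metrics: inject progressively more factual errors into a summary and test whether a candidate metric's score drops monotonically. The corruption steps, toy metric, and pass criterion are illustrative assumptions, not the GO FIGURE protocol.

```python
# Sketch: a factuality metric should assign lower scores to summaries with more
# injected factual errors; check that its scores strictly decrease.
from typing import Callable, List


def is_sensitive(metric: Callable[[str, str], float], source: str, corrupted_summaries: List[str]) -> bool:
    """True if the metric strictly decreases as summaries get more corrupted.

    corrupted_summaries is ordered from least to most factually corrupted.
    """
    scores = [metric(source, s) for s in corrupted_summaries]
    return all(a > b for a, b in zip(scores, scores[1:]))


if __name__ == "__main__":
    source = "The korat is a cat breed that originated in Thailand."
    summaries = [
        "The korat is a cat breed from Thailand.",  # faithful
        "The korat is a cat breed from Japan.",     # one error
        "The korat is a dog breed from Japan.",     # two errors
    ]
    # Toy metric: fraction of summary words that also appear in the source.
    toy_metric = lambda src, summ: len(set(summ.lower().split()) & set(src.lower().split())) / len(summ.split())
    print("sensitive:", is_sensitive(toy_metric, source, summaries))
```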