Related papers: Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing

Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing

URL: http://arxiv.org/abs/2506.03490v2
Date: Thu, 05 Jun 2025 03:20:15 GMT
Title: Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing
Authors: Shigeng Chen, Linhao Luo, Zhangchi Qiu, Yanan Cao, Carl Yang, Shirui Pan,
Abstract summary: knowledge editing (KE) has emerged as a promising approach to update specific facts in Large Language Models (LLMs) without the need for full retraining.<n>We propose a novel framework called MedEditBench to rigorously evaluate the effectiveness of existing KE methods in the medical domain.<n>Our findings indicate that current KE methods result in only superficial memorization of the injected information, failing to generalize to new scenarios.
Score: 72.8373875453882
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, knowledge editing (KE) has emerged as a promising approach to update specific facts in Large Language Models (LLMs) without the need for full retraining. Despite the effectiveness in general-domain benchmarks, their applicability to complex medical domain remains largely unexplored. Medical knowledge editing is particularly challenging, as it requires LLMs to internalize the knowledge and generalize to unseen scenarios for effective and interpretable decision-making. In this work, we propose a novel framework called MedEditBench to rigorously evaluate the effectiveness of existing KE methods in the medical domain. In MedEditBench, we introduce a new medical knowledge editing benchmark as well as three different knowledge editing paradigms, which are designed to assess the impact of different knowledge sources for editing. Our findings indicate that current KE methods result in only superficial memorization of the injected information, failing to generalize to new scenarios. To overcome this limitation, we present Self-Generated Rationale Editing (SGR-Edit), which utilizes model-derived rationales as the target knowledge for editing, thereby uncovering the underlying reasoning process and demonstrating significant improvements over existing KE approaches. Additionally, we offer deeper insights into medical knowledge editing, including the localization of medical knowledge in LLMs and the impact of sequential editing on evolving knowledge. This could provide practical guidance for implementing KE methods in real-world medical applications.

Related papers

MedMKEB: A Comprehensive Knowledge Editing Benchmark for Medical Multimodal Large Language Models [5.253788190589279]
We present MedMKEB, the first comprehensive benchmark designed to evaluate the reliability, generality, locality, portability, and robustness of knowledge editing.<n> MedMKEB is built on a high-quality medical visual question-answering dataset and enriched with carefully constructed editing tasks.<n>We incorporate human expert validation to ensure the accuracy and reliability of the benchmark.
arXiv Detail & Related papers (2025-08-07T07:09:26Z)
Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs [47.06544781855325]
We propose a Fine-grained Neuron-level Knowledge Editing (FiNE) method that enhances editing locality without affecting success rates.<n>By precisely identifying and modifying specific neurons within feed-forward networks, FiNE significantly improves knowledge localization and editing.
arXiv Detail & Related papers (2025-03-03T01:30:28Z)
Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment [108.55277188617035]
Large language models (LLMs) have been widely adopted in various downstream task domains, but their ability to directly recall and apply factual medical knowledge remains under-explored.<n>Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs' inherent medical knowledge from their reasoning capabilities.<n>We introduce the Medical Knowledge Judgment, a dataset specifically designed to measure LLMs' one-hop factual medical knowledge.
arXiv Detail & Related papers (2025-02-20T05:27:51Z)
Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models [26.516571783335824]
Recent studies have identified side effects, such as knowledge distortion and the deterioration of general abilities, that have emerged after editing. This survey presents a comprehensive study of these side effects, providing a unified perspective on the challenges of knowledge editing in large language models.
arXiv Detail & Related papers (2024-06-03T15:28:21Z)
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models [89.13883089162951]
Model editing aims to precisely alter the behaviors of large language models (LLMs) in relation to specific knowledge. This approach has proven effective in addressing issues of hallucination and outdated information in LLMs. However, the potential of using model editing to modify knowledge in the medical field remains largely unexplored.
arXiv Detail & Related papers (2024-02-28T06:40:57Z)
EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries [69.72012539060731]
We introduce a theoretical framework for efficient knowledge editing (KE) in large language models (LLMs) We propose a novel task of event-based knowledge editing that pairs facts with event descriptions. We empirically demonstrate the superiority of event-based editing over the existing setting on resolving uncertainty in edited models.
arXiv Detail & Related papers (2024-02-17T16:34:50Z)
A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches. We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.