MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
- URL: http://arxiv.org/abs/2502.19870v2
- Date: Sat, 01 Mar 2025 08:16:00 GMT
- Title: MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
- Authors: Yuntao Du, Kailin Jiang, Zhi Gao, Chenrui Shi, Zilong Zheng, Siyuan Qi, Qing Li
- Abstract summary: We introduce MMKE-Bench, a comprehensive MultiModal Knowledge Editing Benchmark. It is designed to evaluate the ability of LMMs to edit diverse visual knowledge in real-world scenarios. It incorporates three types of editing tasks: visual entity editing, visual semantic editing, and user-specific editing. It consists of 2,940 pieces of knowledge and 8,363 images across 33 broad categories, with evaluation questions automatically generated and human-verified.
- Score: 35.323379110573406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge editing techniques have emerged as essential tools for updating the factual knowledge of large language models (LLMs) and multimodal models (LMMs), allowing them to correct outdated or inaccurate information without retraining from scratch. However, existing benchmarks for multimodal knowledge editing primarily focus on entity-level knowledge represented as simple triplets, which fail to capture the complexity of real-world multimodal information. To address this issue, we introduce MMKE-Bench, a comprehensive MultiModal Knowledge Editing Benchmark, designed to evaluate the ability of LMMs to edit diverse visual knowledge in real-world scenarios. MMKE-Bench addresses these limitations by incorporating three types of editing tasks: visual entity editing, visual semantic editing, and user-specific editing. Besides, MMKE-Bench uses free-form natural language to represent and edit knowledge, offering a more flexible and effective format. The benchmark consists of 2,940 pieces of knowledge and 8,363 images across 33 broad categories, with evaluation questions automatically generated and human-verified. We assess five state-of-the-art knowledge editing methods on three prominent LMMs, revealing that no method excels across all criteria, and that visual and user-specific edits are particularly challenging. MMKE-Bench sets a new standard for evaluating the robustness of multimodal knowledge editing techniques, driving progress in this rapidly evolving field.
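As a concrete illustration of the data format described above, the following minimal Python sketch shows how an MMKE-Bench-style knowledge item and its evaluation questions might be represented. The class, field names, and example values are illustrative assumptions, not the benchmark's released schema; they only reflect what the abstract states (three task types, free-form natural-language knowledge, grounding images, and auto-generated, human-verified evaluation questions).

```python
# Hypothetical sketch of an MMKE-Bench-style record; field names are
# illustrative assumptions, not the benchmark's actual schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EditItem:
    edit_type: str            # "visual_entity" | "visual_semantic" | "user_specific"
    image_path: str           # image in which the edited knowledge is grounded
    knowledge_text: str       # free-form natural-language description of the new knowledge
    eval_questions: List[str] = field(default_factory=list)  # auto-generated, human-verified

examples = [
    EditItem(
        edit_type="visual_entity",
        image_path="images/entity_001.jpg",
        knowledge_text="The pictured landmark is described with updated factual details.",
        eval_questions=["What is the landmark shown in the image?"],
    ),
    EditItem(
        edit_type="visual_semantic",
        image_path="images/gesture_014.jpg",
        knowledge_text="The pictured hand gesture is assigned a revised meaning.",
        eval_questions=["What does the gesture in the image mean?"],
    ),
    EditItem(
        edit_type="user_specific",
        image_path="images/user_item_007.jpg",
        knowledge_text="This object belongs to the user and has personal significance.",
        eval_questions=["Whose object is shown in the image?"],
    ),
]
```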
Related papers
- AnyEdit: Edit Any Knowledge Encoded in Language Models [69.30638272162267]
We propose AnyEdit, a new autoregressive editing paradigm for large language models (LLMs).
It decomposes long-form knowledge into sequential chunks and iteratively edits the key token in each chunk, ensuring consistent and accurate outputs.
It outperforms strong baselines by 21.5% on benchmarks including UnKEBench, AKEW, and our new EditEverything dataset for long-form diverse-formatted knowledge.
arXiv Detail & Related papers (2025-02-08T16:18:37Z) - ComprehendEdit: A Comprehensive Dataset and Evaluation Framework for Multimodal Knowledge Editing [27.034072044001736]
Large multimodal language models (MLLMs) have revolutionized natural language processing and visual understanding. Current knowledge editing evaluations are limited in scope and potentially biased. We introduce ComprehendEdit, a comprehensive benchmark comprising eight diverse tasks from multiple datasets.
arXiv Detail & Related papers (2024-12-17T11:41:49Z) - Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models [22.26930296101678]
Existing knowledge editing works primarily focus on text-oriented, coarse-grained scenarios.
We propose a visual-oriented, fine-grained multimodal knowledge editing task that targets precise editing in images with multiple interacting entities.
arXiv Detail & Related papers (2024-11-19T14:49:36Z) - Cross-Lingual Multi-Hop Knowledge Editing [53.028586843468915]
We propose the Cross-Lingual Multi-Hop Knowledge Editing paradigm for measuring and analyzing the performance of various SoTA knowledge editing techniques in a cross-lingual setup. Specifically, we create a parallel cross-lingual benchmark, CROLIN-MQUAKE, for measuring knowledge editing capabilities. Following this, we propose a significantly improved system for cross-lingual multi-hop knowledge editing, CLEVER-CKE.
arXiv Detail & Related papers (2024-07-14T17:18:16Z) - MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency [50.40318712497071]
Multimodal large language models (MLLMs) are prone to non-factual or outdated knowledge issues.
We decompose multimodal knowledge into its visual and textual components.
We present MC-MKE, a fine-grained Multimodal Knowledge Editing benchmark.
arXiv Detail & Related papers (2024-06-19T05:15:21Z) - VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark [53.091690659399234]
Knowledge editing on large language models (LLMs) has received considerable attention.
The existing LVLM editing benchmark, which comprises three metrics (Reliability, Locality, and Generality), falls short in the quality of synthesized evaluation images.
We employ more reliable data collection methods to construct a new Large Vision-Language Model Knowledge Editing Benchmark, VLKEB.
arXiv Detail & Related papers (2024-03-12T06:16:33Z) - MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge
Editing [21.760293271882997]
Multimodal knowledge editing represents a critical advancement in enhancing the capabilities of Multimodal Large Language Models (MLLMs).
Current benchmarks predominantly focus on coarse-grained knowledge, leaving the intricacies of fine-grained (FG) multimodal entity knowledge largely unexplored.
To bridge this gap, we introduce MIKE, a comprehensive benchmark and dataset specifically designed for FG multimodal entity knowledge editing.
arXiv Detail & Related papers (2024-02-18T07:15:03Z) - Generative Multi-Modal Knowledge Retrieval with Large Language Models [75.70313858231833]
We propose an innovative end-to-end generative framework for multi-modal knowledge retrieval.
Our framework takes advantage of the fact that large language models (LLMs) can effectively serve as virtual knowledge bases.
We demonstrate significant improvements ranging from 3.0% to 14.6% across all evaluation metrics when compared to strong baselines.
arXiv Detail & Related papers (2024-01-16T08:44:29Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)