Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
- URL: http://arxiv.org/abs/2411.12790v1
- Date: Tue, 19 Nov 2024 14:49:36 GMT
- Title: Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
- Authors: Zhen Zeng, Leijiang Gu, Xun Yang, Zhangling Duan, Zenglin Shi, Meng Wang
- Abstract summary: Existing multimodal knowledge editing works primarily focus on text-oriented, coarse-grained scenarios. We propose a visual-oriented, fine-grained multimodal knowledge editing task that targets precise editing in images with multiple interacting entities.
- Score: 22.26930296101678
- Abstract: Knowledge editing aims to efficiently and cost-effectively correct inaccuracies and update outdated information. Recently, there has been growing interest in extending knowledge editing from Large Language Models (LLMs) to Multimodal Large Language Models (MLLMs), which integrate both textual and visual information, introducing additional editing complexities. Existing multimodal knowledge editing works primarily focus on text-oriented, coarse-grained scenarios, failing to address the unique challenges posed by multimodal contexts. In this paper, we propose a visual-oriented, fine-grained multimodal knowledge editing task that targets precise editing in images with multiple interacting entities. We introduce the Fine-Grained Visual Knowledge Editing (FGVEdit) benchmark to evaluate this task. Moreover, we propose a Multimodal Scope Classifier-based Knowledge Editor (MSCKE) framework. MSCKE leverages a multimodal scope classifier that integrates both visual and textual information to accurately identify and update knowledge related to specific entities within images. This approach ensures precise editing while preserving irrelevant information, overcoming the limitations of traditional text-only editing methods. Extensive experiments on the FGVEdit benchmark demonstrate that MSCKE outperforms existing methods, showcasing its effectiveness in solving the complex challenges of multimodal knowledge editing.
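To make the scope-gating idea concrete, below is a minimal sketch of how a multimodal scope classifier might route queries between stored edits and the unedited base model. All names here (`vision_encoder`, `text_encoder`, `edited_model`, the 0.8 threshold) are hypothetical placeholders under assumed details, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

class MultimodalScopeClassifier(torch.nn.Module):
    """Scores whether an incoming (image, question) query falls within
    the scope of a stored edit. The vision/text encoders are assumed
    placeholders for whatever backbones the MLLM provides."""

    def __init__(self, vision_encoder, text_encoder, dim=512):
        super().__init__()
        self.vision_encoder = vision_encoder  # image -> (dim,) embedding
        self.text_encoder = text_encoder      # str   -> (dim,) embedding
        self.proj = torch.nn.Linear(2 * dim, dim)

    def embed(self, image, question):
        v = self.vision_encoder(image)
        t = self.text_encoder(question)
        # Fuse both modalities so an edit scoped to one entity in a
        # multi-entity image is distinguishable from text alone.
        return F.normalize(self.proj(torch.cat([v, t], dim=-1)), dim=-1)

    def in_scope(self, edit, query, threshold=0.8):
        e = self.embed(edit["image"], edit["question"])
        q = self.embed(query["image"], query["question"])
        return F.cosine_similarity(e, q, dim=-1) > threshold


def answer(query, edit_memory, classifier, edited_model, base_model):
    """Route a query: use edited knowledge only when an edit is in
    scope, leaving unrelated knowledge untouched."""
    for edit in edit_memory:
        if classifier.in_scope(edit, query):
            return edited_model(query, edit)
    return base_model(query)
```

The design point mirrored from the abstract is that scope is judged on fused visual and textual features, so an edit tied to one entity in a multi-entity image should not fire on queries about its neighbors.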
Related papers
- Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era [50.19334853510935]
Recent strides in instruction-based editing have enabled intuitive interaction with visual content, using natural language as a bridge between user intent and complex editing operations.
We aim to democratize powerful visual editing across various industries, from entertainment to education.
arXiv Detail & Related papers (2024-11-15T05:18:15Z)
- Cross-Lingual Multi-Hop Knowledge Editing -- Benchmarks, Analysis and a Simple Contrastive Learning based Approach [53.028586843468915]
We propose the Cross-Lingual Multi-Hop Knowledge Editing paradigm for measuring and analyzing the performance of various SoTA knowledge editing techniques in a cross-lingual setup.
Specifically, we create a parallel cross-lingual benchmark, CROLIN-MQUAKE, for measuring knowledge editing capabilities.
Following this, we propose a significantly improved system for cross-lingual multi-hop knowledge editing, CLEVER-CKE.
arXiv Detail & Related papers (2024-07-14T17:18:16Z)
- MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency [50.40318712497071]
Multimodal large language models (MLLMs) are prone to non-factual or outdated knowledge issues.
We decompose multimodal knowledge into its visual and textual components.
We present MC-MKE, a fine-grained Multimodal Knowledge Editing benchmark.
arXiv Detail & Related papers (2024-06-19T05:15:21Z)
- MEMLA: Enhancing Multilingual Knowledge Editing with Neuron-Masked Low-Rank Adaptation [18.087144677674786]
We focus on multilingual knowledge editing (MKE), which requires propagating updates across multiple languages.
We introduce the Multilingual Knowledge Editing Benchmark (MKEB), a novel dataset comprising 12 languages.
We also propose MEMLA, a method that enhances multilingual knowledge editing with neuron-masked low-rank adaptation.
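As a rough illustration of the neuron-masked low-rank idea (a sketch under assumed details, not the authors' code), the snippet below gates a standard LoRA delta with a per-output-neuron mask; how the mask is derived, e.g. from attribution scores, is left abstract.

```python
import torch

class NeuronMaskedLoRA(torch.nn.Module):
    """A LoRA-style low-rank update gated by a per-output-neuron mask,
    so only neurons deemed relevant to the edit receive the delta.
    Illustrative sketch; the mask source is assumed, not taken from
    the paper."""

    def __init__(self, base_linear, rank=8, mask=None):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():  # keep pretrained weights frozen
            p.requires_grad_(False)
        out_f, in_f = base_linear.weight.shape
        self.A = torch.nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(out_f, rank))
        # Binary mask over output neurons; all-ones if none supplied.
        self.register_buffer(
            "mask", mask if mask is not None else torch.ones(out_f)
        )

    def forward(self, x):
        delta = (x @ self.A.T) @ self.B.T        # low-rank edit direction
        return self.base(x) + self.mask * delta  # applied only at masked neurons
```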
arXiv Detail & Related papers (2024-06-17T14:03:50Z)
- MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing [21.760293271882997]
Multimodal knowledge editing represents a critical advancement in enhancing the capabilities of Multimodal Large Language Models (MLLMs).
Current benchmarks predominantly focus on coarse-grained knowledge, leaving the intricacies of fine-grained (FG) multimodal entity knowledge largely unexplored.
To bridge this gap, we introduce MIKE, a comprehensive benchmark and dataset specifically designed for FG multimodal entity knowledge editing.
arXiv Detail & Related papers (2024-02-18T07:15:03Z)
- A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
- SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models [91.22477798288003]
This paper introduces SmartEdit, a novel approach to instruction-based image editing that exploits Multimodal Large Language Models (MLLMs) to enhance instruction understanding and reasoning.
We show that a small amount of complex instruction editing data can effectively stimulate SmartEdit's editing capabilities for more complex instructions.
arXiv Detail & Related papers (2023-12-11T17:54:11Z)
- Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection [72.36017150922504]
We propose a multi-modal contextual knowledge distillation framework, MMC-Det, to transfer the learned contextual knowledge from a teacher fusion transformer to a student detector.
Diverse multi-modal masked language modeling is realized through an object divergence constraint imposed on traditional multi-modal masked language modeling (MLM).
arXiv Detail & Related papers (2023-08-30T08:33:13Z)
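For intuition only, here is a toy version of pairing feature imitation from a teacher fusion transformer with a divergence-style penalty that pushes different object embeddings apart; MMC-Det's exact objectives are not reproduced here.

```python
import torch
import torch.nn.functional as F

def contextual_distillation_loss(student_feats, teacher_feats, object_feats):
    """Toy objective in the spirit of teacher->student contextual
    knowledge distillation: (1) imitate the teacher fusion
    transformer's features, (2) push embeddings of distinct objects
    apart. Not MMC-Det's exact formulation; assumes >= 2 objects."""
    # (1) feature imitation against the (frozen) teacher
    imitation = F.mse_loss(student_feats, teacher_feats.detach())

    # (2) divergence-style term: penalize high cosine similarity
    # between different object embeddings
    obj = F.normalize(object_feats, dim=-1)  # (num_objects, dim)
    sim = obj @ obj.T
    off_diag = ~torch.eye(obj.size(0), dtype=torch.bool, device=obj.device)
    divergence = sim[off_diag].clamp(min=0).mean()

    return imitation + divergence
```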