MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge
Editing
- URL: http://arxiv.org/abs/2402.14835v1
- Date: Sun, 18 Feb 2024 07:15:03 GMT
- Title: MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge
Editing
- Authors: Jiaqi Li, Miaozeng Du, Chuanyi Zhang, Yongrui Chen, Nan Hu, Guilin Qi,
Haiyun Jiang, Siyuan Cheng, Bozhong Tian
- Abstract summary: Multimodal knowledge editing represents a critical advancement in enhancing the capabilities of Multimodal Large Language Models (MLLMs)
Current benchmarks predominantly focus on coarse-grained knowledge, leaving the intricacies of fine-grained (FG) multimodal entity knowledge largely unexplored.
To bridge this gap, we introduce MIKE, a comprehensive benchmark and dataset specifically designed for the FG multimodal entity knowledge editing.
- Score: 21.760293271882997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal knowledge editing represents a critical advancement in enhancing
the capabilities of Multimodal Large Language Models (MLLMs). Despite its
potential, current benchmarks predominantly focus on coarse-grained knowledge,
leaving the intricacies of fine-grained (FG) multimodal entity knowledge
largely unexplored. This gap presents a notable challenge, as FG entity
recognition is pivotal for the practical deployment and effectiveness of MLLMs
in diverse real-world scenarios. To bridge this gap, we introduce MIKE, a
comprehensive benchmark and dataset specifically designed for the FG multimodal
entity knowledge editing. MIKE encompasses a suite of tasks tailored to assess
different perspectives, including Vanilla Name Answering, Entity-Level Caption,
and Complex-Scenario Recognition. In addition, a new form of knowledge editing,
Multi-step Editing, is introduced to evaluate the editing efficiency. Through
our extensive evaluations, we demonstrate that the current state-of-the-art
methods face significant challenges in tackling our proposed benchmark,
underscoring the complexity of FG knowledge editing in MLLMs. Our findings
spotlight the urgent need for novel approaches in this domain, setting a clear
agenda for future research and development efforts within the community.
Related papers
- Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models [22.26930296101678]
Existing knowledge editing works primarily focus on text-oriented, coarse-grained scenarios.
We propose a visual-oriented, fine-grained multimodal knowledge editing task that targets precise editing in images with multiple interacting entities.
arXiv Detail & Related papers (2024-11-19T14:49:36Z) - RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z) - Understanding the Role of LLMs in Multimodal Evaluation Benchmarks [77.59035801244278]
This paper investigates the role of the Large Language Model (LLM) backbone in Multimodal Large Language Models (MLLMs) evaluation.
Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs.
Key findings reveal that some benchmarks allow high performance even without visual inputs and up to 50% of error rates can be attributed to insufficient world knowledge in the LLM backbone.
arXiv Detail & Related papers (2024-10-16T07:49:13Z) - Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration [107.31481207855835]
Current methods, including intrinsic knowledge editing and external knowledge resorting, each possess strengths and weaknesses.
We propose UniKE, a novel multimodal editing method that establishes a unified perspective for intrinsic knowledge editing and external knowledge resorting.
arXiv Detail & Related papers (2024-09-30T02:13:53Z) - Needle In A Multimodal Haystack [79.81804334634408]
We present the first benchmark specifically designed to evaluate the capability of existing MLLMs to comprehend long multimodal documents.
Our benchmark includes three types of evaluation tasks: multimodal retrieval, counting, and reasoning.
We observe that existing models still have significant room for improvement on these tasks, especially on vision-centric evaluation.
arXiv Detail & Related papers (2024-06-11T13:09:16Z) - Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation [10.431782420943764]
This paper introduces a novel multi-view RAG framework, MVRAG, tailored for knowledge-dense domains.
Experiments conducted on legal and medical case retrieval demonstrate significant improvements in recall and precision rates.
arXiv Detail & Related papers (2024-04-19T13:27:38Z) - Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z) - Multi-agent Reinforcement Learning: A Comprehensive Survey [10.186029242664931]
Multi-agent systems (MAS) are widely prevalent and crucially important in numerous real-world applications.
Despite their ubiquity, the development of intelligent decision-making agents in MAS poses several open challenges to their effective implementation.
This survey examines these challenges, placing an emphasis on studying seminal concepts from game theory (GT) and machine learning (ML)
arXiv Detail & Related papers (2023-12-15T23:16:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.