MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing
- URL: http://arxiv.org/abs/2402.14835v1
- Date: Sun, 18 Feb 2024 07:15:03 GMT
- Title: MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing
- Authors: Jiaqi Li, Miaozeng Du, Chuanyi Zhang, Yongrui Chen, Nan Hu, Guilin Qi,
Haiyun Jiang, Siyuan Cheng, Bozhong Tian
- Abstract summary: Multimodal knowledge editing represents a critical advancement in enhancing the capabilities of Multimodal Large Language Models (MLLMs).
Current benchmarks predominantly focus on coarse-grained knowledge, leaving the intricacies of fine-grained (FG) multimodal entity knowledge largely unexplored.
To bridge this gap, we introduce MIKE, a comprehensive benchmark and dataset specifically designed for FG multimodal entity knowledge editing.
- Score: 21.760293271882997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal knowledge editing represents a critical advancement in enhancing
the capabilities of Multimodal Large Language Models (MLLMs). Despite its
potential, current benchmarks predominantly focus on coarse-grained knowledge,
leaving the intricacies of fine-grained (FG) multimodal entity knowledge
largely unexplored. This gap presents a notable challenge, as FG entity
recognition is pivotal for the practical deployment and effectiveness of MLLMs
in diverse real-world scenarios. To bridge this gap, we introduce MIKE, a
comprehensive benchmark and dataset specifically designed for FG multimodal
entity knowledge editing. MIKE encompasses a suite of tasks tailored to assess
different perspectives, including Vanilla Name Answering, Entity-Level Caption,
and Complex-Scenario Recognition. In addition, a new form of knowledge editing,
Multi-step Editing, is introduced to evaluate the editing efficiency. Through
our extensive evaluations, we demonstrate that the current state-of-the-art
methods face significant challenges in tackling our proposed benchmark,
underscoring the complexity of FG knowledge editing in MLLMs. Our findings
spotlight the urgent need for novel approaches in this domain, setting a clear
agenda for future research and development efforts within the community.
Related papers
- Exploring the Potential of Multimodal LLM with Knowledge-Intensive Multimodal ASR [40.5451418216014]
This paper introduces the Multimodal Scientific ASR (MS-ASR) task, which focuses on transcribing scientific conference videos by leveraging visual information from slides to enhance the accuracy of technical terminology.
We propose the Scientific Vision Augmented ASR (SciVASR) framework as a baseline method, enabling MLLMs to improve transcript quality through post-editing.
arXiv Detail & Related papers (2024-06-16T10:04:19Z)
- Needle In A Multimodal Haystack [79.81804334634408]
We present the first benchmark specifically designed to evaluate the capability of existing MLLMs to comprehend long multimodal documents.
Our benchmark includes three types of evaluation tasks: multimodal retrieval, counting, and reasoning.
We observe that existing models still have significant room for improvement on these tasks, especially on vision-centric evaluation.
arXiv Detail & Related papers (2024-06-11T13:09:16Z)
- Mixture of Modality Knowledge Experts for Robust Multi-modal Knowledge Graph Completion [51.80447197290866]
Multi-modal knowledge graph completion (MMKGC) aims to automatically discover new knowledge triples in the given multi-modal knowledge graphs (MMKGs).
Existing methods tend to focus on crafting elegant entity-wise multi-modal fusion strategies, yet they overlook the utilization of multi-perspective features concealed within the modalities under diverse relational contexts.
We introduce a novel MMKGC framework with a Mixture of Modality Knowledge experts (MoMoK) to learn adaptive multi-modal embeddings under intricate relational contexts.
arXiv Detail & Related papers (2024-05-27T06:36:17Z)
- Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation [10.431782420943764]
This paper introduces a novel multi-view RAG framework, MVRAG, tailored for knowledge-dense domains.
Experiments conducted on legal and medical case retrieval demonstrate significant improvements in recall and precision rates.
arXiv Detail & Related papers (2024-04-19T13:27:38Z)
- Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)
- A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
- Multi-agent Reinforcement Learning: A Comprehensive Survey [10.186029242664931]
Multi-agent systems (MAS) are widely prevalent and crucially important in numerous real-world applications.
Despite their ubiquity, the development of intelligent decision-making agents in MAS poses several open challenges to their effective implementation.
This survey examines these challenges, placing an emphasis on studying seminal concepts from game theory (GT) and machine learning (ML).
arXiv Detail & Related papers (2023-12-15T23:16:54Z)
- Large Model Based Referring Camouflaged Object Detection [51.80619142347807]
Referring camouflaged object detection (Ref-COD) is a recently-proposed problem aiming to segment out specified camouflaged objects matched with a textual or visual reference.
Our motivation is to make full use of the semantic intelligence and intrinsic knowledge of recent Multimodal Large Language Models (MLLMs) to decompose this complex task in a human-like way.
We propose a large-model-based Multi-Level Knowledge-Guided multimodal method for Ref-COD termed MLKG.
arXiv Detail & Related papers (2023-11-28T13:45:09Z)
- Editing Large Language Models: Problems, Methods, and Opportunities [51.903537096207]
This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs.
We provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal.
Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.
arXiv Detail & Related papers (2023-05-22T16:00:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.