MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing
- URL: http://arxiv.org/abs/2402.14835v1
- Date: Sun, 18 Feb 2024 07:15:03 GMT
- Title: MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing
- Authors: Jiaqi Li, Miaozeng Du, Chuanyi Zhang, Yongrui Chen, Nan Hu, Guilin Qi,
Haiyun Jiang, Siyuan Cheng, Bozhong Tian
- Abstract summary: Multimodal knowledge editing represents a critical advancement in enhancing the capabilities of Multimodal Large Language Models (MLLMs).
Current benchmarks predominantly focus on coarse-grained knowledge, leaving the intricacies of fine-grained (FG) multimodal entity knowledge largely unexplored.
To bridge this gap, we introduce MIKE, a comprehensive benchmark and dataset specifically designed for FG multimodal entity knowledge editing.
- Score: 21.760293271882997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal knowledge editing represents a critical advancement in enhancing
the capabilities of Multimodal Large Language Models (MLLMs). Despite its
potential, current benchmarks predominantly focus on coarse-grained knowledge,
leaving the intricacies of fine-grained (FG) multimodal entity knowledge
largely unexplored. This gap presents a notable challenge, as FG entity
recognition is pivotal for the practical deployment and effectiveness of MLLMs
in diverse real-world scenarios. To bridge this gap, we introduce MIKE, a
comprehensive benchmark and dataset specifically designed for FG multimodal
entity knowledge editing. MIKE encompasses a suite of tasks tailored to assess
different perspectives, including Vanilla Name Answering, Entity-Level Caption,
and Complex-Scenario Recognition. In addition, a new form of knowledge editing,
Multi-step Editing, is introduced to evaluate the editing efficiency. Through
our extensive evaluations, we demonstrate that the current state-of-the-art
methods face significant challenges in tackling our proposed benchmark,
underscoring the complexity of FG knowledge editing in MLLMs. Our findings
spotlight the urgent need for novel approaches in this domain, setting a clear
agenda for future research and development efforts within the community.
Related papers
- Exploring the Potential of Multimodal LLM with Knowledge-Intensive Multimodal ASR [40.5451418216014]
This paper introduces the Multimodal Scientific ASR (MS-ASR) task, which focuses on transcribing scientific conference videos by leveraging visual information from slides to enhance the accuracy of technical terminology.
We propose the Scientific Vision Augmented ASR (SciVASR) framework as a baseline method, enabling MLLMs to improve transcript quality through post-editing.
arXiv Detail & Related papers (2024-06-16T10:04:19Z)
- Needle In A Multimodal Haystack [79.81804334634408]
We present the first benchmark specifically designed to evaluate the capability of existing MLLMs to comprehend long multimodal documents.
Our benchmark includes three types of evaluation tasks: multimodal retrieval, counting, and reasoning.
We observe that existing models still have significant room for improvement on these tasks, especially on vision-centric evaluation.
arXiv Detail & Related papers (2024-06-11T13:09:16Z)
- Mixture of Modality Knowledge Experts for Robust Multi-modal Knowledge Graph Completion [51.80447197290866]
Multi-modal knowledge graph completion (MMKGC) aims to automatically discover new knowledge triples in the given multi-modal knowledge graphs (MMKGs).
Existing methods tend to focus on crafting elegant entity-wise multi-modal fusion strategies, yet they overlook the utilization of multi-perspective features concealed within the modalities under diverse relational contexts.
We introduce a novel MMKGC framework with a Mixture of Modality Knowledge experts (MoMoK) to learn adaptive multi-modal embeddings under intricate relational contexts.
arXiv Detail & Related papers (2024-05-27T06:36:17Z)
- Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation [10.431782420943764]
This paper introduces a novel multi-view RAG framework, MVRAG, tailored for knowledge-dense domains.
Experiments conducted on legal and medical case retrieval demonstrate significant improvements in recall and precision rates.
arXiv Detail & Related papers (2024-04-19T13:27:38Z)
- Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)
- A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
- Multi-agent Reinforcement Learning: A Comprehensive Survey [10.186029242664931]
Multi-agent systems (MAS) are widely prevalent and crucially important in numerous real-world applications.
Despite their ubiquity, the development of intelligent decision-making agents in MAS poses several open challenges to their effective implementation.
This survey examines these challenges, placing an emphasis on studying seminal concepts from game theory (GT) and machine learning (ML).
arXiv Detail & Related papers (2023-12-15T23:16:54Z)
- Large Model Based Referring Camouflaged Object Detection [51.80619142347807]
Referring camouflaged object detection (Ref-COD) is a recently-proposed problem aiming to segment out specified camouflaged objects matched with a textual or visual reference.
Our motivation is to make full use of the semantic intelligence and intrinsic knowledge of recent Multimodal Large Language Models (MLLMs) to decompose this complex task in a human-like way.
We propose a large-model-based Multi-Level Knowledge-Guided multimodal method for Ref-COD termed MLKG.
arXiv Detail & Related papers (2023-11-28T13:45:09Z)
- Editing Large Language Models: Problems, Methods, and Opportunities [51.903537096207]
This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs.
We provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal.
Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.
arXiv Detail & Related papers (2023-05-22T16:00:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.