MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
- URL: http://arxiv.org/abs/2406.08772v2
- Date: Wed, 21 Aug 2024 05:00:04 GMT
- Title: MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
- Authors: Xuannan Liu, Zekun Li, Peipei Li, Shuhan Xia, Xing Cui, Linzhi Huang, Huaibo Huang, Weihong Deng, Zhaofeng He,
- Abstract summary: multimodal misinformation detection methods often assume a single source and type of forgery for each sample.
The lack of a benchmark for mixed-source misinformation has hindered progress in this field.
We introduce MMFakeBench, the first comprehensive benchmark for mixed-source MMD.
- Score: 47.353720361676004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist. The lack of a benchmark for mixed-source misinformation has hindered progress in this field. To address this, we introduce MMFakeBench, the first comprehensive benchmark for mixed-source MMD. MMFakeBench includes 3 critical sources: textual veracity distortion, visual veracity distortion, and cross-modal consistency distortion, along with 12 sub-categories of misinformation forgery types. We further conduct an extensive evaluation of 6 prevalent detection methods and 15 large vision-language models (LVLMs) on MMFakeBench under a zero-shot setting. The results indicate that current methods struggle under this challenging and realistic mixed-source MMD setting. Additionally, we propose an innovative unified framework, which integrates rationales, actions, and tool-use capabilities of LVLM agents, significantly enhancing accuracy and generalization. We believe this study will catalyze future research into more realistic mixed-source multimodal misinformation and provide a fair evaluation of misinformation detection methods.
Related papers
- MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models [71.36392373876505]
We introduce MMIE, a large-scale benchmark for evaluating interleaved multimodal comprehension and generation in Large Vision-Language Models (LVLMs)
MMIE comprises 20K meticulously curated multimodal queries, spanning 3 categories, 12 fields, and 102 subfields, including mathematics, coding, physics, literature, health, and arts.
It supports both interleaved inputs and outputs, offering a mix of multiple-choice and open-ended question formats to evaluate diverse competencies.
arXiv Detail & Related papers (2024-10-14T04:15:00Z) - FoRA: Low-Rank Adaptation Model beyond Multimodal Siamese Network [19.466279425330857]
We propose a novel multimodal object detector, named Low-rank Modal Adaptors (LMA) with a shared backbone.
Our work was submitted to ACM MM in April 2024, but was rejected.
arXiv Detail & Related papers (2024-07-23T02:27:52Z) - Confidence-aware multi-modality learning for eye disease screening [58.861421804458395]
We propose a novel multi-modality evidential fusion pipeline for eye disease screening.
It provides a measure of confidence for each modality and elegantly integrates the multi-modality information.
Experimental results on both public and internal datasets demonstrate that our model excels in robustness.
arXiv Detail & Related papers (2024-05-28T13:27:30Z) - Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4)
DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-09-25T15:05:46Z) - VERITE: A Robust Benchmark for Multimodal Misinformation Detection
Accounting for Unimodal Bias [17.107961913114778]
multimodal misinformation is a growing problem on social media platforms.
In this study, we investigate and identify the presence of unimodal bias in widely-used MMD benchmarks.
We introduce a new method -- termed Crossmodal HArd Synthetic MisAlignment (CHASMA) -- for generating realistic synthetic training data.
arXiv Detail & Related papers (2023-04-27T12:28:29Z) - Deep Metric Multi-View Hashing for Multimedia Retrieval [3.539519688102545]
We propose a novel deep metric multi-view hashing (DMMVH) method to address the mentioned problems.
On the MIR-Flickr25K, MS COCO, and NUS-WIDE, our method outperforms the current state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-04-13T09:25:35Z) - Generalized Product-of-Experts for Learning Multimodal Representations
in Noisy Environments [18.14974353615421]
We propose a novel method for multimodal representation learning in a noisy environment via the generalized product of experts technique.
In the proposed method, we train a separate network for each modality to assess the credibility of information coming from that modality.
We attain state-of-the-art performance on two challenging benchmarks: multimodal 3D hand-pose estimation and multimodal surgical video segmentation.
arXiv Detail & Related papers (2022-11-07T14:27:38Z) - Multimodal Information Bottleneck: Learning Minimal Sufficient Unimodal
and Multimodal Representations [27.855467591358018]
We introduce the multimodal information bottleneck (MIB), aiming to learn a powerful and sufficient multimodal representation.
We develop three MIB variants, namely, early-fusion MIB, late-fusion MIB, and complete MIB, to focus on different perspectives of information constraints.
Experimental results suggest that the proposed method reaches state-of-the-art performance on the tasks of multimodal sentiment analysis and multimodal emotion recognition.
arXiv Detail & Related papers (2022-10-31T16:14:18Z) - Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma
Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in principle for adaptive integration of different modalities and produces a trustworthy regression result.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z) - Trusted Multi-View Classification [76.73585034192894]
We propose a novel multi-view classification method, termed trusted multi-view classification.
It provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
The proposed algorithm jointly utilizes multiple views to promote both classification reliability and robustness.
arXiv Detail & Related papers (2021-02-03T13:30:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.