Related papers: Multimodal Large Language Models to Support Real-World Fact-Checking

Multimodal Large Language Models to Support Real-World Fact-Checking

URL: http://arxiv.org/abs/2403.03627v2
Date: Fri, 26 Apr 2024 05:16:53 GMT
Title: Multimodal Large Language Models to Support Real-World Fact-Checking
Authors: Jiahui Geng, Yova Kementchedjhieva, Preslav Nakov, Iryna Gurevych,
Abstract summary: Multimodal large language models (MLLMs) carry the potential to support humans in processing vast amounts of information. While MLLMs are already being used as a fact-checking tool, their abilities and limitations in this regard are understudied. We propose a framework for systematically assessing the capacity of current multimodal models to facilitate real-world fact-checking.
Score: 80.41047725487645
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Multimodal large language models (MLLMs) carry the potential to support humans in processing vast amounts of information. While MLLMs are already being used as a fact-checking tool, their abilities and limitations in this regard are understudied. Here is aim to bridge this gap. In particular, we propose a framework for systematically assessing the capacity of current multimodal models to facilitate real-world fact-checking. Our methodology is evidence-free, leveraging only these models' intrinsic knowledge and reasoning capabilities. By designing prompts that extract models' predictions, explanations, and confidence levels, we delve into research questions concerning model accuracy, robustness, and reasons for failure. We empirically find that (1) GPT-4V exhibits superior performance in identifying malicious and misleading multimodal claims, with the ability to explain the unreasonable aspects and underlying motives, and (2) existing open-source models exhibit strong biases and are highly sensitive to the prompt. Our study offers insights into combating false multimodal information and building secure, trustworthy multimodal models. To the best of our knowledge, we are the first to evaluate MLLMs for real-world fact-checking.

Related papers

HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context [26.506057678587176]
Insufficient context understanding can happen when a model misinterprets multimodal context, resulting in incorrect answers.<n>The shortcut problem occurs when the model overlooks crucial clues in multimodal inputs, directly addressing the query without considering the multimodal information.<n>We introduce a reasoning omni-modal benchmark, IntentBench, aimed at evaluating models in understanding complex human intentions and emotions.
arXiv Detail & Related papers (2025-06-26T14:01:03Z)
RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking [31.02873474960849]
We introduce RealFactBench, a benchmark designed to assess the fact-checking capabilities of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs)<n>RealFactBench consists of 6K high-quality claims drawn from authoritative sources, encompassing multimodal content and diverse domains.<n>Our evaluation framework further introduces the Unknown Rate (UnR) metric, enabling a more nuanced assessment of models' ability to handle uncertainty.
arXiv Detail & Related papers (2025-06-14T15:27:44Z)
MLLMs are Deeply Affected by Modality Bias [158.64371871084478]
Recent advances in Multimodal Large Language Models (MLLMs) have shown promising results in integrating diverse modalities such as texts and images.<n>MLLMs are heavily influenced by modality bias, often relying on language while under-utilizing other modalities like visual inputs.<n>This paper argues that MLLMs are deeply affected by modality bias, highlighting its manifestations across various tasks.
arXiv Detail & Related papers (2025-05-24T11:49:31Z)
A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models [74.48084001058672]
The rise of foundation models has transformed machine learning research. multimodal foundation models (MMFMs) pose unique interpretability challenges beyond unimodal frameworks. This survey explores two key aspects: (1) the adaptation of LLM interpretability methods to multimodal models and (2) understanding the mechanistic differences between unimodal language models and crossmodal systems.
arXiv Detail & Related papers (2025-02-22T20:55:26Z)
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models [26.17300490736624]
Multimodal Large Language Models (MLLMs) are predominantly trained and tested on consistent visual-textual inputs. We propose the Multimodal Inconsistency Reasoning benchmark to assess MLLMs' ability to detect and reason about semantic mismatches. We evaluate six state-of-the-art MLLMs, showing that models with dedicated multimodal reasoning capabilities, such as o1, substantially outperform their counterparts.
arXiv Detail & Related papers (2025-02-22T01:52:37Z)
Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench [17.73279547506514]
We introduce Multimodal Large Language Model Unlearning Benchmark (MLLMU-Bench), a novel benchmark aimed at advancing the understanding of multimodal machine unlearning. MLLMU-Bench consists of 500 fictitious profiles and 153 profiles for public celebrities, each profile feature over 14 customized question-answer pairs, evaluated from both multimodal (image+text) and unimodal (text) perspectives. Surprisingly, our experiments show that unimodal unlearning algorithms excel in generation and cloze tasks, while multimodal unlearning approaches perform better in classification tasks with multimodal inputs.
arXiv Detail & Related papers (2024-10-29T15:07:23Z)
LRQ-Fact: LLM-Generated Relevant Questions for Multimodal Fact-Checking [14.647261841209767]
We propose a fully-automated framework, LRQ-Fact, for multimodal fact-checking. It generates comprehensive questions and answers for probing multimodal content. It then evaluates both the original content and the generated questions and answers to assess the overall veracity.
arXiv Detail & Related papers (2024-10-06T20:33:22Z)
LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch. Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process. By evaluating different benchmarks and proper strategy, even a 2.7B small-scale model can perform on par with larger models with 7B or 13B parameters.
arXiv Detail & Related papers (2024-07-28T06:10:47Z)
Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study [51.19622266249408]
MultiTrust is the first comprehensive and unified benchmark on the trustworthiness of MLLMs. Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts. Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks.
arXiv Detail & Related papers (2024-06-11T08:38:13Z)
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective [9.633811630889237]
We propose a causal framework to interpret the biases in Visual Question Answering (VQA) problems. We introduce a novel dataset with 12,000 challenging VQA instances requiring multi-hop reasoning. Our experiments show that MLLMs perform poorly on MORE, indicating strong unimodal biases and limited semantic understanding.
arXiv Detail & Related papers (2024-03-27T08:38:49Z)
Are Large Language Models Good Fact Checkers: A Preliminary Study [26.023148371263012]
Large Language Models (LLMs) have drawn significant attention due to their outstanding reasoning capabilities and extensive knowledge repository. This study aims to comprehensively evaluate various LLMs in tackling specific fact-checking subtasks.
arXiv Detail & Related papers (2023-11-29T05:04:52Z)
Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools. Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions. Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
A Survey on Multimodal Large Language Models [71.63375558033364]
Multimodal Large Language Model (MLLM) represented by GPT-4V has been a new rising research hotspot. This paper aims to trace and summarize the recent progress of MLLMs.
arXiv Detail & Related papers (2023-06-23T15:21:52Z)
Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP) What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. How the model's world knowledge interacts with the factual information presented in the context remains under explored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.