Related papers: Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

URL: http://arxiv.org/abs/2403.14077v4
Date: Tue, 11 Jun 2024 16:24:45 GMT
Title: Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics
Authors: Shan Jia, Reilin Lyu, Kangran Zhao, Yize Chen, Zhiyuan Yan, Yan Ju, Chuanbo Hu, Xin Li, Baoyuan Wu, Siwei Lyu
Abstract summary: DeepFakes, which refer to AI-generated media content, have become an increasing concern due to their use as a means for disinformation. We investigate the capabilities of multimodal large language models (LLMs) in DeepFake detection.
Score: 46.99625341531352
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: DeepFakes, which refer to AI-generated media content, have become an increasing concern due to their use as a means for disinformation. Detecting DeepFakes is currently solved with programmed machine learning algorithms. In this work, we investigate the capabilities of multimodal large language models (LLMs) in DeepFake detection. We conducted qualitative and quantitative experiments to demonstrate multimodal LLMs and show that they can expose AI-generated images through careful experimental design and prompt engineering. This is interesting, considering that LLMs are not inherently tailored for media forensic tasks, and the process does not require programming. We discuss the limitations of multimodal LLMs for these tasks and suggest possible improvements.

Related papers

Can GPT tell us why these images are synthesized? Empowering Multimodal Large Language Models for Forensics [18.989883830031093]
Multimodal Large Language Models (LLMs) have encoded rich world knowledge but struggle to comprehend local forgery details. We propose a framework capable of evaluating image authenticity, localizing tampered regions, providing evidence, and tracing generation methods based on semantic tampering clues. We conduct qualitative and quantitative experiments and show that GPT4V can achieve an accuracy of 92.1% in Autosplice and 86.3% in LaMa, which is competitive with state-of-the-art AIGC detection methods.
arXiv Detail & Related papers (2025-04-16T01:02:46Z)
DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis [62.31018417955254]
DeepMLF is a novel multimodal language model with learnable tokens tailored toward deep fusion. Our results confirm that deeper fusion leads to better performance, with optimal fusion depths (5-7) exceeding those of existing approaches.
arXiv Detail & Related papers (2025-04-15T11:28:02Z)
Can Multi-modal (reasoning) LLMs work as deepfake detectors? [6.36797761822772]
We benchmark 12 of the latest multi-modal LLMs against traditional deepfake detection methods across multiple datasets. Our findings indicate that the best multi-modal LLMs achieve competitive performance and show promising zero-shot generalization ability. This study highlights the potential of integrating multi-modal reasoning in future deepfake detection frameworks.
arXiv Detail & Related papers (2025-03-25T21:47:29Z)
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks. Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs. In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users. We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set. We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z)
MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation [15.343028838291078]
We propose MMIDR, a framework designed to teach LLMs to provide fluent and high-quality textual explanations for their decision-making process on multimodal misinformation. To convert multimodal misinformation into an appropriate instruction-following format, we present a data augmentation perspective and pipeline. Furthermore, we design an efficient knowledge distillation approach to distill the capability of proprietary LLMs in explaining multimodal misinformation into open-source LLMs.
arXiv Detail & Related papers (2024-03-21T06:47:28Z)
Benchmarking Large Language Models for Molecule Prediction Tasks [7.067145619709089]
Large Language Models (LLMs) stand at the forefront of a number of Natural Language Processing (NLP) tasks. This paper explores a fundamental question: Can LLMs effectively handle molecule prediction tasks? We identify several classification and regression prediction tasks across six standard molecule datasets. We compare their performance with existing Machine Learning (ML) models, which include text-based models and those specifically designed for analysing the geometric structure of molecules.
arXiv Detail & Related papers (2024-03-08T05:59:56Z)
Probing Multimodal Large Language Models for Global and Local Semantic Representations [57.25949445963422]
We study which layers of Multimodal Large Language Models contribute most to encoding global image information. In this study, we find that the intermediate layers of models can encode more global semantic information. We find that the topmost layers may excessively focus on local information, leading to a diminished ability to encode global information.
arXiv Detail & Related papers (2024-02-27T08:27:15Z)
Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks. The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human. These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE [83.00018517368973]
Large Language Models (LLMs) can extend their zero-shot capabilities to multimodal learning through instruction tuning. Negative conflicts and interference may have a worse impact on performance. We combine the well-known Mixture-of-Experts (MoE) and one of the representative PEFT techniques, i.e., LoRA, designing a novel LLM-based decoder, called LoRA-MoE, for multimodal learning.
arXiv Detail & Related papers (2023-11-05T15:48:29Z)
Language Models as Zero-Shot Trajectory Generators [10.572264780575564]
Large Language Models (LLMs) have recently shown promise as high-level planners for robots. It is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves. This work investigates if an LLM can directly predict a dense sequence of end-effector poses for manipulation tasks.
arXiv Detail & Related papers (2023-10-17T21:57:36Z)
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models [2.992602379681373]
DeepDecipher is an API and interface for probing neurons in transformer models' layers. This paper outlines DeepDecipher's design and capabilities. We demonstrate how to analyze neurons, compare models, and gain insights into model behavior.
arXiv Detail & Related papers (2023-10-03T08:15:20Z)
A Survey on Multimodal Large Language Models [71.63375558033364]
The Multimodal Large Language Model (MLLM), represented by GPT-4V, has become a rising research hotspot. This paper aims to trace and summarize the recent progress of MLLMs.
arXiv Detail & Related papers (2023-06-23T15:21:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.