Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
- URL: http://arxiv.org/abs/2502.05240v2
- Date: Wed, 12 Feb 2025 14:43:02 GMT
- Title: Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
- Authors: Yueying Zou, Peipei Li, Zekun Li, Huaibo Huang, Xing Cui, Xuannan Liu, Chenghanyu Zhang, Ran He
- Abstract summary: Methods for detecting AI-generated media have evolved rapidly. General-purpose detectors based on MLLMs integrate authenticity verification, explainability, and localization capabilities. Ethical and security considerations have emerged as critical global concerns.
- Score: 51.91311158085973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of AI-generated media poses significant challenges to information authenticity and social trust, creating a pressing need for reliable detection methods. Methods for detecting AI-generated media have evolved rapidly, paralleling the advancement of Multimodal Large Language Models (MLLMs). Current detection approaches can be categorized into two main groups: Non-MLLM-based and MLLM-based methods. The former employs high-precision, domain-specific detectors powered by deep learning techniques, while the latter utilizes general-purpose detectors based on MLLMs that integrate authenticity verification, explainability, and localization capabilities. Despite significant progress in this field, the literature still lacks a comprehensive survey that examines the transition from domain-specific to general-purpose detection methods. This paper addresses that gap by providing a systematic review of both approaches, analyzing them from single-modal and multi-modal perspectives. We present a detailed comparative analysis of these categories, examining their methodological similarities and differences. Through this analysis, we explore potential hybrid approaches and identify key challenges in forgery detection, providing direction for future research. Additionally, as MLLMs become increasingly prevalent in detection tasks, ethical and security considerations have emerged as critical global concerns. We examine the regulatory landscape surrounding Generative AI (GenAI) across various jurisdictions, offering valuable insights for researchers and practitioners in this field.
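To illustrate the survey's two categories, the following is a minimal, hypothetical sketch (not from the paper): a small CNN stands in for a high-precision, domain-specific Non-MLLM detector that outputs a single real/fake score, while a prompt template shows how an MLLM-based detector would instead be asked for a verdict together with an explanation and a coarse localization. The model architecture, prompt wording, and caption input are illustrative assumptions only.

```python
# Sketch of the two detection routes described in the abstract.
# Assumptions: PyTorch is available; the CNN and the prompt are placeholders,
# not methods taken from the surveyed papers.
import torch
import torch.nn as nn


class DomainSpecificDetector(nn.Module):
    """Non-MLLM route: a small CNN producing one real/fake logit per image."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # logit > 0 interpreted as "AI-generated"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1)).squeeze(-1)


def mllm_detection_prompt(image_caption: str) -> str:
    """MLLM route: phrase detection as an instruction that asks for a verdict,
    an explanation, and a coarse localization of suspected manipulations."""
    return (
        "You are an image-forensics assistant.\n"
        f"Image description: {image_caption}\n"
        "1) Is the image AI-generated or real?\n"
        "2) Explain which artifacts support your verdict.\n"
        "3) If manipulated, describe where the edits are located."
    )


if __name__ == "__main__":
    detector = DomainSpecificDetector().eval()
    with torch.no_grad():
        fake_prob = torch.sigmoid(detector(torch.rand(1, 3, 224, 224)))
    print(f"Non-MLLM fake probability: {fake_prob.item():.3f}")
    print(mllm_detection_prompt("a portrait photo with unusually smooth skin texture"))
```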
Related papers
- A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems [93.8285345915925]
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making.
With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems.
We categorize existing methods along two dimensions: (1) Regimes, which define the stage at which reasoning is achieved; and (2) Architectures, which determine the components involved in the reasoning process.
arXiv Detail & Related papers (2025-04-12T01:27:49Z)
- A Review Paper of the Effects of Distinct Modalities and ML Techniques to Distracted Driving Detection [3.6248657646376707]
Distracted driving remains a significant global challenge with severe human and economic repercussions. This systematic review addresses critical gaps by providing a comprehensive analysis of machine learning (ML) and deep learning (DL) techniques applied across various data modalities.
arXiv Detail & Related papers (2025-01-20T21:35:34Z)
- Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey [1.7811840395202345]
Deepfakes (DFs) have been utilized for malicious purposes, such as individual impersonation, misinformation spreading, and artists' style imitation.
This survey offers researchers and practitioners a comprehensive resource for understanding the current landscape, methodological approaches, and promising future directions in this rapidly evolving field.
arXiv Detail & Related papers (2024-11-26T22:04:49Z)
- Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse. Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-LLM collaboration. To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z)
- From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems.
The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream of approaches for enhancing their effectiveness.
This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z)
- Surveying the MLLM Landscape: A Meta-Review of Current Surveys [17.372501468675303]
Multimodal Large Language Models (MLLMs) have become a transformative force in the field of artificial intelligence.
This survey aims to provide a systematic review of benchmark tests and evaluation methods for MLLMs.
arXiv Detail & Related papers (2024-09-17T14:35:38Z)
- Evolving from Single-modal to Multi-modal Facial Deepfake Detection: Progress and Challenges [40.11614155244292]
This survey traces the evolution of deepfake detection from early single-modal methods to sophisticated multi-modal approaches.
We present a structured taxonomy of detection techniques and analyze the transition from GAN-based to diffusion model-driven deepfakes.
arXiv Detail & Related papers (2024-06-11T05:48:04Z)
- Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)
- Detecting Multimedia Generated by Large AI Models: A Survey [26.84095559297626]
The aim of this survey is to fill an academic gap and contribute to global AI security efforts.
We introduce a novel taxonomy for detection methods, categorized by media modality.
We present a brief overview of generation mechanisms, public datasets, and online detection tools.
arXiv Detail & Related papers (2024-01-22T15:08:19Z)
- A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLMs-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z)