Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
- URL: http://arxiv.org/abs/2502.05240v2
- Date: Wed, 12 Feb 2025 14:43:02 GMT
- Title: Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
- Authors: Yueying Zou, Peipei Li, Zekun Li, Huaibo Huang, Xing Cui, Xuannan Liu, Chenghanyu Zhang, Ran He
- Abstract summary: Methods for detecting AI-generated media have evolved rapidly.
General-purpose detectors based on MLLMs integrate authenticity verification, explainability, and localization capabilities.
Ethical and security considerations have emerged as critical global concerns.
- Score: 51.91311158085973
- Abstract: The proliferation of AI-generated media poses significant challenges to information authenticity and social trust, creating strong demand for reliable detection methods. Methods for detecting AI-generated media have evolved rapidly, paralleling the advancement of Multimodal Large Language Models (MLLMs). Current detection approaches can be categorized into two main groups: Non-MLLM-based and MLLM-based methods. The former employs high-precision, domain-specific detectors powered by deep learning techniques, while the latter utilizes general-purpose detectors based on MLLMs that integrate authenticity verification, explainability, and localization capabilities. Despite significant progress in this field, the literature still lacks a comprehensive survey that examines the transition from domain-specific to general-purpose detection methods. This paper addresses this gap by providing a systematic review of both approaches, analyzing them from single-modal and multi-modal perspectives. We present a detailed comparative analysis of these categories, examining their methodological similarities and differences. Through this analysis, we explore potential hybrid approaches and identify key challenges in forgery detection, providing direction for future research. Additionally, as MLLMs become increasingly prevalent in detection tasks, ethical and security considerations have emerged as critical global concerns. We examine the regulatory landscape surrounding Generative AI (GenAI) across various jurisdictions, offering valuable insights for researchers and practitioners in this field.
Related papers
- A Review Paper of the Effects of Distinct Modalities and ML Techniques to Distracted Driving Detection [3.6248657646376707]
Distracted driving remains a significant global challenge with severe human and economic repercussions.
This systematic review addresses critical gaps by providing a comprehensive analysis of machine learning (ML) and deep learning (DL) techniques applied across various data modalities.
arXiv Detail & Related papers (2025-01-20T21:35:34Z)
- Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse.
Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-LLM collaboration.
To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z)
- A Systematic Review of Edge Case Detection in Automated Driving: Methods, Challenges and Future Directions [0.3871780652193725]
This paper presents a practical, hierarchical review and systematic classification of edge case detection and assessment methodologies.
Our classification is structured on two levels: first, categorizing detection approaches according to AV modules, including perception-related and trajectory-related edge cases.
We introduce a new class called "knowledge-driven" approaches, which is largely overlooked in the literature.
arXiv Detail & Related papers (2024-10-11T03:32:20Z)
- Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs [55.74117540987519]
This paper explores the problem of commonsense-level vision-knowledge conflict in Multimodal Large Language Models (MLLMs).
We introduce an automated pipeline, augmented with human-in-the-loop quality control, to establish a benchmark aimed at simulating and assessing the conflicts in MLLMs.
We evaluate the conflict-resolution capabilities of nine representative MLLMs across various model families and find a noticeable over-reliance on textual queries.
arXiv Detail & Related papers (2024-10-10T17:31:17Z)
- From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems.
The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream of approaches for enhancing their effectiveness.
This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z)
- Surveying the MLLM Landscape: A Meta-Review of Current Surveys [17.372501468675303]
Multimodal Large Language Models (MLLMs) have become a transformative force in the field of artificial intelligence.
This survey aims to provide a systematic review of benchmark tests and evaluation methods for MLLMs.
arXiv Detail & Related papers (2024-09-17T14:35:38Z)
- Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)
- Detecting Multimedia Generated by Large AI Models: A Survey [26.84095559297626]
The aim of this survey is to fill an academic gap and contribute to global AI security efforts.
We introduce a novel taxonomy for detection methods, categorized by media modality.
We present a brief overview of generation mechanisms, public datasets, and online detection tools.
arXiv Detail & Related papers (2024-01-22T15:08:19Z)
- A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLMs-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z)
- A Recent Survey of Heterogeneous Transfer Learning [15.830786437956144]
Heterogeneous transfer learning has become a vital strategy in various tasks.
We offer an extensive review of over 60 HTL methods, covering both data-based and model-based approaches.
We explore applications in natural language processing, computer vision, multimodal learning, and biomedicine.
arXiv Detail & Related papers (2023-10-12T16:19:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.