Detecting Multimedia Generated by Large AI Models: A Survey
- URL: http://arxiv.org/abs/2402.00045v6
- Date: Mon, 02 Jun 2025 02:47:24 GMT
- Title: Detecting Multimedia Generated by Large AI Models: A Survey
- Authors: Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Xin Li, Luisa Verdoliva, Shu Hu,
- Abstract summary: The aim of this survey is to fill an academic gap and contribute to global AI security efforts.<n>We introduce a novel taxonomy for detection methods, categorized by media modality.<n>We offer a focused analysis from a social media perspective to highlight their broader societal impact.
- Score: 25.97663040910416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of Large AI Models (LAIMs), particularly diffusion models and large language models, has marked a new era where AI-generated multimedia is increasingly integrated into various aspects of daily life. Although beneficial in numerous fields, this content presents significant risks, including potential misuse, societal disruptions, and ethical concerns. Consequently, detecting multimedia generated by LAIMs has become crucial, with a marked rise in related research. Despite this, there remains a notable gap in systematic surveys that focus specifically on detecting LAIM-generated multimedia. Addressing this, we provide the first survey to comprehensively cover existing research on detecting multimedia (such as text, images, videos, audio, and multimodal content) created by LAIMs. Specifically, we introduce a novel taxonomy for detection methods, categorized by media modality, and aligned with two perspectives: pure detection (aiming to enhance detection performance) and beyond detection (adding attributes like generalizability, robustness, and interpretability to detectors). Additionally, we have presented a brief overview of generation mechanisms, public datasets, online detection tools, and evaluation metrics to provide a valuable resource for researchers and practitioners in this field. Most importantly, we offer a focused analysis from a social media perspective to highlight their broader societal impact. Furthermore, we identify current challenges in detection and propose directions for future research that address unexplored, ongoing, and emerging issues in detecting multimedia generated by LAIMs. Our aim for this survey is to fill an academic gap and contribute to global AI security efforts, helping to ensure the integrity of information in the digital realm. The project link is https://github.com/Purdue-M2/Detect-LAIM-generated-Multimedia-Survey.
Related papers
- Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.<n>Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.<n>This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z) - Methods and Trends in Detecting Generated Images: A Comprehensive Review [0.552480439325792]
Generative Adversarial Networks (GANs), Diffusion Models, and Variational Autoencoders (VAEs) have enabled the synthesis of high-quality multimedia data.
These advancements have also raised significant concerns regarding adversarial attacks, unethical usage, and societal harm.
arXiv Detail & Related papers (2025-02-21T03:16:18Z) - Survey on AI-Generated Media Detection: From Non-MLLM to MLLM [51.91311158085973]
Methods for detecting AI-generated media have evolved rapidly.
General-purpose detectors based on MLLMs integrate authenticity verification, explainability, and localization capabilities.
Ethical and security considerations have emerged as critical global concerns.
arXiv Detail & Related papers (2025-02-07T12:18:20Z) - RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-imcomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z) - A Survey of Stance Detection on Social Media: New Directions and Perspectives [50.27382951812502]
stance detection has emerged as a crucial subfield within affective computing.
Recent years have seen a surge of research interest in developing effective stance detection methods.
This paper provides a comprehensive survey of stance detection techniques on social media.
arXiv Detail & Related papers (2024-09-24T03:06:25Z) - Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach [10.376378437321437]
We propose a Multimedia Misinformation Detection framework for detecting misinformation from video content by leveraging cross-modal entity consistency.
Our results demonstrate that MultiMD outperforms state-of-the-art baseline models.
arXiv Detail & Related papers (2024-08-16T16:14:36Z) - Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey [40.11614155244292]
As AI-generated media become more realistic, the risk of misuse to spread misinformation and commit identity fraud increases.
This work traces the evolution from traditional single-modality methods to sophisticated multi-modal approaches that handle audio-visual and text-visual scenarios.
To our knowledge, this is the first survey of its kind.
arXiv Detail & Related papers (2024-06-11T05:48:04Z) - A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - LLMs Meet Multimodal Generation and Editing: A Survey [89.76691959033323]
This survey elaborates on multimodal generation and editing across various domains, comprising image, video, 3D, and audio.
We summarize the notable advancements with milestone works in these fields and categorize these studies into LLM-based and CLIP/T5-based methods.
We dig into tool-augmented multimodal agents that can leverage existing generative models for human-computer interaction.
arXiv Detail & Related papers (2024-05-29T17:59:20Z) - Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models [52.24001776263608]
This comprehensive survey delves into the recent strides in HS moderation.
We highlight the burgeoning role of large language models (LLMs) and large multimodal models (LMMs)
We identify existing gaps in research, particularly in the context of underrepresented languages and cultures.
arXiv Detail & Related papers (2024-01-30T03:51:44Z) - A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLMs-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z) - A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution
Detection: Solutions and Future Challenges [28.104112546546936]
Machine learning models often encounter samples that are diverged from the training distribution.
Despite having similar and shared concepts, out-of-distribution, open-set, and anomaly detection have been investigated independently.
This survey aims to provide a cross-domain and comprehensive review of numerous eminent works in respective areas.
arXiv Detail & Related papers (2021-10-26T22:05:31Z) - Families In Wild Multimedia: A Multimodal Database for Recognizing
Kinship [63.27052967981546]
We introduce the first publicly available multi-task MM kinship dataset.
To build FIW MM, we developed machinery to automatically collect, annotate, and prepare the data.
Results highlight edge cases to inspire future research with different areas of improvement.
arXiv Detail & Related papers (2020-07-28T22:36:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.