Detecting Multimedia Generated by Large AI Models: A Survey
- URL: http://arxiv.org/abs/2402.00045v3
- Date: Wed, 7 Feb 2024 06:27:12 GMT
- Title: Detecting Multimedia Generated by Large AI Models: A Survey
- Authors: Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding,
Xin Wang, Xin Li, Luisa Verdoliva, Shu Hu
- Abstract summary: The aim of this survey is to fill an academic gap and contribute to global AI security efforts.
We introduce a novel taxonomy for detection methods, categorized by media modality.
We present a brief overview of generation mechanisms, public datasets, and online detection tools.
- Score: 26.84095559297626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of Large AI Models (LAIMs), particularly diffusion
models and large language models, has marked a new era where AI-generated
multimedia is increasingly integrated into various aspects of daily life.
Although beneficial in numerous fields, this content presents significant
risks, including potential misuse, societal disruptions, and ethical concerns.
Consequently, detecting multimedia generated by LAIMs has become crucial, with
a marked rise in related research. Despite this, there remains a notable gap in
systematic surveys that focus specifically on detecting LAIM-generated
multimedia. Addressing this, we provide the first survey to comprehensively
cover existing research on detecting multimedia (such as text, images, videos,
audio, and multimodal content) created by LAIMs. Specifically, we introduce a
novel taxonomy for detection methods, categorized by media modality, and
aligned with two perspectives: pure detection (aiming to enhance detection
performance) and beyond detection (adding attributes like generalizability,
robustness, and interpretability to detectors). Additionally, we have presented
a brief overview of generation mechanisms, public datasets, and online
detection tools to provide a valuable resource for researchers and
practitioners in this field. Furthermore, we identify current challenges in
detection and propose directions for future research that address unexplored,
ongoing, and emerging issues in detecting multimedia generated by LAIMs. Our
aim for this survey is to fill an academic gap and contribute to global AI
security efforts, helping to ensure the integrity of information in the digital
realm. The project link is
https://github.com/Purdue-M2/Detect-LAIM-generated-Multimedia-Survey.
Related papers
- RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-imcomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z) - Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach [10.376378437321437]
We propose a Multimedia Misinformation Detection framework for detecting misinformation from video content by leveraging cross-modal entity consistency.
Our results demonstrate that MultiMD outperforms state-of-the-art baseline models.
arXiv Detail & Related papers (2024-08-16T16:14:36Z) - Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey [40.11614155244292]
As AI-generated media become more realistic, the risk of misuse to spread misinformation and commit identity fraud increases.
This work traces the evolution from traditional single-modality methods to sophisticated multi-modal approaches that handle audio-visual and text-visual scenarios.
To our knowledge, this is the first survey of its kind.
arXiv Detail & Related papers (2024-06-11T05:48:04Z) - A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - LLMs Meet Multimodal Generation and Editing: A Survey [89.76691959033323]
This survey elaborates on multimodal generation and editing across various domains, comprising image, video, 3D, and audio.
We summarize the notable advancements with milestone works in these fields and categorize these studies into LLM-based and CLIP/T5-based methods.
We dig into tool-augmented multimodal agents that can leverage existing generative models for human-computer interaction.
arXiv Detail & Related papers (2024-05-29T17:59:20Z) - Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models [52.24001776263608]
This comprehensive survey delves into the recent strides in HS moderation.
We highlight the burgeoning role of large language models (LLMs) and large multimodal models (LMMs)
We identify existing gaps in research, particularly in the context of underrepresented languages and cultures.
arXiv Detail & Related papers (2024-01-30T03:51:44Z) - A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLMs-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z) - A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution
Detection: Solutions and Future Challenges [28.104112546546936]
Machine learning models often encounter samples that are diverged from the training distribution.
Despite having similar and shared concepts, out-of-distribution, open-set, and anomaly detection have been investigated independently.
This survey aims to provide a cross-domain and comprehensive review of numerous eminent works in respective areas.
arXiv Detail & Related papers (2021-10-26T22:05:31Z) - Families In Wild Multimedia: A Multimodal Database for Recognizing
Kinship [63.27052967981546]
We introduce the first publicly available multi-task MM kinship dataset.
To build FIW MM, we developed machinery to automatically collect, annotate, and prepare the data.
Results highlight edge cases to inspire future research with different areas of improvement.
arXiv Detail & Related papers (2020-07-28T22:36:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.