As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli
- URL: http://arxiv.org/abs/2403.16760v3
- Date: Thu, 4 Apr 2024 14:51:56 GMT
- Title: As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli
- Authors: Di Cooke, Abigail Edwards, Sophia Barkoff, Kathryn Kelly
- Abstract summary: The principal defense against being misled by synthetic media relies on the ability of the human observer to visually and auditorily discern between real and fake.
We conducted a perceptual study with 1276 participants to assess how accurately people could distinguish synthetic images, audio-only, video-only, and audiovisual stimuli from authentic ones.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As synthetic media becomes progressively more realistic and barriers to using it continue to lower, the technology has been increasingly utilized for malicious purposes, from financial fraud to nonconsensual pornography. Today, the principal defense against being misled by synthetic media relies on the ability of the human observer to visually and auditorily discern between real and fake. However, it remains unclear just how vulnerable people actually are to deceptive synthetic media in the course of their day-to-day lives. We conducted a perceptual study with 1276 participants to assess how accurately people could distinguish synthetic images, audio-only, video-only, and audiovisual stimuli from authentic ones. To reflect the circumstances under which people would likely encounter synthetic media in the wild, testing conditions and stimuli emulated a typical online platform, while all synthetic media used in the survey was sourced from publicly accessible generative AI technology. We find that, overall, participants struggled to meaningfully discern between synthetic and authentic content. We also find that detection performance worsens when the stimulus contains synthetic content rather than authentic content, when an image features a human face rather than a non-face object, when a stimulus comprises a single modality rather than multiple modalities, when audiovisual stimuli are of mixed authenticity rather than fully synthetic, and when a stimulus features a foreign language rather than one the observer speaks fluently. Finally, we find that prior knowledge of synthetic media does not meaningfully affect participants' detection performance. Collectively, these results indicate that people are highly susceptible to being deceived by synthetic media in their daily lives and that human perceptual detection capabilities can no longer be relied upon as an effective countermeasure.
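To make the "coin toss" benchmark concrete, the following is a minimal sketch (not from the paper; the function name and the counts in the usage example are hypothetical) of how one might test whether an observed detection accuracy differs from the 50% chance level with a normal-approximation z-test:

```python
from math import erf, sqrt

def chance_level_test(correct: int, trials: int, p0: float = 0.5):
    """Two-sided normal-approximation test of accuracy against chance level p0."""
    p_hat = correct / trials
    se = sqrt(p0 * (1 - p0) / trials)        # standard error under H0: accuracy == p0
    z = (p_hat - p0) / se
    # two-sided p-value from the standard normal CDF: Phi(z) = 0.5*(1 + erf(z/sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_hat, z, p_value

# Hypothetical example: 660 correct judgments out of 1276 trials
print(chance_level_test(660, 1276))  # accuracy ~0.517; p ~0.22, i.e. close to a coin toss
```

An accuracy this close to 0.5 fails to reject the chance-level hypothesis, which is what "as good as a coin toss" means statistically.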
Related papers
- Measuring the Robustness of Audio Deepfake Detectors [59.09338266364506]
This work systematically evaluates the robustness of 10 audio deepfake detection models against 16 common corruptions.
Using both traditional deep learning models and state-of-the-art foundation models, we make four unique observations.
arXiv Detail & Related papers (2025-03-21T23:21:17Z) - Steganography Beyond Space-Time with Chain of Multimodal AI [8.095373104009868]
Steganography is the art and science of covert writing.
As artificial intelligence continues to evolve, its ability to synthesise realistic content emerges as a threat in the hands of cybercriminals.
This study proposes a paradigm in steganography for audiovisual media, where messages are concealed beyond both spatial and temporal domains.
arXiv Detail & Related papers (2025-02-25T15:56:09Z) - Adult learners recall and recognition performance and affective feedback when learning from an AI-generated synthetic video [1.7742433461734404]
The current study recruited 500 participants to investigate adult learners' recall and recognition performance, as well as their affective feedback, on the AI-generated synthetic video.
The results indicated no statistically significant difference amongst conditions on recall and recognition performance.
However, adult learners preferred to learn from the video formats rather than text materials.
arXiv Detail & Related papers (2024-11-28T21:40:28Z) - Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes [49.81915942821647]
This paper aims to evaluate the human ability to discern deepfake videos through a subjective study.
We present our findings by comparing human observers to five state-of-the-art audiovisual deepfake detection models.
We found that all AI models performed better than humans when evaluated on the same 40 videos.
arXiv Detail & Related papers (2024-05-07T07:57:15Z) - A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos [81.54357891748087]
We collect talking head videos generated from four generative methods.
We conduct controlled psychophysical experiments on visual quality, lip-audio synchronization, and head movement naturalness.
Our experiments validate consistency between model predictions and human annotations, identifying metrics that align better with human opinions than widely-used measures.
arXiv Detail & Related papers (2024-03-11T04:13:38Z) - A Representative Study on Human Detection of Artificially Generated Media Across Countries [28.99277150719848]
State-of-the-art forgeries are almost indistinguishable from "real" media.
The majority of participants simply guessed when asked to rate media as human- or machine-generated.
In addition, AI-generated media are rated as more human-like across all media types and all countries.
arXiv Detail & Related papers (2023-12-10T19:34:52Z) - Learning Human Action Recognition Representations Without Real Humans [66.61527869763819]
We present a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.
We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks.
Our approach outperforms previous baselines by up to 5%.
arXiv Detail & Related papers (2023-11-10T18:38:14Z) - Training Robust Deep Physiological Measurement Models with Synthetic Video-based Data [11.31971398273479]
We propose measures to add real-world noise to synthetic physiological signals and corresponding facial videos.
Our results show that we were able to reduce the average MAE from 6.9 to 2.0.
arXiv Detail & Related papers (2023-11-09T13:55:45Z) - The Age of Synthetic Realities: Challenges and Opportunities [85.058932103181]
We highlight the crucial need for the development of forensic techniques capable of identifying harmful synthetic creations and distinguishing them from reality.
Our focus extends to various forms of media, such as images, videos, audio, and text, as we examine how synthetic realities are crafted and explore approaches to detecting these malicious creations.
This study is of paramount importance due to the rapid progress of AI generative techniques and their impact on the fundamental principles of Forensic Science.
arXiv Detail & Related papers (2023-06-09T15:55:10Z) - Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images [66.20578637253831]
There is a growing concern that the advancement of artificial intelligence (AI) technology may produce fake photos.
This study aims to comprehensively evaluate agents for distinguishing state-of-the-art AI-generated visual content.
arXiv Detail & Related papers (2023-04-25T17:51:59Z) - Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection [115.83992775004043]
Recent advances in deep learning, particularly deep generative models, open the doors for producing perceptually convincing images and videos at a low cost.
This paper provides a comprehensive review of the current media tampering detection approaches, and discusses the challenges and trends in this field for future research.
arXiv Detail & Related papers (2022-12-12T02:54:08Z) - Deep Learning and Synthetic Media [0.0]
I argue that "deepfakes" and related synthetic media produced with such pipelines do not merely offer incremental improvements over previous methods.
I argue that "deepfakes" and related synthetic media produced with such pipelines pave the way for genuinely novel kinds of audiovisual media.
arXiv Detail & Related papers (2022-05-11T20:28:09Z) - Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration.
We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions.
The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z) - Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity (a generic sketch of such a contrastive objective appears after this list).
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
arXiv Detail & Related papers (2022-04-06T20:51:40Z) - Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion [82.06128362686445]
We propose a multi-modal semantic forensic approach to handle both cheapfakes and visually persuasive deepfakes.
We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others.
Unlike existing person-specific approaches, our method is also effective against attacks that focus on lip manipulation.
arXiv Detail & Related papers (2021-12-21T01:57:04Z) - SynFace: Face Recognition with Synthetic Data [83.15838126703719]
We devise SynFace with identity mixup (IM) and domain mixup (DM) to mitigate the performance gap (a generic mixup sketch appears after this list).
We also perform a systematic empirical analysis on synthetic face images to provide insights on how to effectively utilize synthetic data for face recognition.
arXiv Detail & Related papers (2021-08-18T03:41:54Z) - More Real than Real: A Study on Human Visual Perception of Synthetic Faces [7.25613186882905]
We describe a perceptual experiment where volunteers have been exposed to synthetic face images produced by state-of-the-art Generative Adversarial Networks.
Experiment outcomes reveal how strongly we should call into question our human ability to discriminate real faces from synthetic ones generated through modern AI.
arXiv Detail & Related papers (2021-06-14T08:27:25Z) - Are GAN generated images easy to detect? A critical analysis of the state-of-the-art [22.836654317217324]
With the increased level of photorealism, synthetic media are becoming hard to distinguish from real ones.
It is therefore important to develop automated tools that detect synthetic media reliably and in a timely manner.
arXiv Detail & Related papers (2021-04-06T15:54:26Z)
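For readers unfamiliar with the contrastive objective mentioned in the Audio-Visual Person-of-Interest DeepFake Detection entry above, the following is a minimal, generic InfoNCE-style sketch; it illustrates the general technique under assumed inputs (paired, L2-normalised video and audio embeddings), not that paper's exact loss:

```python
import numpy as np

def info_nce(video_emb: np.ndarray, audio_emb: np.ndarray, tau: float = 0.07) -> float:
    """Symmetric InfoNCE-style loss for a batch of paired embeddings.

    Both inputs are (batch, dim) L2-normalised arrays; row i of each is
    assumed to come from the same identity/clip, so diagonal pairs are positives.
    """
    sim = video_emb @ audio_emb.T / tau                   # cosine similarities / temperature
    # numerically stable log-softmax over rows (video -> audio direction)
    row = sim - sim.max(axis=1, keepdims=True)
    log_p_v2a = row - np.log(np.exp(row).sum(axis=1, keepdims=True))
    # ... and over columns (audio -> video direction)
    col = sim - sim.max(axis=0, keepdims=True)
    log_p_a2v = col - np.log(np.exp(col).sum(axis=0, keepdims=True))
    # average the negative log-likelihood of the diagonal (positive) pairs
    return float(-0.5 * (np.mean(np.diag(log_p_v2a)) + np.mean(np.diag(log_p_a2v))))

# Toy usage with random unit vectors (hypothetical shapes)
rng = np.random.default_rng(0)
v = rng.normal(size=(8, 64)); v /= np.linalg.norm(v, axis=1, keepdims=True)
a = rng.normal(size=(8, 64)); a /= np.linalg.norm(a, axis=1, keepdims=True)
print(info_nce(v, a))
```

Minimising this loss pulls matching video/audio embeddings together and pushes mismatched pairs apart, which is the general mechanism behind identity-discriminative embeddings.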
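Similarly, the identity mixup and domain mixup mentioned in the SynFace entry build on the generic mixup recipe of convexly combining pairs of inputs and labels. This is a minimal sketch of generic mixup only, not SynFace's exact IM/DM formulation:

```python
import numpy as np

def mixup(x1: np.ndarray, y1: np.ndarray, x2: np.ndarray, y2: np.ndarray,
          alpha: float = 0.2) -> tuple[np.ndarray, np.ndarray]:
    """Generic mixup: convex combination of two samples and their one-hot labels."""
    lam = np.random.beta(alpha, alpha)   # mixing coefficient ~ Beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

Interpolating between samples of different identities (or between synthetic and real domains) is one way to smooth the decision boundaries that synthetic training data would otherwise make too sharp.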