What Does an Audio Deepfake Detector Focus on? A Study in the Time Domain
- URL: http://arxiv.org/abs/2501.13887v2
- Date: Mon, 27 Jan 2025 17:17:08 GMT
- Title: What Does an Audio Deepfake Detector Focus on? A Study in the Time Domain
- Authors: Petr Grinberg, Ankur Kumar, Surya Koppisetti, Gaurav Bharaj
- Abstract summary: We propose a relevancy-based explainable AI (XAI) method to analyze the predictions of transformer-based ADD models. We consider large datasets, unlike previous works where only limited utterances are studied. Further investigation on the relative importance of speech/non-speech, phonetic content, and voice onsets/offsets suggests that the XAI results obtained from analyzing limited utterances don't necessarily hold when evaluated on large datasets.
- Score: 4.8975242634878295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adding explanations to audio deepfake detection (ADD) models will boost their real-world application by providing insight into the decision-making process. In this paper, we propose a relevancy-based explainable AI (XAI) method to analyze the predictions of transformer-based ADD models. We compare against standard Grad-CAM and SHAP-based methods, using quantitative faithfulness metrics as well as a partial spoof test, to comprehensively analyze the relative importance of different temporal regions in an audio signal. We consider large datasets, unlike previous works where only limited utterances are studied, and find that the XAI methods differ in their explanations. The proposed relevancy-based XAI method performs the best overall on a variety of metrics. Further investigation on the relative importance of speech/non-speech, phonetic content, and voice onsets/offsets suggests that the XAI results obtained from analyzing limited utterances don't necessarily hold when evaluated on large datasets.
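The abstract mentions quantitative faithfulness metrics without spelling them out. A common deletion-style check masks the time regions an explanation marks as most relevant and measures how much the model's score drops; a larger drop suggests a more faithful explanation. The sketch below is illustrative only: the `deletion_faithfulness` function and the toy amplitude-based "detector" are assumptions, not the paper's actual metric or model.

```python
import numpy as np

def deletion_faithfulness(model, audio, relevance, frac=0.2):
    """Zero out the most-relevant time samples and measure the score drop.

    A larger drop means the explanation is more faithful: the regions it
    marks as important really do drive the model's prediction.
    """
    n_mask = int(len(audio) * frac)
    top_idx = np.argsort(relevance)[::-1][:n_mask]  # most relevant samples
    masked = audio.copy()
    masked[top_idx] = 0.0                           # silence those regions
    return model(audio) - model(masked)             # score drop

# Toy stand-in for an ADD model: score is the mean absolute amplitude.
model = lambda x: float(np.mean(np.abs(x)))

rng = np.random.default_rng(0)
audio = rng.normal(size=16000)        # 1 s of audio at 16 kHz
relevance = np.abs(audio)             # pretend relevance tracks amplitude

drop = deletion_faithfulness(model, audio, relevance)
```

Because the toy relevance map exactly tracks what the toy model responds to, the score drop here is positive; a poor explanation would yield a drop near zero.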
Related papers
- A Framework for Evaluating Faithfulness in Explainable AI for Machine Anomalous Sound Detection Using Frequency-Band Perturbation [37.2521660642532]
We introduce a new quantitative framework for evaluating XAI faithfulness in machine-sound analysis. We show that XAI techniques differ in reliability, with Occlusion demonstrating the strongest alignment with true model sensitivity.
arXiv Detail & Related papers (2026-01-26T23:06:50Z) - AUDDT: Audio Unified Deepfake Detection Benchmark Toolkit [7.279026980203529]
We systematically review 28 existing audio deepfake datasets and present an open-source benchmarking toolkit called AUDDT. The goal of this toolkit is to automate the evaluation of pretrained detectors across these 28 datasets, giving users direct feedback on the advantages and shortcomings of their deepfake detectors.
arXiv Detail & Related papers (2025-09-25T21:09:40Z) - Explainability of CNN Based Classification Models for Acoustic Signal [0.0]
We investigate the vocalizations of a bird species with strong geographic variation throughout its range in North America. To interpret the model's predictions, we applied both model-agnostic (LIME, SHAP) and model-specific (DeepLIFT, Grad-CAM) XAI techniques. These techniques produced different but complementary explanations, and when their explanations were considered together, they provided more complete and interpretable insights into the model's decision-making.
arXiv Detail & Related papers (2025-09-10T16:11:01Z) - AHELM: A Holistic Evaluation of Audio-Language Models [78.20477815156484]
Multimodal audio-language models (ALMs) take interleaved audio and text as input and output text. AHELM is a benchmark that aggregates various datasets, including two new synthetic audio-text datasets called PARADE and CoRe-Bench. We also standardize the prompts, inference parameters, and evaluation metrics to ensure equitable comparisons across models.
arXiv Detail & Related papers (2025-08-29T07:40:39Z) - ODExAI: A Comprehensive Object Detection Explainable AI Evaluation [1.338174941551702]
We introduce the Object Detection Explainable AI Evaluation (ODExAI) to assess XAI methods in object detection.
We benchmark a set of XAI methods across two widely used object detectors and standard datasets.
arXiv Detail & Related papers (2025-04-27T14:16:14Z) - SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - CLAIR-A: Leveraging Large Language Models to Judge Audio Captions [73.51087998971418]
Evaluating machine-generated audio captions is a complex task that requires considering diverse factors.
We propose CLAIR-A, a simple and flexible method that leverages the zero-shot capabilities of large language models.
In our evaluations, CLAIR-A better predicts human judgements of quality compared to traditional metrics.
arXiv Detail & Related papers (2024-09-19T17:59:52Z) - Benchmarking Representations for Speech, Music, and Acoustic Events [24.92641211471113]
ARCH is a comprehensive benchmark for evaluating ARL methods on diverse audio classification domains.
ARCH comprises 12 datasets, that allow us to thoroughly assess pre-trained SSL models of different sizes.
To address the current lack of open-source, pre-trained models for non-speech audio, we also release new pre-trained models that demonstrate strong performance on non-speech datasets.
arXiv Detail & Related papers (2024-05-02T01:24:53Z) - SIDU-TXT: An XAI Algorithm for NLP with a Holistic Assessment Approach [14.928572140620245]
The 'Similarity Difference and Uniqueness' (SIDU) XAI method, recognized for its superior capability in localizing entire salient regions in image-based classification, is extended to textual data.
The extended method, SIDU-TXT, utilizes feature activation maps from 'black-box' models to generate heatmaps at a granular, word-based level.
We find that, in a sentiment analysis task on a movie review dataset, SIDU-TXT excels in both functional and human-grounded evaluations.
arXiv Detail & Related papers (2024-02-05T14:29:54Z) - Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme Recognition [9.810810252231812]
Interest in using XAI techniques to explain deep learning-based automatic speech recognition (ASR) is emerging.
We adapt a state-of-the-art XAI technique from the image classification domain, Local Interpretable Model-Agnostic Explanations (LIME), to a model trained for a TIMIT-based phoneme recognition task.
We find that a variant of LIME based on time-partitioned audio segments, which we propose in this paper, produces the most reliable explanations.
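The time-partitioned LIME variant is only named in this summary. In general LIME terms, one would split the waveform into equal time segments, randomly zero out subsets of them, and fit a linear surrogate from the on/off masks to the model's scores; the learned weight per segment is its importance. The sketch below follows that recipe under common assumptions (it omits LIME's proximity kernel, and the function name and toy detector are illustrative, not from the paper).

```python
import numpy as np

def lime_time_segments(model, audio, n_segments=10, n_samples=200, seed=0):
    """Fit a linear surrogate over on/off masks of equal time segments.

    Returns one weight per segment; a larger weight means keeping that
    segment contributes more to the model's score.
    """
    rng = np.random.default_rng(seed)
    segs = np.array_split(np.arange(len(audio)), n_segments)
    masks = rng.integers(0, 2, size=(n_samples, n_segments))
    scores = np.empty(n_samples)
    for i, mask in enumerate(masks):
        perturbed = audio.copy()
        for j, seg in enumerate(segs):
            if mask[j] == 0:
                perturbed[seg] = 0.0       # drop this time segment
        scores[i] = model(perturbed)
    # Least-squares linear surrogate: score ~ masks @ w + b
    X = np.column_stack([masks, np.ones(n_samples)])
    w, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return w[:-1]                           # per-segment importance

model = lambda x: float(np.mean(np.abs(x)))   # toy stand-in detector
audio = np.random.default_rng(1).normal(size=8000)
weights = lime_time_segments(model, audio)    # one importance per segment
```

With the toy amplitude model the score is exactly linear in the segment mask, so the surrogate recovers each segment's true contribution; with a real detector the weights are only an approximation in the neighborhood of the sampled masks.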
arXiv Detail & Related papers (2023-05-29T11:04:13Z) - An Experimental Investigation into the Evaluation of Explainability Methods [60.54170260771932]
This work compares 14 different metrics when applied to nine state-of-the-art XAI methods and three dummy methods (e.g., random saliency maps) used as references.
Experimental results show which of these metrics produces highly correlated results, indicating potential redundancy.
arXiv Detail & Related papers (2023-05-25T08:07:07Z) - Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z) - Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.