What Does an Audio Deepfake Detector Focus on? A Study in the Time Domain
- URL: http://arxiv.org/abs/2501.13887v2
- Date: Mon, 27 Jan 2025 17:17:08 GMT
- Title: What Does an Audio Deepfake Detector Focus on? A Study in the Time Domain
- Authors: Petr Grinberg, Ankur Kumar, Surya Koppisetti, Gaurav Bharaj
- Abstract summary: We propose a relevancy-based explainable AI (XAI) method to analyze the predictions of transformer-based ADD models. We consider large datasets, unlike previous works where only limited utterances are studied. Further investigation on the relative importance of speech/non-speech, phonetic content, and voice onsets/offsets suggests that the XAI results obtained from analyzing limited utterances don't necessarily hold when evaluated on large datasets.
- Score: 4.8975242634878295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adding explanations to audio deepfake detection (ADD) models will boost their real-world application by providing insight into the decision-making process. In this paper, we propose a relevancy-based explainable AI (XAI) method to analyze the predictions of transformer-based ADD models. We compare against standard Grad-CAM and SHAP-based methods, using quantitative faithfulness metrics as well as a partial spoof test, to comprehensively analyze the relative importance of different temporal regions in an audio signal. We consider large datasets, unlike previous works where only limited utterances are studied, and find that the XAI methods differ in their explanations. The proposed relevancy-based XAI method performs the best overall on a variety of metrics. Further investigation on the relative importance of speech/non-speech, phonetic content, and voice onsets/offsets suggests that the XAI results obtained from analyzing limited utterances don't necessarily hold when evaluated on large datasets.
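The abstract mentions quantitative faithfulness metrics without spelling them out. A common deletion-style check masks the time regions an explanation marks as most relevant and measures how much the model's score drops; a larger drop suggests a more faithful explanation. The sketch below is illustrative only: the `deletion_faithfulness` function and the toy amplitude-based "detector" are assumptions, not the paper's actual metric or model.

```python
import numpy as np

def deletion_faithfulness(model, audio, relevance, frac=0.2):
    """Zero out the most-relevant time samples and measure the score drop.

    A larger drop means the explanation is more faithful: the regions it
    marks as important really do drive the model's prediction.
    """
    n_mask = int(len(audio) * frac)
    top_idx = np.argsort(relevance)[::-1][:n_mask]  # most relevant samples
    masked = audio.copy()
    masked[top_idx] = 0.0                           # silence those regions
    return model(audio) - model(masked)             # score drop

# Toy stand-in for an ADD model: score is the mean absolute amplitude.
model = lambda x: float(np.mean(np.abs(x)))

rng = np.random.default_rng(0)
audio = rng.normal(size=16000)        # 1 s of audio at 16 kHz
relevance = np.abs(audio)             # pretend relevance tracks amplitude

drop = deletion_faithfulness(model, audio, relevance)
```

Because the toy relevance map exactly tracks what the toy model responds to, the score drop here is positive; a poor explanation would yield a drop near zero.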
Related papers
- A Framework for Evaluating Faithfulness in Explainable AI for Machine Anomalous Sound Detection Using Frequency-Band Perturbation [37.2521660642532]
We introduce a new quantitative framework for evaluating XAI faithfulness in machine-sound analysis. We show that XAI techniques differ in reliability, with Occlusion demonstrating the strongest alignment with true model sensitivity.
arXiv Detail & Related papers (2026-01-26T23:06:50Z) - AUDDT: Audio Unified Deepfake Detection Benchmark Toolkit [7.279026980203529]
We systematically review 28 existing audio deepfake datasets and present an open-source benchmarking toolkit called AUDDT. The goal of this toolkit is to automate the evaluation of pretrained detectors across these 28 datasets, giving users direct feedback on the advantages and shortcomings of their deepfake detectors.
arXiv Detail & Related papers (2025-09-25T21:09:40Z) - Explainability of CNN Based Classification Models for Acoustic Signal [0.0]
We investigate the vocalizations of a bird species with strong geographic variation throughout its range in North America. To interpret the model's predictions, we applied both model-agnostic (LIME, SHAP) and model-specific (DeepLIFT, Grad-CAM) XAI techniques. These techniques produced different but complementary explanations, and when their explanations were considered together, they provided more complete and interpretable insights into the model's decision-making.
arXiv Detail & Related papers (2025-09-10T16:11:01Z) - AHELM: A Holistic Evaluation of Audio-Language Models [78.20477815156484]
Multimodal audio-language models (ALMs) take interleaved audio and text as input and output text. AHELM is a benchmark that aggregates various datasets, including two new synthetic audio-text datasets called PARADE and CoRe-Bench. We also standardize the prompts, inference parameters, and evaluation metrics to ensure equitable comparisons across models.
arXiv Detail & Related papers (2025-08-29T07:40:39Z) - ODExAI: A Comprehensive Object Detection Explainable AI Evaluation [1.338174941551702]
We introduce the Object Detection Explainable AI Evaluation (ODExAI) to assess XAI methods in object detection.
We benchmark a set of XAI methods across two widely used object detectors and standard datasets.
arXiv Detail & Related papers (2025-04-27T14:16:14Z) - SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - CLAIR-A: Leveraging Large Language Models to Judge Audio Captions [73.51087998971418]
Evaluating machine-generated audio captions is a complex task that requires considering diverse factors.
We propose CLAIR-A, a simple and flexible method that leverages the zero-shot capabilities of large language models.
In our evaluations, CLAIR-A better predicts human judgements of quality compared to traditional metrics.
arXiv Detail & Related papers (2024-09-19T17:59:52Z) - Benchmarking Representations for Speech, Music, and Acoustic Events [24.92641211471113]
ARCH is a comprehensive benchmark for evaluating ARL methods on diverse audio classification domains.
ARCH comprises 12 datasets, that allow us to thoroughly assess pre-trained SSL models of different sizes.
To address the current lack of open-source, pre-trained models for non-speech audio, we also release new pre-trained models that demonstrate strong performance on non-speech datasets.
arXiv Detail & Related papers (2024-05-02T01:24:53Z) - SIDU-TXT: An XAI Algorithm for NLP with a Holistic Assessment Approach [14.928572140620245]
The 'Similarity Difference and Uniqueness' (SIDU) XAI method, recognized for its superior capability in localizing entire salient regions in image-based classification, is extended to textual data.
The extended method, SIDU-TXT, utilizes feature activation maps from 'black-box' models to generate heatmaps at a granular, word-based level.
We find that, in a sentiment analysis task on a movie review dataset, SIDU-TXT excels in both functional and human-grounded evaluations.
arXiv Detail & Related papers (2024-02-05T14:29:54Z) - Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme Recognition [9.810810252231812]
Interest in using XAI techniques to explain deep learning-based automatic speech recognition (ASR) is emerging.
We adapt a state-of-the-art XAI technique from the image classification domain, Local Interpretable Model-Agnostic Explanations (LIME), to a model trained for a TIMIT-based phoneme recognition task.
We find that a variant of LIME based on time-partitioned audio segments, which we propose in this paper, produces the most reliable explanations.
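The time-partitioned LIME variant is only named in this summary. In general LIME terms, one would split the waveform into equal time segments, randomly zero out subsets of them, and fit a linear surrogate from the on/off masks to the model's scores; the learned weight per segment is its importance. The sketch below follows that recipe under common assumptions (it omits LIME's proximity kernel, and the function name and toy detector are illustrative, not from the paper).

```python
import numpy as np

def lime_time_segments(model, audio, n_segments=10, n_samples=200, seed=0):
    """Fit a linear surrogate over on/off masks of equal time segments.

    Returns one weight per segment; a larger weight means keeping that
    segment contributes more to the model's score.
    """
    rng = np.random.default_rng(seed)
    segs = np.array_split(np.arange(len(audio)), n_segments)
    masks = rng.integers(0, 2, size=(n_samples, n_segments))
    scores = np.empty(n_samples)
    for i, mask in enumerate(masks):
        perturbed = audio.copy()
        for j, seg in enumerate(segs):
            if mask[j] == 0:
                perturbed[seg] = 0.0       # drop this time segment
        scores[i] = model(perturbed)
    # Least-squares linear surrogate: score ~ masks @ w + b
    X = np.column_stack([masks, np.ones(n_samples)])
    w, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return w[:-1]                           # per-segment importance

model = lambda x: float(np.mean(np.abs(x)))   # toy stand-in detector
audio = np.random.default_rng(1).normal(size=8000)
weights = lime_time_segments(model, audio)    # one importance per segment
```

With the toy amplitude model the score is exactly linear in the segment mask, so the surrogate recovers each segment's true contribution; with a real detector the weights are only an approximation in the neighborhood of the sampled masks.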
arXiv Detail & Related papers (2023-05-29T11:04:13Z) - An Experimental Investigation into the Evaluation of Explainability Methods [60.54170260771932]
This work compares 14 different metrics when applied to nine state-of-the-art XAI methods and three dummy methods (e.g., random saliency maps) used as references.
Experimental results show which of these metrics produces highly correlated results, indicating potential redundancy.
arXiv Detail & Related papers (2023-05-25T08:07:07Z) - Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z) - Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.