Related papers: Explainability of CNN Based Classification Models for Acoustic Signal

Explainability of CNN Based Classification Models for Acoustic Signal

URL: http://arxiv.org/abs/2509.08717v1
Date: Wed, 10 Sep 2025 16:11:01 GMT
Title: Explainability of CNN Based Classification Models for Acoustic Signal
Authors: Zubair Faruqui, Mackenzie S. McIntire, Rahul Dubey, Jay McEntee,
Abstract summary: We investigate the vocalizations of a bird species with strong geographic variation throughout its range in North America.<n>To interpret the model's predictions, we applied both model-agnostic (LIME, SHAP) and model-specific (DeepLIFT, Grad-CAM) XAI techniques.<n>These techniques produced different but complementary explanations, and when their explanations were considered together, they provided more complete and interpretable insights into the model's decision-making.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Explainable Artificial Intelligence (XAI) has emerged as a critical tool for interpreting the predictions of complex deep learning models. While XAI has been increasingly applied in various domains within acoustics, its use in bioacoustics, which involves analyzing audio signals from living organisms, remains relatively underexplored. In this paper, we investigate the vocalizations of a bird species with strong geographic variation throughout its range in North America. Audio recordings were converted into spectrogram images and used to train a deep Convolutional Neural Network (CNN) for classification, achieving an accuracy of 94.8\%. To interpret the model's predictions, we applied both model-agnostic (LIME, SHAP) and model-specific (DeepLIFT, Grad-CAM) XAI techniques. These techniques produced different but complementary explanations, and when their explanations were considered together, they provided more complete and interpretable insights into the model's decision-making. This work highlights the importance of using a combination of XAI techniques to improve trust and interoperability, not only in broader acoustics signal analysis but also argues for broader applicability in different domain specific tasks.

Related papers

A Framework for Evaluating Faithfulness in Explainable AI for Machine Anomalous Sound Detection Using Frequency-Band Perturbation [37.2521660642532]
We introduce a new quantitative framework for evaluating XAI faithfulness in machine-sound analysis.<n>We show that XAI techniques differ in reliability, with Occlusion demonstrating the strongest alignment with true model sensitivity.
arXiv Detail & Related papers (2026-01-26T23:06:50Z)
Learning Robust Spatial Representations from Binaural Audio through Feature Distillation [64.36563387033921]
We investigate the use of a pretraining stage based on feature distillation to learn a robust spatial representation of speech without the need for data labels.<n>Our experiments demonstrate that the pretrained models show improved performance in noisy and reverberant environments.
arXiv Detail & Related papers (2025-08-28T15:43:15Z)
ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models [82.04858317800097]
We present ForenX, a novel method that not only identifies the authenticity of images but also provides explanations that resonate with human thoughts.<n>ForenX employs the powerful multimodal large language models (MLLMs) to analyze and interpret forensic cues.<n>We introduce ForgReason, a dataset dedicated to descriptions of forgery evidences in AI-generated images.
arXiv Detail & Related papers (2025-08-02T15:21:26Z)
What Does an Audio Deepfake Detector Focus on? A Study in the Time Domain [4.8975242634878295]
We propose a relevancy-based explainable AI (XAI) method to analyze the predictions of transformer-based ADD models.<n>We consider large datasets, unlike previous works where only limited utterances are studied.<n>Further investigation on the relative importance of speech/non-speech, phonetic content, and voice onsets/offsets suggest that the XAI results obtained from analyzing limited utterances don't necessarily hold when evaluated on large datasets.
arXiv Detail & Related papers (2025-01-23T18:00:14Z)
Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.<n>It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.<n>It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
Underwater SONAR Image Classification and Analysis using LIME-based Explainable Artificial Intelligence [0.0]
This paper explores the application of the eXplainable Artificial Intelligence (XAI) tool to interpret the underwater image classification results. An extensive analysis of transfer learning techniques for image classification using benchmark Convolutional Neural Network (CNN) architectures is carried out. XAI techniques highlight interpretability of the results in a more human-compliant way, thus boosting our confidence and reliability.
arXiv Detail & Related papers (2024-08-23T04:54:18Z)
On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis [19.205671029694074]
This study assesses feature representations derived from speech and general audio domains, across pre-training bandwidths of 4, 8, and 16 kHz for marmoset call-type and caller classification tasks. Results show that models with higher bandwidth improve performance, and pre-training on speech or general audio yields comparable results, improving over a spectral baseline.
arXiv Detail & Related papers (2024-07-23T12:00:44Z)
Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers. We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models. Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
AudioProtoPNet: An interpretable deep learning model for bird sound classification [1.49199020343864]
This study introduces AudioProtoPNet, an adaptation of the Prototypical Part Network (ProtoPNet) for multi-label bird sound classification. It is an inherently interpretable model that uses a ConvNeXt backbone to extract embeddings. The model was trained on the BirdSet training dataset, which consists of 9,734 bird species and over 6,800 hours of recordings.
arXiv Detail & Related papers (2024-04-16T09:37:41Z)
Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system. We show that these results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
Polynomial Networks in Deep Classifiers [55.90321402256631]
We cast the study of deep neural networks under a unifying framework. Our framework provides insights on the inductive biases of each model. The efficacy of the proposed models is evaluated on standard image and audio classification benchmarks.
arXiv Detail & Related papers (2021-04-16T06:41:20Z)
AudioMNIST: Exploring Explainable Artificial Intelligence for Audio Analysis on a Simple Benchmark [12.034688724153044]
This paper explores post-hoc explanations for deep neural networks in the audio domain. We present a novel Open Source audio dataset consisting of 30,000 audio samples of English spoken digits. We demonstrate the superior interpretability of audible explanations over visual ones in a human user study.
arXiv Detail & Related papers (2018-07-09T23:11:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.