Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme
Recognition
- URL: http://arxiv.org/abs/2305.18011v1
- Date: Mon, 29 May 2023 11:04:13 GMT
- Title: Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme
Recognition
- Authors: Xiaoliang Wu, Peter Bell, Ajitha Rajan
- Abstract summary: Interest in using XAI techniques to explain deep learning-based automatic speech recognition (ASR) is emerging.
We adapt a state-of-the-art XAI technique from the image classification domain, Local Interpretable Model-Agnostic Explanations (LIME), to a model trained for a TIMIT-based phoneme recognition task.
We find that a variant of LIME based on time-partitioned audio segments, which we propose in this paper, produces the most reliable explanations.
- Score: 9.810810252231812
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Explainable AI (XAI) techniques have been widely used to help explain and
understand the output of deep learning models in fields such as image
classification and Natural Language Processing. Interest in using XAI
techniques to explain deep learning-based automatic speech recognition (ASR) is
emerging, but there is not enough evidence on whether these explanations can be
trusted. To address this, we adapt a state-of-the-art XAI technique from the
image classification domain, Local Interpretable Model-Agnostic Explanations
(LIME), to a model trained for a TIMIT-based phoneme recognition task. This
simple task provides a controlled setting for evaluation while also providing
expert-annotated ground truth to assess the quality of explanations. We find
that a variant of LIME based on time-partitioned audio segments, which we
propose in this paper, produces the most reliable explanations, containing the
ground truth 96% of the time in its top three audio segments.
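As a rough illustration of how such a time-segment LIME variant can work, the sketch below perturbs equal-length audio segments, queries a black-box phoneme classifier, and fits a weighted linear surrogate whose coefficients rank the segments. The classifier interface (`predict_proba`), the number of segments, the proximity kernel, and the Ridge surrogate are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of LIME over time-partitioned audio segments (assumptions noted above).
import numpy as np
from sklearn.linear_model import Ridge


def lime_time_segments(audio, predict_proba, target_class,
                       n_segments=8, n_samples=500, seed=0):
    """Rank time-partitioned segments of `audio` by importance to one phoneme class.

    audio:          1-D numpy array holding the waveform
    predict_proba:  black-box function mapping a waveform to class probabilities
    target_class:   index of the phoneme prediction being explained
    """
    rng = np.random.default_rng(seed)
    # Partition the waveform into equal-length time segments.
    boundaries = np.linspace(0, len(audio), n_segments + 1, dtype=int)

    # Draw random binary masks: 1 keeps a segment, 0 silences it.
    masks = rng.integers(0, 2, size=(n_samples, n_segments))
    masks[0] = 1  # always include the unperturbed input

    responses = np.empty(n_samples)
    for i, mask in enumerate(masks):
        perturbed = audio.copy()
        for j, keep in enumerate(mask):
            if not keep:
                perturbed[boundaries[j]:boundaries[j + 1]] = 0.0
        responses[i] = predict_proba(perturbed)[target_class]

    # Weight perturbed samples by proximity to the original input.
    distances = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distances ** 2) / 0.25)

    # Fit a weighted linear surrogate; its coefficients score each segment.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, responses, sample_weight=weights)
    return np.argsort(surrogate.coef_)[::-1]  # segment indices, most important first
```

Under this reading, an explanation would count as reliable if the segment containing the annotated ground-truth phoneme appears among the top-ranked (e.g. top three) segments returned by the surrogate.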
Related papers
- Probing the Information Encoded in Neural-based Acoustic Models of
Automatic Speech Recognition Systems [7.207019635697126]
This article aims to determine what information is encoded in an automatic speech recognition acoustic model (AM), and where it is located.
Experiments are performed on speaker verification, acoustic environment classification, gender classification, tempo-distortion detection systems and speech sentiment/emotion identification.
Analysis showed that neural-based AMs hold heterogeneous information that seems surprisingly uncorrelated with phoneme recognition.
arXiv Detail & Related papers (2024-02-29T18:43:53Z)
- A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia [57.31074448586854]
Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context.
Yet the mechanisms underlying this contextual grounding remain unknown.
We present a novel method to study grounding abilities using Fakepedia.
arXiv Detail & Related papers (2023-12-04T17:35:42Z)
- Scene Text Recognition Models Explainability Using Local Features [11.990881697492078]
Scene Text Recognition (STR) explainability is the study of how humans can understand the cause of a model's predictions.
Recent XAI literature on STR provides only simple analyses and does not fully explore other XAI methods.
We specifically work on data explainability frameworks, called attribution-based methods, that explain the important parts of the input data for deep learning models.
We propose a new method, STRExp, that takes local explanations into consideration, i.e. the explanations of individual character predictions.
arXiv Detail & Related papers (2023-10-14T10:01:52Z)
- SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge [58.979490858061745]
We introduce sememe-based semantic knowledge into speech recognition.
Our experiments show that sememe information can improve the effectiveness of speech recognition.
Further experiments show that sememe knowledge can also improve the model's recognition of long-tailed data.
arXiv Detail & Related papers (2023-09-04T08:35:05Z)
- Explanations for Automatic Speech Recognition [9.810810252231812]
We provide an explanation for an ASR transcription as a subset of audio frames.
We adapt existing explainable AI techniques from image classification: Statistical Fault Localisation (SFL) and Causal.
We evaluate the quality of the explanations generated by the proposed techniques over three different ASR systems (Google API, the baseline Sphinx model, and Deepspeech) and 100 audio samples from the Commonvoice dataset.
arXiv Detail & Related papers (2023-02-27T11:09:19Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Visualizing Automatic Speech Recognition -- Means for a Better Understanding? [0.1868368163807795]
We show how attribution methods, which we import from image recognition and suitably adapt to handle audio data, can help to clarify the workings of ASR (a minimal saliency sketch appears after this list).
Taking Deep Speech, an end-to-end model for ASR, as a case study, we show how these techniques help to visualize which features of the input are the most influential in determining the output.
arXiv Detail & Related papers (2022-02-01T13:35:08Z)
- Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring [60.55025339250815]
We propose a novel deep learning technique for non-native automatic speech scoring (ASS), called speaker-conditioned hierarchical modeling.
In our technique, we take advantage of the fact that oral proficiency tests rate multiple responses for a candidate. We extract context from these responses and feed it as additional speaker-specific context to our network to score a particular response.
arXiv Detail & Related papers (2021-08-30T07:00:28Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent work towards attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight into the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
- Explainable Active Learning (XAL): An Empirical Study of How Local Explanations Impact Annotator Experience [76.9910678786031]
We propose a novel paradigm of explainable active learning (XAL), by introducing techniques from the recently surging field of explainable AI (XAI) into an Active Learning setting.
Our study shows the benefits of AI explanations as interfaces for machine teaching, such as supporting trust calibration and enabling rich forms of teaching feedback, as well as potential drawbacks: an anchoring effect on the model's judgments and increased cognitive workload.
arXiv Detail & Related papers (2020-01-24T22:52:18Z)
- AudioMNIST: Exploring Explainable Artificial Intelligence for Audio Analysis on a Simple Benchmark [12.034688724153044]
This paper explores post-hoc explanations for deep neural networks in the audio domain.
We present a novel Open Source audio dataset consisting of 30,000 audio samples of English spoken digits.
We demonstrate the superior interpretability of audible explanations over visual ones in a human user study.
arXiv Detail & Related papers (2018-07-09T23:11:17Z)
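The "Visualizing Automatic Speech Recognition" entry above describes attribution methods adapted from image recognition to audio. As a minimal, generic sketch of that idea (not the model or method used in that paper), the following computes an input-gradient saliency map over a spectrogram for a placeholder classifier; the tiny convolutional network and the 40-class output are assumptions for illustration.

```python
# Minimal input-gradient (saliency) sketch for a spectrogram classifier (placeholder model).
import torch
import torch.nn as nn

# Stand-in acoustic model: maps a (batch, 1, mels, frames) spectrogram to per-class
# scores. Any differentiable ASR or phoneme model could be substituted here.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 40),  # e.g. 40 phoneme classes (an assumption)
)


def saliency(spectrogram: torch.Tensor, target_class: int) -> torch.Tensor:
    """Return |d score_target / d input| for every time-frequency bin."""
    x = spectrogram.detach().clone().requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    return x.grad.abs().squeeze()  # (mels, frames) attribution map


# Example call on a random mel spectrogram (80 mel bins, 120 frames).
attribution = saliency(torch.randn(1, 1, 80, 120), target_class=5)
print(attribution.shape)  # torch.Size([80, 120])
```

High-magnitude entries in the returned map mark the time-frequency bins to which the target score is most sensitive, which is the kind of visualization such attribution methods produce.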