Related papers: The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models

URL: http://arxiv.org/abs/2509.26543v1
Date: Tue, 30 Sep 2025 17:17:27 GMT
Title: The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models
Authors: Lina Conti, Dennis Fucci, Marco Gaido, Matteo Negri, Guillaume Wisniewski, Luisa Bentivogli
Abstract summary: Contrastive explanations indicate why an AI system produced one output (the target) instead of another (the foil). We propose the first method to obtain contrastive explanations in S2T by analyzing how parts of the input spectrogram influence the choice between alternative outputs. Our work provides a foundation for better understanding S2T models.
Score: 25.126933196101703
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Contrastive explanations, which indicate why an AI system produced one output (the target) instead of another (the foil), are widely regarded in explainable AI as more informative and interpretable than standard explanations. However, obtaining such explanations for speech-to-text (S2T) generative models remains an open challenge. Drawing from feature attribution techniques, we propose the first method to obtain contrastive explanations in S2T by analyzing how parts of the input spectrogram influence the choice between alternative outputs. Through a case study on gender assignment in speech translation, we show that our method accurately identifies the audio features that drive the selection of one gender over another. By extending the scope of contrastive explanations to S2T, our work provides a foundation for better understanding S2T models.

Related papers

Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA [7.141288053123662]
Natural language explanation in visual question answering (VQA-NLE) aims to explain the decision-making process of models by generating natural language sentences, increasing users' trust in black-box systems. Existing post-hoc explanations are not always aligned with human logical inference and suffer from three issues: 1) deductive unsatisfiability: the generated explanations do not logically lead to the answer; 2) factual inconsistency: the model falsifies its counterfactual explanation for answers without considering the facts in images; and 3) semantic perturbation insensitivity: the model cannot recognize the semantic changes caused by small perturbations.
arXiv Detail & Related papers (2023-12-21T05:51:55Z)
Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features [35.31998003091635]
We introduce a new approach to explain speech classification models. We generate easy-to-interpret explanations via input perturbation on two information levels. We validate our approach by explaining two state-of-the-art SLU models on two speech classification tasks in English and Italian.
arXiv Detail & Related papers (2023-09-14T14:12:34Z)
Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting [80.9896041501715]
Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance. This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion.
arXiv Detail & Related papers (2023-02-09T18:02:34Z)
Selective Explanations: Leveraging Human Input to Align Explainable AI [40.33998268146951]
We propose a general framework for generating selective explanations by leveraging human input on a small sample. As a showcase, we use a decision-support task to explore selective explanations based on what the decision-maker would consider relevant to the decision task. Our experiments demonstrate the promise of selective explanations in reducing over-reliance on AI.
arXiv Detail & Related papers (2023-01-23T19:00:02Z)
INTERACTION: A Generative XAI Framework for Natural Language Inference Explanations [58.062003028768636]
Current XAI approaches focus only on delivering a single explanation. This paper proposes a generative XAI framework, INTERACTION (explaIn aNd predicT thEn queRy with contextuAl CondiTional varIational autO-eNcoder). Our novel framework presents explanations in two steps: (step one) Explanation and Label Prediction; and (step two) Diverse Evidence Generation.
arXiv Detail & Related papers (2022-09-02T13:52:39Z)
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation [61.564874831498145]
TranSpeech is a speech-to-speech translation model with bilateral perturbation. We establish a non-autoregressive S2ST technique, which repeatedly masks and predicts unit choices. TranSpeech shows a significant improvement in inference latency, enabling a speedup of up to 21.4x over the autoregressive technique.
arXiv Detail & Related papers (2022-05-25T06:34:14Z)
Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data. We find that people often misinterpret the explanations. We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models. Our method is based on projecting model representation to a latent space. Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze the input acoustic signal to understand its linguistic content and make predictions. Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text. We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.