Are Inherently Interpretable Models More Robust? A Study In Music Emotion Recognition
- URL: http://arxiv.org/abs/2508.03780v1
- Date: Tue, 05 Aug 2025 13:29:29 GMT
- Title: Are Inherently Interpretable Models More Robust? A Study In Music Emotion Recognition
- Authors: Katharina Hoedt, Arthur Flexer, Gerhard Widmer
- Abstract summary: We investigate whether inherently interpretable deep models are more robust to irrelevant perturbations in the data, compared to their black-box counterparts. Our results indicate that inherently more interpretable models can indeed be more robust than their black-box counterparts, and achieve similar levels of robustness as adversarially trained models.
- Score: 3.5775697416994485
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the desired key properties of deep learning models is the ability to generalise to unseen samples. When provided with new samples that are (perceptually) similar to one or more training samples, deep learning models are expected to produce correspondingly similar outputs. Models that succeed in predicting similar outputs for similar inputs are often called robust. Deep learning models, on the other hand, have been shown to be highly vulnerable to minor (adversarial) perturbations of the input, which manage to drastically change a model's output and simultaneously expose its reliance on spurious correlations. In this work, we investigate whether inherently interpretable deep models, i.e., deep models that were designed to focus more on meaningful and interpretable features, are more robust to irrelevant perturbations in the data, compared to their black-box counterparts. We test our hypothesis by comparing the robustness of an interpretable and a black-box music emotion recognition (MER) model when challenged with adversarial examples. Furthermore, we include an adversarially trained model, which is optimised to be more robust, in the comparison. Our results indicate that inherently more interpretable models can indeed be more robust than their black-box counterparts, and achieve similar levels of robustness as adversarially trained models, at lower computational cost.
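To make the robustness test concrete: an adversarial example adds a small, norm-bounded perturbation to an input so that the model's prediction changes drastically while the input remains perceptually similar. Below is a minimal FGSM-style sketch in PyTorch; the model, the mel-spectrogram input, and the regression loss are illustrative assumptions, not the attack or architectures actually used in the paper.

```python
# Minimal FGSM-style perturbation of a music emotion recognition (MER) input.
# Hypothetical setup: `model` maps a batch of mel spectrograms to
# (valence, arousal) predictions; the paper's actual attack and models may differ.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, spec, target, epsilon=0.01):
    """Return `spec` perturbed within an L-infinity ball of radius `epsilon`."""
    spec = spec.clone().detach().requires_grad_(True)
    loss = F.mse_loss(model(spec), target)  # regression loss for valence/arousal
    loss.backward()
    # One signed-gradient step that increases the loss as much as possible
    # under the L-infinity budget.
    return (spec + epsilon * spec.grad.sign()).detach()
```

A robust model should predict nearly the same emotion for `spec` and `fgsm_perturb(model, spec, target)`; the size of the prediction shift at a given `epsilon` is the kind of robustness measure such comparisons rely on.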
Related papers
- Multi-Level Collaboration in Model Merging [56.31088116526825]
This paper explores the intrinsic connections between model merging and model ensembling. We find that even when previous restrictions are not met, model merging can still attain performance nearly identical, or even superior, to that of ensembling.
arXiv Detail & Related papers (2025-03-03T07:45:04Z)
- A Robust Adversarial Ensemble with Causal (Feature Interaction) Interpretations for Image Classification [9.945272787814941]
We present a deep ensemble model that combines discriminative features with generative models to achieve both high accuracy and adversarial robustness. Our approach integrates a bottom-level pre-trained discriminative network for feature extraction with a top-level generative classification network that models adversarial input distributions.
arXiv Detail & Related papers (2024-12-28T05:06:20Z)
- Scaling Trends in Language Model Robustness [7.725206196110384]
We study language model robustness across several classification tasks, model families, and adversarial attacks. We find that in the absence of explicit safety training, larger models are not consistently more robust. While attack scaling outpaces adversarial training across all models studied, larger adversarially trained models might give defense the advantage in the long run.
arXiv Detail & Related papers (2024-07-25T17:26:41Z)
- Exploring new ways: Enforcing representational dissimilarity to learn new features and reduce error consistency [1.7497479054352052]
We show that highly dissimilar intermediate representations result in less correlated output predictions and slightly lower error consistency.
With this, we shed a first light on the connection between intermediate representations and their impact on output predictions.
arXiv Detail & Related papers (2023-07-05T14:28:46Z)
- Robust Models are less Over-Confident [10.42820615166362]
Adversarial training (AT) aims to achieve robustness against adversarial attacks.
We empirically analyze a variety of adversarially trained models that achieve high robust accuracies.
AT has an interesting side-effect: it leads to models that are significantly less overconfident with their decisions.
arXiv Detail & Related papers (2022-10-12T06:14:55Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in the real world is commonly imbalanced and follows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
- Understanding Robustness in Teacher-Student Setting: A New Perspective [42.746182547068265]
Adversarial examples are inputs to machine learning models in which a bounded adversarial perturbation can mislead a model into making arbitrarily incorrect predictions.
Extensive studies try to explain the existence of adversarial examples and provide ways to improve model robustness.
Our studies could shed light on future exploration of adversarial examples and on enhancing model robustness via principled data augmentation.
arXiv Detail & Related papers (2021-02-25T20:54:24Z)
- Quantifying and Mitigating Privacy Risks of Contrastive Learning [4.909548818641602]
We perform the first privacy analysis of contrastive learning through the lens of membership inference and attribute inference (a minimal membership inference sketch appears after this list).
Our results show that contrastive models are less vulnerable to membership inference attacks but more vulnerable to attribute inference attacks compared to supervised models.
To remedy this situation, we propose the first privacy-preserving contrastive learning mechanism, namely Talos.
arXiv Detail & Related papers (2021-02-08T11:38:11Z)
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks that detect firearms via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
- Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
Sequence-to-sequence (seq2seq) abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions (a minimal entropy computation is sketched after this list).
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
arXiv Detail & Related papers (2020-10-15T16:57:27Z)
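The membership inference analysis referenced above (Quantifying and Mitigating Privacy Risks of Contrastive Learning) can be illustrated with a minimal confidence-thresholding baseline: training members tend to receive more confident predictions than unseen samples. This is a hypothetical sketch of the general technique, not the attack evaluated in that paper.

```python
# Confidence-thresholding membership inference: guess "training member"
# wherever the model is unusually confident. Hypothetical baseline sketch,
# not the attack from the cited paper.
import torch

@torch.no_grad()
def infer_membership(model, x, threshold=0.9):
    """Return True per sample where the top-class probability exceeds `threshold`."""
    probs = torch.softmax(model(x), dim=-1)
    return probs.max(dim=-1).values > threshold
```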
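Similarly, the token-level uncertainty studied in Understanding Neural Abstractive Summarization Models via Uncertainty reduces to the entropy of the decoder's predictive distribution at each step. A minimal sketch, assuming a standard softmax decoder whose `logits` have shape (sequence_length, vocab_size):

```python
# Per-token entropy of a decoder's predictive distribution; higher entropy
# means the model is less certain about the next token.
import torch

def token_entropy(logits):
    """Return the entropy (in nats) of each token-level prediction."""
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)
```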