More Than Words: Towards Better Quality Interpretations of Text
Classifiers
- URL: http://arxiv.org/abs/2112.12444v1
- Date: Thu, 23 Dec 2021 10:18:50 GMT
- Title: More Than Words: Towards Better Quality Interpretations of Text
Classifiers
- Authors: Muhammad Bilal Zafar, Philipp Schmidt, Michele Donini, Cédric
Archambeau, Felix Biessmann, Sanjiv Ranjan Das, Krishnaram Kenthapadi
- Abstract summary: We show that token-based interpretability, while being a convenient first choice given the input interfaces of the ML models, is not the most effective one in all situations.
We show that higher-level feature attributions offer several advantages: 1) they are more robust as measured by the randomization tests, 2) they lead to lower variability when using approximation-based methods like SHAP, and 3) they are more intelligible to humans in situations where the linguistic coherence resides at a higher level.
- Score: 16.66535643383862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The large size and complex decision mechanisms of state-of-the-art text
classifiers make it difficult for humans to understand their predictions,
leading to a potential lack of trust by the users. These issues have led to the
adoption of methods like SHAP and Integrated Gradients to explain
classification decisions by assigning importance scores to input tokens.
However, prior work, using different randomization tests, has shown that
interpretations generated by these methods may not be robust. For instance,
models making the same predictions on the test set may still lead to different
feature importance rankings. In order to address the lack of robustness of
token-based interpretability, we explore explanations at higher semantic levels
like sentences. We use computational metrics and human subject studies to
compare the quality of sentence-based interpretations against token-based ones.
Our experiments show that higher-level feature attributions offer several
advantages: 1) they are more robust as measured by the randomization tests, 2)
they lead to lower variability when using approximation-based methods like
SHAP, and 3) they are more intelligible to humans in situations where the
linguistic coherence resides at a higher granularity level. Based on these
findings, we show that token-based interpretability, while being a convenient
first choice given the input interfaces of the ML models, is not the most
effective one in all situations.
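To make the token-vs-sentence comparison concrete, here is a minimal sketch of leave-one-out (occlusion) attribution computed at both granularities. It stands in for approximation-based methods like SHAP and assumes only a black-box `predict_proba`; the toy classifier below is purely illustrative, not the paper's model.

```python
# Minimal sketch: token-level vs. sentence-level occlusion attributions.
# `predict_proba` is a hypothetical stand-in for any text classifier that
# returns the probability of the positive class.
import re

def predict_proba(text: str) -> float:
    # Toy classifier: fraction of "positive" cue words (illustration only).
    cues = {"good", "great", "robust"}
    words = [w.strip(".,!?") for w in text.lower().split()]
    return sum(w in cues for w in words) / max(len(words), 1)

def occlusion_scores(units, join):
    """Leave-one-out importance: drop each unit and measure the change."""
    full = predict_proba(join(units))
    return [full - predict_proba(join(units[:i] + units[i + 1:]))
            for i in range(len(units))]

text = "The model is great. The interface is clunky. Results look robust."
tokens = text.split()
sentences = re.split(r"(?<=[.!?])\s+", text)

token_attr = occlusion_scores(tokens, " ".join)
sentence_attr = occlusion_scores(sentences, " ".join)
print(list(zip(sentences, sentence_attr)))
```

The only change between the two granularities is how the input is segmented; the attribution machinery is identical, which is what makes the robustness comparison in the paper possible.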
Related papers
- CUE: An Uncertainty Interpretation Framework for Text Classifiers Built on Pre-Trained Language Models [28.750894873827068]
We propose a novel framework, called CUE, which aims to interpret uncertainties inherent in the predictions of PLM-based models.
By comparing the difference in predictive uncertainty between the perturbed and the original text representations, we are able to identify the latent dimensions responsible for uncertainty.
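A hedged sketch of that perturb-and-compare loop, assuming a hypothetical linear classification head over a latent vector (this is not the CUE implementation):

```python
# Nudge one latent dimension at a time and observe how predictive
# entropy (uncertainty) shifts. `head`, `z`, and `W` are illustrative.
import numpy as np

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def head(z, W):
    return softmax(W @ z)

rng = np.random.default_rng(0)
z = rng.normal(size=8)        # a latent text representation
W = rng.normal(size=(3, 8))   # 3-class linear head (placeholder)

base_u = entropy(head(z, W))
for dim in range(len(z)):
    z_pert = z.copy()
    z_pert[dim] += 0.5        # small perturbation of one dimension
    delta = entropy(head(z_pert, W)) - base_u
    print(f"dim {dim}: uncertainty change {delta:+.4f}")
```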
arXiv Detail & Related papers (2023-06-06T11:37:46Z)
- Pre-trained Embeddings for Entity Resolution: An Experimental Analysis [Experiment, Analysis & Benchmark] [65.11858854040544]
We perform a thorough experimental analysis of 12 popular language models over 17 established benchmark datasets.
First, we assess their vectorization overhead for converting all input entities into dense embedding vectors.
Second, we investigate their blocking performance, perform a detailed scalability analysis, and compare them with the state-of-the-art deep learning-based blocking method.
Third, we conclude with their relative performance for both supervised and unsupervised matching.
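As an illustration of the blocking step, a minimal sketch that pairs each query entity with its nearest neighbours by cosine similarity over placeholder embeddings (the benchmarked language models are not reproduced here):

```python
# Embedding-based blocking: restrict candidate matches for each query
# entity to its top-k most similar entries. Embeddings are random stand-ins.
import numpy as np

rng = np.random.default_rng(1)
queries = rng.normal(size=(4, 16))   # embedded entities from table A
index = rng.normal(size=(100, 16))   # embedded entities from table B

q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
x = index / np.linalg.norm(index, axis=1, keepdims=True)
sims = q @ x.T                        # cosine similarity matrix

k = 5
candidates = np.argsort(-sims, axis=1)[:, :k]  # top-k candidate block per query
print(candidates)
```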
arXiv Detail & Related papers (2023-04-24T08:53:54Z)
- Language Model Classifier Aligns Better with Physician Word Sensitivity than XGBoost on Readmission Prediction [86.15787587540132]
We introduce the sensitivity score, a metric that scrutinizes models' behaviors at the vocabulary level.
Our experiments compare the decision-making logic of clinicians and classifiers based on rank correlations of sensitivity scores.
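A minimal sketch of such a comparison, approximating each word's sensitivity by its leave-one-out prediction change and correlating the ranking against (here, made-up) clinician ratings; `predict_proba` is a hypothetical classifier, not the paper's model:

```python
# Vocabulary-level sensitivity probe plus rank-correlation comparison.
from scipy.stats import spearmanr

def predict_proba(words):
    # Toy readmission-risk scorer (illustration only).
    risky = {"readmitted", "chronic", "noncompliant"}
    return sum(w in risky for w in words) / max(len(words), 1)

def sensitivity(words):
    full = predict_proba(words)
    return [abs(full - predict_proba(words[:i] + words[i + 1:]))
            for i in range(len(words))]

note = "patient chronic condition stable noncompliant history".split()
scores_model = sensitivity(note)
scores_clinician = [0.9, 0.1, 0.2, 0.0, 0.8, 0.1]  # hypothetical ratings

rho, _ = spearmanr(scores_model, scores_clinician)
print(f"rank correlation: {rho:.2f}")
```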
arXiv Detail & Related papers (2022-11-13T23:59:11Z)
- Saliency Map Verbalization: Comparing Feature Importance Representations from Model-free and Instruction-based Methods [6.018950511093273]
Saliency maps can explain a neural model's predictions by identifying important input features.
We formalize the underexplored task of translating saliency maps into natural language.
We compare two novel methods (search-based and instruction-based verbalizations) against conventional feature importance representations.
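A toy, model-free verbalizer along these lines, using a fixed template over the top-k salient tokens (the paper's search-based and instruction-based methods are more sophisticated):

```python
# Turn a saliency map into a natural-language sentence via a template.
def verbalize(tokens, saliencies, k=2):
    ranked = sorted(zip(tokens, saliencies), key=lambda t: -t[1])
    top = [tok for tok, _ in ranked[:k]]
    return ("The prediction relies mainly on "
            + " and ".join(f"'{t}'" for t in top) + ".")

tokens = ["the", "movie", "was", "surprisingly", "good"]
saliencies = [0.01, 0.20, 0.02, 0.30, 0.47]
print(verbalize(tokens, saliencies))
# -> The prediction relies mainly on 'good' and 'surprisingly'.
```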
arXiv Detail & Related papers (2022-10-13T17:48:15Z)
- Perturbations and Subpopulations for Testing Robustness in Token-Based Argument Unit Recognition [6.502694770864571]
Argument Unit Recognition and Classification aims at identifying argument units from text and classifying them as pro or against.
One of the design choices that need to be made when developing systems for this task is what the unit of classification should be: segments of tokens or full sentences.
Previous research suggests that fine-tuning language models at the token level yields more robust sentence classification than training on sentences directly.
We reproduce the study that originally made this claim and further investigate what exactly token-based systems learned better compared to sentence-based ones.
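A sketch of the two unit choices, assuming hypothetical token tags; token-level predictions are reduced to a sentence label by majority vote:

```python
# Aggregate token-level argument tags into a sentence-level label.
from collections import Counter

def sentence_label_from_tokens(token_tags):
    """Majority vote over argumentative token tags; 'O' means no unit."""
    votes = Counter(t for t in token_tags if t != "O")
    return votes.most_common(1)[0][0] if votes else "none"

tags = ["O", "pro", "pro", "pro", "O", "con"]  # hypothetical token tags
print(sentence_label_from_tokens(tags))         # -> pro
```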
arXiv Detail & Related papers (2022-09-29T13:44:28Z)
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
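A minimal sketch of the contrastive idea: differentiate the target-minus-foil logit gap with respect to the input embeddings, rather than the gradient of a single logit (toy model; not the paper's exact estimator):

```python
# Contrastive gradient attribution on a toy embedding + linear head.
import torch

torch.manual_seed(0)
vocab, dim = 10, 4
emb = torch.nn.Embedding(vocab, dim)
out = torch.nn.Linear(dim, vocab)           # toy LM head

ids = torch.tensor([1, 5, 3])
x = emb(ids).detach().requires_grad_(True)  # input embeddings
logits = out(x.mean(dim=0))                 # pooled toy "next-token" logits

target, foil = 2, 7
contrast = logits[target] - logits[foil]    # "why target rather than foil?"
contrast.backward()

attribution = (x.grad * x).sum(dim=-1)      # grad-times-input per position
print(attribution)
```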
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the resulting metric, neighboring distribution divergence (NDD), to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
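A sketch of scoring one aligned masked position, with placeholder distributions standing in for real MLM outputs (Jensen-Shannon divergence is used here for symmetry; the paper's exact divergence may differ):

```python
# Compare the MLM's predicted vocabulary distributions at one aligned
# (masked) position in the original vs. edited text.
import numpy as np

def js_divergence(p, q):
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + 1e-12) / (b + 1e-12)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical vocabulary distributions at one aligned masked position.
p = np.array([0.70, 0.20, 0.05, 0.05])  # context of the original text
q = np.array([0.10, 0.15, 0.05, 0.70])  # context of the edited text
print(f"positional distance: {js_divergence(p, q):.4f}")
```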
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- On the Lack of Robust Interpretability of Neural Text Classifiers [14.685352584216757]
We assess the robustness of interpretations of neural text classifiers based on pretrained Transformer encoders.
Both randomization tests show surprising deviations from expected behavior, raising questions about the extent of insights that practitioners may draw from interpretations.
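One such randomization test can be sketched as a rank-correlation check between importance vectors from two prediction-equivalent training runs (the numbers below are hypothetical):

```python
# Two models that agree on every prediction should, ideally, also rank
# features similarly; Kendall's tau quantifies the (lack of) agreement.
from scipy.stats import kendalltau

# Hypothetical token importances from two retrained, prediction-equivalent models.
importances_run_a = [0.42, 0.31, 0.11, 0.09, 0.07]
importances_run_b = [0.12, 0.45, 0.08, 0.30, 0.05]

tau, _ = kendalltau(importances_run_a, importances_run_b)
print(f"Kendall tau between runs: {tau:.2f}")  # low tau = unstable interpretation
```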
arXiv Detail & Related papers (2021-06-08T18:31:02Z)
- Discriminatory Expressions to Produce Interpretable Models in Short Documents [0.0]
State-of-the-art models are black boxes that should not be used to solve problems that may have a social impact.
This paper presents a feature selection mechanism that is able to improve comprehensibility by using fewer but more meaningful features.
arXiv Detail & Related papers (2020-11-27T19:00:50Z)
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
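A sketch of the distillation objective, matching a student's masked-word distribution to a teacher marginal via KL divergence (placeholder distributions, not the paper's models):

```python
# KL distillation loss between a teacher marginal and a student MLM head.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab = 12
teacher_probs = torch.softmax(torch.randn(vocab), dim=-1)  # syntactic LM marginal
student_logits = torch.randn(vocab, requires_grad=True)    # BERT-style student

# KL(teacher || student), the usual distillation direction.
loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                teacher_probs, reduction="sum")
loss.backward()
print(loss.item())
```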
arXiv Detail & Related papers (2020-05-27T16:44:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.